Applied Linear 
Statistical Models 


Fifth Edition 


Michael H. Kutner 


Emory University 


Christopher J. Nachtsheim 


University of Minnesota 


John Neter 
University of Georgia 


William Li


University of Minnesota 


McGraw-Hill 

Irwin 
Boston Burr Ridge, IL Dubuque, IA Madison, WI New York San Francisco St. Louis
Bangkok Bogotá Caracas Kuala Lumpur Lisbon London Madrid Mexico City
Milan Montreal New Delhi Santiago Seoul Singapore Sydney Taipei Toronto 


The McGraw-Hill Companies 




APPLIED LINEAR STATISTICAL MODELS 


Published by McGraw-Hill/Irwin, a business unit of The McGraw-Hill Companies, Inc., 1221 Avenue of the 
Americas, New York, NY, 10020. Copyright © 2005, 1996, 1990, 1983, 1974 by The McGraw-Hill Companies,
Inc. All rights reserved. No part of this publication may be reproduced or distributed in any form or by any 
means, or stored in a database or retrieval system, without the prior written consent of The McGraw-Hill 
Companies, Inc., including, but not limited to, in any network or other electronic storage or transmission, or 
broadcast for distance learning. 

Some ancillaries, including electronic and print components, may not be available to customers outside the 
United States. 


This book is printed on acid-free paper. 
1234567890DOC/DOC0987654 
ISBN 0-07-238688-6 


Editorial director: Brent Gordon 

Executive editor: Richard T. Hercher, Jr. 
Editorial assistant: Lee Stone 

Senior marketing manager: Douglas Reiner 
Media producer: Elizabeth Mavetz 

Project manager: Jim Labeots 

Production supervisor: Gina Hangos 

Lead designer: Pam Verros 

Supplement producer: Matthew Perry 

Senior digital content specialist: Brian Nacik 
Cover design: Kiera Pohl 

Typeface: 10/12 Times Roman 

Compositor: Interactive Composition Corporation 
Printer: R. R. Donnelley 


Library of Congress Cataloging-in-Publication Data 


Kutner, Michael H.
Applied linear statistical models—5th ed. / Michael H. Kutner . . . [et al.].
p. cm. — (McGraw-Hill/Irwin series Operations and decision sciences) 

Rev. ed. of: Applied linear regression models. 4th ed. c2004. 

Includes bibliographical references and index. 

ISBN 0-07-238688-6 (acid-free paper) 

1. Regression analysis. 2. Mathematical statistics. I. Kutner, Michael H. Applied linear
regression models. II. Title. III. Series.
QA278.2.K87 2005 
519.5'36—dc22 2004052447 


www.mhhe.com 




To
Nancy, Michelle, Allison, 


Maureen, Abigael, Andrew, Henry G.,


Dorothy, Ron, David, 
Dezhong, Chenghua, Xu 


Preface 


Linear statistical models for regression, analysis of variance, and experimental design are 
widely used today in business administration, economics, engineering, and the social, health, 
and biological sciences. Successful applications of these models require a sound understand- 
ing of both the underlying theory and the practical problems that are encountered in using 
the models in real-life situations. While Applied Linear Statistical Models, Fifth Edition, is 
basically an applied book, it seeks to blend theory and applications effectively, avoiding the 
extremes of presenting theory in isolation and of giving elements of applications without 
the needed understanding of the theoretical foundations. 
The fifth edition differs from the fourth in a number of important respects. 


In the area of regression analysis (Parts I-III):


1. We have reorganized the chapters for better clarity and flow of topics. Material from 
the old Chapter 15 on normal correlation models has been integrated throughout the 
text where appropriate. Much of the material is now found in an expanded Chapter 
2, which focuses on inference in regression analysis. Material from the old Chapter 7 
pertaining to polynomial and interaction regression models and from old Chapter 11 
on quantitative predictors has been integrated into a new Chapter 8 called, “Models 
for Quantitative and Qualitative Predictors.” Material on model validation from old 
Chapter 10 is now fully integrated with updated material on model selection in a new 
Chapter 9 entitled, "Building the Regression Model I: Model Selection and Validation." 

2. We have added material on important techniques for data mining, including regression 
trees and neural network models in Chapters 11 and 13, respectively. 

3. The chapter on logistic regression (Chapter 14) has been extensively revised and 
expanded to include a more thorough treatment of logistic, probit, and complemen- 
tary log-log models, logistic regression residuals, model selection, model assessment, 
logistic regression diagnostics, and goodness of fit tests. We have also developed new 
material on polytomous (multicategory) nominal logistic regression models and poly- 
tomous ordinal logistic regression models. 

4. We have expanded the discussion of model selection methods and criteria. The Akaike 
information criterion and Schwarz Bayesian criterion have been added, and a greater 
emphasis is placed on the use of cross-validation for model selection and validation. 


In the areas pertaining to the design and analysis of experimental and observational studies
(Parts IV-VI): 


5. In the previous edition, Chapters 16 through 25 emphasized the analysis of variance, 
and the design of experiments was not encountered formally until Chapter 26. We 
have completely reorganized Parts IV-VI, emphasizing the design of experimental and 
observational studies from the start. In a new Chapter 15, we provide an overview of 
the basic concepts and planning approaches used in the design of experimental and 
observational studies, drawing in part from material from old Chapters 16, 26, and 
27. Fundamental concepts of experimental design, including the basic types of factors,
treatments, experimental units, randomization, and blocking are described in detail.
This is followed by an overview of standard experimental designs, as well as the basic 
types of observational studies, including cross-sectional, retrospective, and prospective 
studies. Each of the design topics introduced in Chapter 15 is then covered in greater 
detail in the chapters that follow. We emphasize the importance of good statistical 
design of scientific studies, and make the point that proper design often leads to a 
simple analysis. We note that the statistical analysis techniques used for observational 
and experimental studies are often the same, but the ability to “prove” cause-and-effect 
requires a carefully designed experimental study. 


6. Previously, the planning of sample sizes was covered in Chapter 26. We now present
material on planning of sample sizes in the relevant chapter, rather than devoting a
single, general discussion to this issue.

7. We have expanded and updated our coverage (Section 24.2) on the interpretation of
interaction plots for multi-factor studies.

8. We have reorganized and expanded the material on repeated measures designs in
Chapter 27. In particular, we introduce methods for handling the analysis of factor effects
when interactions between subjects and treatments are important, and when interactions
between factors are important.

9. We have added material on the design and analysis of balanced incomplete block
experiments in Section 28.1, including the planning of sample sizes. A new appendix
(B.15) has been added that provides standard balanced incomplete block designs.

10. We have added new material on robust product and process design experiments in
Chapter 29, and illustrate its use with a case study from the automotive industry. These
experiments are frequently used in industrial studies to identify product or process
designs that exhibit low levels of variation.


The remaining changes pertain to both regression analysis (Parts I-III) and the design and
analysis of experimental and observational studies (Parts IV-VI): 


11. We have made extensive revisions to the problem material. Problem data sets are
generally larger and more challenging, and we have included a large number of new
case data sets in Appendix C. In addition, we have added a new category of chapter
exercises, called Case Studies. These are open-ended problems that require students,
given an overall objective, to carry out complete analyses of the various case data sets in
Appendix C. They are distinct from the material in the Problems and Projects sections,
which frequently ask students to simply carry out specific analytical procedures.

12. We have substantially expanded the amount of graphic presentation, including much
greater use of scatter plot matrices, three-dimensional rotating plots, three-dimensional
response surface and contour plots, conditional effects plots, and main effects and
interaction plots.

13. Throughout the text, we have made extensive revisions in the exposition on the basis
of classroom experience to improve the clarity of the presentation.


We have included in this book not only the more conventional topics in regression and
design, but also topics that are frequently slighted, though important in practice. We devote
three chapters (Chapters 9-11) to the model-building process for regression, including
computer-assisted selection procedures for identifying good subsets of predictor variables




The Student Solutions Manual and all of the data files on the compact disk can also be 
downloaded from the book's website at: www.mhhe.com/kutnerALSMs5e. A list of errata 
for the book as well as some useful, related links will also be maintained at this address. 

A book such as this cannot be written without substantial assistance from numerous 
persons. We are indebted to the many contributors who have developed the theory and 
practice discussed in this book. We also would like to acknowledge appreciation to our stu- 
dents, who helped us in a variety of ways to fashion the method of presentation contained 
herein. We are grateful to the many users of Applied Linear Statistical Models and Applied 
Linear Regression Models, who have provided us with comments and suggestions based 
on their teaching with these texts. We are also indebted to Professors James E. Holstein, 
University of Missouri, and David L. Sherry, University of West Florida, for their review of 
Applied Linear Statistical Models, First Edition; to Professors Samuel Kotz, University of 
Maryland at College Park, Ralph P. Russo, University of Iowa, and Peter F. Thall, The George 
Washington University, for their review of Applied Linear Regression Models, First Edition; 
to Professors John S. Y. Chiu, University of Washington, James A. Calvin, University of
Iowa, and Michael F. Driscoll, Arizona State University, for their review of Applied Linear 
Statistical Models, Second Edition; to Professor Richard Anderson-Sprecher, University 
of Wyoming, for his review of Applied Linear Regression Models, Second Edition; and to 
Professors Alexander von Eye, The Pennsylvania State University, Samuel Kotz, University 
of Maryland at College Park, and John B. Willett, Harvard University, for their review of 
Applied Linear Statistical Models, Third Edition; to Professors Jason Abrevaya, Univer- 
sity of Chicago, Frank Alt, University of Maryland, Vitoria Chen, Georgia Tech, Rebecca 
Doerge, Purdue University, Mark Henry, Clemson University, Jim Hobert, University of 
Florida, Ken Koehler, Iowa State University, Chii-Dean Lin, University of Massachusetts
Amherst, Mark Reiser, Arizona State University, Lawrence Ries, University of Missouri 
Columbia, and Ehsan Soofi, University of Wisconsin Milwaukee, for their reviews of 
Applied Linear Regression Models, Third Edition, or Applied Linear Statistical Models, 
Fourth Edition. These reviews provided many important suggestions, for which we are 
most grateful. 

In addition, valuable assistance was provided by Professors Richard K. Burdick, 
Arizona State University, R. Dennis Cook, University of Minnesota, W. J. Conover, Texas 
Tech University, Mark E. Johnson, University of Central Florida, Dick DeVeaux, Williams 
College, and by Drs. Richard I. Beckman, Los Alamos National Laboratory, Ronald L. 
Iman, Sandia National Laboratories, Lexin Li, University of California Davis, and Brad 
Jones, SAS Institute. We are most appreciative of their willing help. We are also indebted 
to the 88 participants in a survey concerning Applied Linear Regression Models, Second 
Edition, the 76 participants in a survey concerning Applied Linear Statistical Models, Third 
Edition, and the 73 participants in a survey concerning Applied Linear Regression Models, 
Third Edition, or Applied Linear Statistical Models, Fourth Edition. Helpful suggestions 
were received in these surveys, for which we are thankful. 

Weiyong Zhang and Vincent Agboto assisted us diligently in the development of new 
problem material, and Lexin Li and Yingwen Dong helped prepare the revised Instructor 
Solutions Manual and Student Solutions Manual under considerable time pressure. Amy 
Hendrickson provided much-needed LaTeX expertise. George Cotsonis assisted us diligently
in preparing computer-generated plots and in checking analysis results. We are most
grateful to these persons for their invaluable help and assistance. We also wish to thank
the various members of the Carlson Executive MBA Program classes of 2003 and 2004; 
notably Mike Ohmes, Trevor Bynum, Baxter Stephenson, Zakir Salyani, Sanders Marvin, 
Trent Spurgeon, Nate Ogzawalla, David Mott, Preston McKenzie, Bruce DeJong, and Tim 
Kensok, for their contributions of interesting and relevant case study data and materials. 

Finally, our families bore patiently the pressures caused by our commitment to complete 
this revision. We are appreciative of their understanding. 


Michael H. Kutner 
Christopher J. Nachtsheim 
John Neter 

William Li 


Contents 


PART ONE 
SIMPLE LINEAR REGRESSION 1 


Chapter 1 
Linear Regression with One Predictor 
Variable 2 


1.1 Relations between Variables 2 
Functional Relation between Two 
Variables 2 
Statistical Relation between Two Variables 3 

1.2 Regression Models and Their Uses 5 
Historical Origins 5 
Basic Concepts 5 
Construction of Regression Models 7 
Uses of Regression Analysis 8 
Regression and Causality 8 
Use of Computers 9 

1.3 Simple Linear Regression Model 

with Distribution of Error Terms 
Unspecified 9 
Formal Statement of Model 9 
Important Features of Model 9 
Meaning of Regression Parameters 11 
Alternative Versions of Regression Model 12 

1.4 Data for Regression Analysis 12 
Observational Data 12 
Experimental Data 13 
Completely Randomized Design 13 

1.5 Overview of Steps in Regression 

Analysis 13 

1.6 Estimation of Regression Function 15 
Method of Least Squares 15 
Point Estimation of Mean Response 21 
Residuals 22 
Properties of Fitted Regression Line 23 

1.7 Estimation of Error Terms Variance $\sigma^2$ 24
Point Estimator of $\sigma^2$ 24

1.8 Normal Error Regression Model 26 
Model 26 
Estimation of Parameters by Method 
of Maximum Likelihood 27 




Cited References 33 


Problems 33 

Exercises 37 

Projects 38 
Chapter 2 


Inferences in Regression and Correlation 
Analysis 40 


2.1 Inferences Concerning $\beta_1$ 40
Sampling Distribution of $b_1$ 41
Sampling Distribution of $(b_1 - \beta_1)/s\{b_1\}$ 44
Confidence Interval for $\beta_1$ 45
Tests Concerning $\beta_1$ 47
2.2 Inferences Concerning $\beta_0$ 48
Sampling Distribution of $b_0$ 48
Sampling Distribution of $(b_0 - \beta_0)/s\{b_0\}$ 49
Confidence Interval for $\beta_0$ 49
2.3 Some Considerations on Making Inferences
Concerning $\beta_0$ and $\beta_1$ 50
Effects of Departures from Normality 50
Interpretation of Confidence Coefficient
and Risks of Errors 50
Spacing of the X Levels 50
Power of Tests 50
2.4 Interval Estimation of $E\{Y_h\}$ 52
Sampling Distribution of $\hat{Y}_h$ 52
Sampling Distribution of
$(\hat{Y}_h - E\{Y_h\})/s\{\hat{Y}_h\}$ 54
Confidence Interval for $E\{Y_h\}$ 54
2.5 Prediction of New Observation 55
Prediction Interval for $Y_{h(\text{new})}$ when
Parameters Known 56
Prediction Interval for $Y_{h(\text{new})}$ when
Parameters Unknown 57
Prediction of Mean of m New Observations
for Given $X_h$ 60
2.6 Confidence Band for Regression Line 61 
2.7 Analysis of Variance Approach 
to Regression Analysis 63 
Partitioning of Total Sum of Squares 63 
Breakdown of Degrees of Freedom 66
Mean Squares 66
Analysis of Variance Table 67
Expected Mean Squares 68
F Test of $\beta_1 = 0$ versus $\beta_1 \neq 0$ 69
2.8 General Linear Test Approach 72
Full Model 72
Reduced Model 72
Test Statistic 73
Summary 73
2.9 Descriptive Measures of Linear Association
between X and Y 74
Coefficient of Determination 74
Limitations of $R^2$ 75
Coefficient of Correlation 76
2.10 Considerations in Applying Regression
Analysis 77
2.11 Normal Correlation Models 78
Distinction between Regression and
Correlation Model 78
Bivariate Normal Distribution 78
Conditional Inferences 80
Inferences on Correlation Coefficients 83
Spearman Rank Correlation Coefficient 87
Cited References 89
Problems 89
Exercises 97
Projects 98

Chapter 3
Diagnostics and Remedial Measures 100

3.1 Diagnostics for Predictor Variable 100
3.2 Residuals 102
Properties of Residuals 102
Semistudentized Residuals 103
Departures from Model to Be Studied by
Residuals 103
3.3 Diagnostics for Residuals 103
Nonlinearity of Regression Function 104
Nonconstancy of Error Variance 107
Presence of Outliers 108
Nonindependence of Error Terms 108
Nonnormality of Error Terms 110
Omission of Important Predictor
Variables
Some Final Comments 114
3.4 Overview of Tests Involving
Residuals 114
Tests for Randomness 114
Tests for Constancy of Variance 115
Tests for Outliers 115
Tests for Normality 115
3.5 Correlation Test for Normality 115
3.6 Tests for Constancy of Error
Variance 116
Brown-Forsythe Test 116
Breusch-Pagan Test 118
3.7 F Test for Lack of Fit 119
Assumptions 119
Notation 121
Full Model 121
Reduced Model 123
Test Statistic 123
ANOVA Table 124
3.8 Overview of Remedial Measures 127
Nonlinearity of Regression
Function 128
Nonconstancy of Error Variance 128
Nonindependence of Error Terms 128
Nonnormality of Error Terms 128
Omission of Important Predictor
Variables 129
Outlying Observations 129
3.9 Transformations 129
Transformations for Nonlinear
Relation Only 129
Transformations for Nonnormality
and Unequal Error Variances 132
Box-Cox Transformations 134
3.10 Exploration of Shape of Regression
Function 137
Lowess Method 138
Use of Smoothed Curves to Confirm Fitted
Regression Function 139
3.11 Case Example—Plutonium
Measurement 141
Cited References 146
Problems 146
Exercises 151
Projects 152
Case Studies 153




Chapter 4 
Simultaneous Inferences and Other 
Topics in Regression Analysis 154 


4.1 Joint Estimation of $\beta_0$ and $\beta_1$ 154
Need for Joint Estimation 154
Bonferroni Joint Confidence Intervals 155


4.2 Simultaneous Estimation of Mean 
Responses 157 
Working-Hotelling Procedure 158 
Bonferroni Procedure 159 
4.3 Simultaneous Prediction Intervals 
for New Observations 160 
4.4 Regression through Origin 161 
Model 161 
Inferences 161 
Important Cautions for Using Regression 
through Origin 164 
4.5 Effects of Measurement Errors 165 
Measurement Errors in Y 165 
Measurement Errors in X 165 
Berkson Model 167 
4.6 Inverse Predictions 168 
4.7 Choice of X Levels 170 
Cited References 172 
Problems 172 
Exercises 175 
Projects 175 


Chapter 5 
Matrix Approach to Simple 
Linear Regression Analysis 176 


5.1 Matrices 176 

Definition of Matrix 176 

Square Matrix 178 

Vector 178 

Transpose 178 

Equality of Matrices 179 
5.2 Matrix Addition and Subtraction 180 
5.3 Matrix Multiplication 182 


Multiplication of a Matrix by a Scalar 182 
Multiplication of a Matrix by a Matrix 182 


5.4 Special Types of Matrices 185 
Symmetric Matrix 185 
Diagonal Matrix 185 


Vector and Matrix with All Elements 
Unity 187 
Zero Vector 187 
5.5 Linear Dependence and Rank 
of Matrix 188 
Linear Dependence 188
Rank of Matrix 188
5.6 Inverse of a Matrix 189
Finding the Inverse 190 
Uses of Inverse Matrix 192 
5.7 Some Basic Results for Matrices 193 
5.8 Random Vectors and Matrices 193 
Expectation of Random Vector or Matrix 
Variance-Covariance Matrix 
of Random Vector 194 
Some Basic Results 196 
Multivariate Normal Distribution 196 
5.9 Simple Linear Regression Model 
in Matrix Terms 197 
5.10 Least Squares Estimation 
of Regression Parameters 199 
Normal Equations 199 
Estimated Regression Coefficients 200 
5.11 Fitted Values and Residuals 202 
Fitted Values 202 
Residuals 203 
5.12 Analysis of Variance Results 204 
Sums of Squares 204 
Sums of Squares as Quadratic 
Forms 205 
5.13 Inferences in Regression Analysis 206 
Regression Coefficients 207 
Mean Response 208 
Prediction of New Observation 209 
Cited Reference 209 
Problems 209 
Exercises 212 


PART TWO 
MULTIPLE LINEAR 
REGRESSION 213 
Chapter 6 


Multiple Regression I 214 
6.1 Multiple Regression Models 214 


Need for Several Predictor Variables 214
First-Order Model with Two Predictor
Variables 215
First-Order Model with More than Two
Predictor Variables 217
General Linear Regression Model 217
6.2 General Linear Regression Model in Matrix
Terms 222
6.3 Estimation of Regression Coefficients 223
6.4 Fitted Values and Residuals 224
6.5 Analysis of Variance Results 225
Sums of Squares and Mean Squares 225
F Test for Regression Relation 226
Coefficient of Multiple Determination 226
Coefficient of Multiple Correlation 227
6.6 Inferences about Regression
Parameters 227
Interval Estimation of $\beta_k$ 228
Tests for $\beta_k$ 228
Joint Inferences 228
6.7 Estimation of Mean Response and
Prediction of New Observation 229
Interval Estimation of $E\{Y_h\}$ 229
Confidence Region for Regression
Surface 229
Simultaneous Confidence Intervals for Several
Mean Responses 230
Prediction of New Observation $Y_{h(\text{new})}$ 230
Prediction of Mean of m New Observations
at $X_h$ 230
Predictions of g New Observations 231
Caution about Hidden Extrapolations 231
6.8 Diagnostics and Remedial Measures 232
Scatter Plot Matrix 232
Three-Dimensional Scatter Plots 233
Residual Plots 233
Correlation Test for Normality 234
Brown-Forsythe Test for Constancy of Error
Variance 234
Breusch-Pagan Test for Constancy of Error
Variance 234
F Test for Lack of Fit 235
Remedial Measures 236
6.9 An Example—Multiple Regression with
Two Predictor Variables 236
Setting 236
Basic Calculations 237
Estimated Regression Function 240
Fitted Values and Residuals 241
Analysis of Appropriateness of Model 241
Analysis of Variance 243
Estimation of Regression Parameters 245
Estimation of Mean Response 245
Prediction Limits for New Observations 247
Cited Reference 248 


Problems 248 
Exercises 253 
Projects 254 
Chapter 7 
Multiple Regression II 256
7.1 Extra Sums of Squares 256 
Basic Ideas 256 
Definitions 259 
Decomposition of SSR into Extra Sums 
of Squares 260 
ANOVA Table Containing Decomposition 
of SSR 261 
7.2 Uses of Extra Sums of Squares in Tests for 
Regression Coefficients 263 
Test whether a Single $\beta_k = 0$ 263
Test whether Several $\beta_k = 0$ 264
7.3 Summary of Tests Concerning Regression
Coefficients 266
Test whether All $\beta_k = 0$ 266
Test whether a Single $\beta_k = 0$ 267
Test whether Some $\beta_k = 0$ 267
Other Tests 268 
7.4 Coefficients of Partial Determination 268 
Two Predictor Variables 269 
General Case 269 
Coefficients of Partial Correlation 270 
7.5 Standardized Multiple Regression
Model 271
Roundoff Errors in Normal Equations
Calculations 271
Lack of Comparability in Regression
Coefficients 272
Correlation Transformation 272
Standardized Regression Model 273
X'X Matrix for Transformed Variables 274
Estimated Standardized Regression
Coefficients 275
7.6 Multicollinearity and Its Effects 278

Uncorrelated Predictor Variables 279 
Nature of Problem when Predictor Variables 
Are Perfectly Correlated 281 
Effects of Multicollinearity 283 
Need for More Powerful Diagnostics for 
Multicollinearity 289 

Cited Reference 289 

Problems 289 

Exercise 292 

Projects 293 


Chapter 8 
Regression Models for Quantitative 
and Qualitative Predictors 294 


8.1 Polynomial Regression Models 294 
Uses of Polynomial Models 294 
One Predictor Variable—Second Order 295 
One Predictor Variable—Third Order 296 
One Predictor Variable—Higher Orders 296 
Two Predictor Variables—Second Order 297 
Three Predictor Variables—Second 
Order 298 
Implementation of Polynomial Regression 
Models 298 
Case Example 300 
Some Further Comments on Polynomial 
Regression 305 

8.2 Interaction Regression Models 306 
Interaction Effects 306 
Interpretation of Interaction Regression 
Models with Linear Effects 306 
Interpretation of Interaction Regression 
Models with Curvilinear Effects 309 
Implementation of Interaction Regression 
Models 311 

8.3 Qualitative Predictors 313 
Qualitative Predictor with Two 
Classes 314 
Interpretation of Regression Coefficients 315 
Qualitative Predictor with More than Two 
Classes 318 
Time Series Applications 319 


8.4 Some Considerations in Using Indicator
Variables 321
Indicator Variables versus Allocated
Codes 321
Indicator Variables versus Quantitative
Variables 322
Other Codings for Indicator Variables 323
8.5 Modeling Interactions between Quantitative
and Qualitative Predictors 324
Meaning of Regression Coefficients 324
8.6 More Complex Models 327
More than One Qualitative Predictor
Variable 328
Qualitative Predictor Variables Only 329
8.7 Comparison of Two or More Regression
Functions 329
Soap Production Lines Example 330
Instrument Calibration Study Example 334
Cited Reference 335 
Problems 335 
Exercises 340 
Projects 341 
Case Study 342 


Chapter 9 
Building the Regression Model I: 
Model Selection and Validation 343 


9.1 Overview of Model-Building Process 343
Data Collection 343
Data Preparation 346
Preliminary Model Investigation 346
Reduction of Explanatory Variables 347
Model Refinement and Selection 349
Model Validation 350
9.2 Surgical Unit Example 350
9.3 Criteria for Model Selection 353
$R_p^2$ or $SSE_p$ Criterion 354
$R_{a,p}^2$ or $MSE_p$ Criterion 355
Mallows' $C_p$ Criterion 357
$AIC_p$ and $SBC_p$ Criteria 359
$PRESS_p$ Criterion 360
9.4 Automatic Search Procedures for Model
Selection 361
“Best” Subsets Algorithm 361
Stepwise Regression Methods 364
Forward Stepwise Regression 364
Other Stepwise Procedures 367
9.5 Some Final Comments on Automatic
Model Selection Procedures 368
9.6 Model Validation 369
Collection of New Data to Check 
Model 370 
Comparison with Theory, Empirical 
Evidence, or Simulation Results 371 
Data Splitting 372 
Cited References 375 
Problems 376 
Exercise 380 
Projects 381 
Case Studies 382 


Chapter 10 
Building the Regression Model II: 
Diagnostics 384 


10.1 Model Adequacy for a Predictor 
Variable—Added-Variable Plots 384 
10.2 Identifying Outlying Y Observations— 
Studentized Deleted Residuals 390 
Outlying Cases 390 
Residuals and Semistudentized 
Residuals 392 
Hat Matrix 392 
Studentized Residuals 394 
Deleted Residuals 395 
Studentized Deleted Residuals 396 
10.3 Identifying Outlying X Observations—Hat 
Matrix Leverage Values 398 
Use of Hat Matrix for Identifying Outlying 
X Observations 398 
Use of Hat Matrix to Identify Hidden 
Extrapolation 400 
10.4 Identifying Influential Cases—DFFITS,
Cook's Distance, and DFBETAS
Measures 400
Influence on Single Fitted 
Value—DFFITS 401 
Influence on All Fitted Values—Cook’s 
Distance 402 
Influence on the Regression 
Coefficients—DFBETAS 404 




Influence on Inferences 405 
Some Final Comments 406 
10.5 Multicollinearity Diagnostics—Variance
Inflation Factor 406 
Informal Diagnostics 407 
Variance Inflation Factor 408 
10.6 Surgical Unit Example—Continued 410 
Cited References 414 
Problems 414 
Exercises 419 
Projects 419 
Case Studies 420 


Chapter 11 
Building the Regression Model III: 
Remedial Measures 421 


11.1 Unequal Error Variances Remedial 
Measures—Weighted Least Squares 421 
Error Variances Known 422 
Error Variances Known up to 
Proportionality Constant 424 
Error Variances Unknown 424
11.2 Multicollinearity Remedial 
Measures—Ridge Regression 431 
Some Remedial Measures 431 
Ridge Regression 432 
11.3 Remedial Measures for Influential 
Cases—Robust Regression 437 
Robust Regression 438 
IRLS Robust Regression 439 
11.4 Nonparametric Regression: Lowess 
Method and Regression Trees 449 
Lowess Method 449 
Regression Trees 453 
11.5 Remedial Measures for Evaluating 
Precision in Nonstandard 
Situations—Bootstrapping 458 
General Procedure 459 
Bootstrap Sampling 459 
Bootstrap Confidence Intervals 460 
11.6 Case Example—MNDOT Traffic 
Estimation 464
The AADT Database 464
Model Development 465 
Weighted Least Squares Estimation 468 




Cited References 471 
Problems 472 
Exercises 476 
Projects 476 

Case Studies 480 


Chapter 12 
Autocorrelation in Time 
Series Data 481 


12.1 Problems of Autocorrelation 481
12.2 First-Order Autoregressive Error
Model 484
Simple Linear Regression 484
Multiple Regression 484
Properties of Error Terms 485
12.3 Durbin-Watson Test for
Autocorrelation 487
12.4 Remedial Measures for
Autocorrelation 490
Addition of Predictor Variables 490
Use of Transformed Variables 490
Cochrane-Orcutt Procedure 492
Hildreth-Lu Procedure 495
First Differences Procedure 496
Comparison of Three Methods 498
12.5 Forecasting with Autocorrelated Error
Terms 499
Cited References 502 
Problems 502 
Exercises 507 
Projects 508 
Case Studies 508 


PART THREE 
NONLINEAR REGRESSION 509 


Chapter 13
Introduction to Nonlinear Regression
and Neural Networks 510

13.1 Linear and Nonlinear Regression
Models 510
Linear Regression Models 510
Nonlinear Regression Models 511
Estimation of Regression Parameters 514
13.2 Least Squares Estimation in Nonlinear
Regression 515
Solution of Normal Equations 517
Direct Numerical Search—Gauss-Newton
Method 518
Other Direct Search Procedures 525
13.3 Model Building and Diagnostics 526
13.4 Inferences about Nonlinear Regression
Parameters 527
Estimate of Error Term Variance 527
Large-Sample Theory 528
When Is Large-Sample Theory
Applicable? 528
Interval Estimation of a Single $\gamma_k$ 531
Simultaneous Interval Estimation
of Several $\gamma_k$ 532
Test Concerning a Single $\gamma_k$ 532
Test Concerning Several $\gamma_k$ 533
13.5 Learning Curve Example 533
13.6 Introduction to Neural Network
Modeling 537
Neural Network Model 537
Network Representation 540
Neural Network as Generalization of Linear
Regression 541
Parameter Estimation: Penalized Least
Squares 542
Example: Ischemic Heart Disease 543
Model Interpretation and
Prediction 546
Some Final Comments on Neural Network
Modeling 547
Cited References 547 
Problems 548 
Exercises 552 
Projects 552 
Case Studies 554 


Chapter 14 
Logistic Regression, Poisson Regression, 
and Generalized Linear Models 555 


14.1 Regression Models with Binary Response
Variable 555
Meaning of Response Function when
Outcome Variable Is Binary 556
Special Problems when Response Variable
Is Binary 557
14.2 Sigmoidal Response Functions
for Binary Responses 559
Probit Mean Response Function 559
Logistic Mean Response Function 560
Complementary Log-Log Response
Function 562
14.3 Simple Logistic Regression 563
Simple Logistic Regression Model 563
Likelihood Function 564
Maximum Likelihood Estimation 564
Interpretation of $b_1$ 567
Use of Probit and Complementary Log-Log
Response Functions 568
Repeat Observations—Binomial
Outcomes 568
14.4 Multiple Logistic Regression 570
Multiple Logistic Regression Model 570
Fitting of Model 571
Polynomial Logistic Regression 575
14.5 Inferences about Regression
Parameters 577
Test Concerning a Single $\beta_k$: Wald
Test 578
Interval Estimation of a Single $\beta_k$ 579
Test whether Several $\beta_k = 0$: Likelihood
Ratio Test 580
14.6 Automatic Model Selection
Methods 582
Model Selection Criteria 582
Best Subsets Procedures 583
Stepwise Model Selection 583
14.7 Tests for Goodness of Fit 586
Pearson Chi-Square Goodness
of Fit Test 586
Deviance Goodness of Fit Test 588
Hosmer-Lemeshow Goodness
of Fit Test 589
14.8 Logistic Regression Diagnostics 591
Logistic Regression Residuals 591
Diagnostic Residual Plots 594
Detection of Influential
Observations 598
14.9 Inferences about
Mean Response 602
Point Estimator 602
Interval Estimation 602
Simultaneous Confidence Intervals for
Several Mean Responses 603
14.10 Prediction of a New Observation 604
Choice of Prediction Rule 604
Validation of Prediction Error Rate 607
14.11 Polytomous Logistic Regression for
Nominal Response 608
Pregnancy Duration Data
with Polytomous Response 609
$J - 1$ Baseline-Category Logits for
Nominal Response 610
Maximum Likelihood Estimation 612
14.12 Polytomous Logistic Regression
for Ordinal Response 614
14.13 Poisson Regression 618
Poisson Distribution 618
Poisson Regression Model 619
Maximum Likelihood Estimation 620
Model Development 620
Inferences 621
14.14 Generalized Linear Models 623
Cited References 624 
Problems 625 
Exercises 634 
Projects 635 
Case Studies 640 


PART FOUR 
DESIGN AND ANALYSIS OF 
SINGLE-FACTOR STUDIES 641 


Chapter 15
Introduction to the Design of
Experimental and Observational
Studies 642


15.1 Experimental Studies, Observational
Studies, and Causation 643
Experimental Studies 643
Observational Studies 644
Mixed Experimental and Observational
Studies 646
15.2 Experimental Studies: Basic
Concepts 647
Factors 647
Crossed and Nested Factors 648
Treatments 649
Choice of Treatments 649
Experimental Units 652
Sample Size and Replication 652
Randomization 653
Constrained Randomization:
Blocking 655
Measurements 658
15.3 An Overview of Standard Experimental
Designs 658
Completely Randomized Design 659
Factorial Experiments 660
Randomized Complete Block
Designs 661
Nested Designs 662
Repeated Measures Designs 663
Incomplete Block Designs 664
Two-Level Factorial and Fractional
Factorial Experiments 665
Response Surface Experiments 666
15.4 Design of Observational Studies 666
Cross-Sectional Studies 666
Prospective Studies 667
Retrospective Studies 667
Matching 668
15.5 Case Study: Paired-Comparison
Experiment 669
15.6 Concluding Remarks 672
Cited References 672
Problems 672
Exercise 676


Chapter 16 
Single-Factor Studies 677 


16.1 Single-Factor Experimental and
Observational Studies 677
16.2 Relation between Regression and
Analysis of Variance 679
Illustrations 679
Choice between Two Types of Models 680
16.3 Single-Factor ANOVA Model 681
Basic Ideas 681
Cell Means Model 681
Important Features of Model 682
The ANOVA Model Is a Linear
Model 683
Interpretation of Factor Level Means 684
Distinction between ANOVA Models I
and II 685
16.4 Fitting of ANOVA Model 685
Notation 686
Least Squares and Maximum Likelihood
Estimators 687
Residuals 689
16.5 Analysis of Variance 690
Partitioning of SSTO 690
Breakdown of Degrees of Freedom 693
Mean Squares 693
Analysis of Variance Table 694
Expected Mean Squares 694
16.6 F Test for Equality of Factor Level
Means 698
Test Statistic 698
Distribution of F* 699
Construction of Decision Rule 699
16.7 Alternative Formulation of Model 701
Factor Effects Model 701
Definition of μ. 702
Test for Equality of Factor Level
Means 704
16.8 Regression Approach to Single-Factor
Analysis of Variance 704
Factor Effects Model with Unweighted
Mean 705
Factor Effects Model with Weighted
Mean 709
Cell Means Model 710
16.9 Randomization Tests 712
16.10 Planning of Sample Sizes with Power
Approach 716
Power of F Test 716
Use of Table B.12 for Single-Factor
Studies 718
Some Further Observations on Use
of Table B.12 720
16.11 Planning of Sample Sizes to Find “Best”
Treatment 721
Cited Reference 722
Problems 722
Exercises 730
Projects 730
Case Studies 732


Chapter 17 
Analysis of Factor Level Means 733 


17.1 Introduction 733
17.2 Plots of Estimated Factor Level
Means 735
Line Plot 735
Bar Graph and Main Effects Plot 736
17.3 Estimation and Testing of Factor Level
Means 737
Inferences for Single Factor Level
Mean 737
Inferences for Difference between Two
Factor Level Means 739
Inferences for Contrast of Factor Level
Means 741
Inferences for Linear Combination of
Factor Level Means 743
17.4 Need for Simultaneous Inference
Procedures 744
17.5 Tukey Multiple Comparison
Procedure 746
Studentized Range Distribution 746
Simultaneous Estimation 747
Simultaneous Testing 747
Example 1—Equal Sample Sizes 748
Example 2—Unequal Sample Sizes 750
17.6 Scheffé Multiple Comparison
Procedure 753
Simultaneous Estimation 753
Simultaneous Testing 754
Comparison of Scheffé and Tukey
Procedures 755
17.7 Bonferroni Multiple Comparison
Procedure 756
Simultaneous Estimation 756
Simultaneous Testing 756
Comparison of Bonferroni Procedure with
Scheffé and Tukey Procedures 757
Analysis of Means 758
17.8 Planning of Sample Sizes with Estimation 
Approach 759 
Example 1—Equal Sample Sizes 759 
Example 2—Unequal Sample Sizes 761 
17.9 Analysis of Factor Effects when Factor 
Is Quantitative 762 
Cited References 766 
Problems 767 
Exercises 773 
Projects 774 
Case Studies 774 


Chapter 18 
ANOVA Diagnostics and Remedial 
Measures 775 


18.1 Residual Analysis 775 
Residuals 776 
Residual Plots 776 
Diagnosis of Departures from ANOVA 
Model 778 
18.2 Tests for Constancy of Error 
Variance 781 
Hartley Test 782 
Brown-Forsythe Test 784 
18.3 Overview of Remedial Measures 786 
18.4 Weighted Least Squares 786
18.5 Transformations of Response 
Variable 789 
Simple Guides to Finding a 
Transformation 789 
Box-Cox Procedure 791 
18.6 Effects of Departures from Model 793 
Nonnormality 793 
Unequal Error Variances 794 
Nonindependence of Error Terms 794 
18.7 Nonparametric Rank F Test 795 
Test Procedure 795 
Multiple Pairwise Testing 
Procedure 797 
18.8 Case Example—Heart Transplant 798 
Cited References 801 


Problems 801 
Exercises 807 
Projects 807 


Case Studies 809 


PART FIVE
MULTI-FACTOR STUDIES 811


Chapter 19
Two-Factor Studies with Equal
Sample Sizes 812


19.1 Two-Factor Observational and
Experimental Studies 812
Examples of Two-Factor Experiments and
Observational Studies 812
The One-Factor-at-a-Time (OFAAT)
Approach to Experimentation 815
Advantages of Crossed, Multi-Factor
Designs 816
19.2 Meaning of ANOVA Model
Elements 817
Illustration 817
Treatment Means 817
Factor Level Means 818
Main Effects 818
Additive Factor Effects 819
Interacting Factor Effects 822
Important and Unimportant
Interactions 824
Transformable and Nontransformable
Interactions 826
Interpretation of Interactions 827
19.3 Model I (Fixed Factor Levels) for
Two-Factor Studies 829
Cell Means Model 830
Factor Effects Model 831
19.4 Analysis of Variance 833
Illustration 833
Notation 834
Fitting of ANOVA Model 834
Partitioning of Total Sum
of Squares 836
Partitioning of Degrees of Freedom 839
Mean Squares 839
Expected Mean Squares 840
Analysis of Variance Table 840
19.5 Evaluation of Appropriateness of
ANOVA Model 842
19.6 F Tests 843
Test for Interactions 844
Test for Factor A Main Effects 844
Test for Factor B Main Effects 845
Kimball Inequality 846
19.7 Strategy for Analysis 847
19.8 Analysis of Factor Effects when Factors
Do Not Interact 848
Estimation of Factor Level Mean 848
Estimation of Contrast of Factor Level
Means 849
Estimation of Linear Combination of
Factor Level Means 850
Multiple Pairwise Comparisons of Factor
Level Means 850
Multiple Contrasts of Factor Level
Means 852
Estimates Based on Treatment
Means 853
Example 1—Pairwise Comparisons
of Factor Level Means 853
Example 2—Estimation of Treatment
Means 855
19.9 Analysis of Factor Effects when
Interactions Are Important 856
Multiple Pairwise Comparisons
of Treatment Means 856
Multiple Contrasts of Treatment
Means 857
Example 1—Pairwise Comparisons
of Treatment Means 857
Example 2—Contrasts of Treatment
Means 860
19.10 Pooling Sums of Squares in Two-Factor
Analysis of Variance 861
19.11 Planning of Sample Sizes for Two-Factor
Studies 862
Power Approach 862
Estimation Approach 863
Finding the "Best" Treatment 864
Problems 864
Exercises 876
Projects 876
Case Studies 879


Chapter 20 
Two-Factor Studies—One Case 
per Treatment 880 


20.1  No-Interaction Model 880 
Model 881 
Analysis of Variance 881 
Inference Procedures 881 
Estimation of Treatment Mean 884 
20.2 Tukey Test for Additivity 886
Development of Test Statistic 886 
Remedial Actions if Interaction Effects 
Are Present 888 
Cited Reference 889 
Problems 889 
Exercises 891 
Case Study 891 


Chapter 21 
Randomized Complete Block 
Designs 892 


21.1 Elements of Randomized Complete Block
Designs 892
Description of Designs 892
Criteria for Blocking 893
Advantages and Disadvantages 894
How to Randomize 895
Illustration 895
21.2 Model for Randomized Complete Block
Designs 897
21.3 Analysis of Variance and Tests 898
Fitting of Randomized Complete
Block Model 898
Analysis of Variance 898
21.4 Evaluation of Appropriateness
of Randomized Complete Block
Model 901
Diagnostic Plots 901
Tukey Test for Additivity 903
21.5 Analysis of Treatment Effects 904
21.6 Use of More than One Blocking
Variable 905
21.7 Use of More than One Replicate in Each
Block 906
21.8 Factorial Treatments 908
21.9 Planning Randomized Complete Block
Experiments 909
Power Approach 909
Estimation Approach 910
Efficiency of Blocking Variable 911


Problems 912
Exercises 916


Chapter 22
Analysis of Covariance 917


22.1 Basic Ideas 917
How Covariance Analysis Reduces Error
Variability 917
Concomitant Variables 919
22.2 Single-Factor Covariance Model 920
Notation 921
Development of Covariance Model 921
Properties of Covariance Model 922
Generalizations of Covariance
Model 923
Regression Formula of Covariance
Model 924
Appropriateness of Covariance
Model 925
Inferences of Interest 925
22.3 Example of Single-Factor Covariance
Analysis 926
Development of Model 926
Test for Treatment Effects 928
Estimation of Treatment Effects 930
Test for Parallel Slopes 932
22.4 Two-Factor Covariance Analysis 933
Covariance Model for Two-Factor
Studies 933
Regression Approach 934
Covariance Analysis for Randomized
Complete Block Designs 937
22.5 Additional Considerations for the Use
of Covariance Analysis 939
Covariance Analysis as Alternative
to Blocking 939
Use of Differences 939
Correction for Bias 940
Interest in Nature of Treatment
Effects 940
Problems 941
Exercise 947
Projects 947
Case Studies 950


Chapter 23 
Two-Factor Studies with Unequal 
Sample Sizes 951 


23.1 Unequal Sample Sizes 951 
Notation 952 
23.2 Use of Regression Approach for Testing 
Factor Effects when Sample Sizes Are 
Unequal 953 
Regression Approach to Two-Factor 
Analysis of Variance 953 
23.3  Inferences about Factor Effects when 
Sample Sizes Are Unequal 959 
Example 1—Pairwise Comparisons 
of Factor Level Means 962 
Example 2—Single-Degree-of-Freedom 
Test 964 
23.4 Empty Cells in Two-Factor Studies 964 
Partial Analysis of Factor Effects 965 
Analysis if Model with No Interactions Can 
Be Employed 966 
Missing Observations in Randomized 
Complete Block Designs 967 
23.5  ANOVA Inferences when Treatment 
Means Are of Unequal Importance 970 
Estimation of Treatment Means and Factor 
Effects 971 
Test for Interactions 972 
Tests for Factor Main Effects by Use 
of Equivalent Regression Models 972 
Tests for Factor Main Effects by Use 
of Matrix Formulation 975 
Tests for Factor Effects when Weights Are 
Proportional to Sample Sizes 977 
23.6 Statistical Computing Packages 980 
Problems 981 
Exercises 988 
Projects 988 
Case Studies 990 


Chapter 24 
Multi-Factor Studies 992 


24.1 ANOVA Model for Three-Factor 
Studies 992 
Notation 992 
Illustration 993 
Main Effects 993 
Two-Factor Interactions 995 
Three-Factor Interactions 996 
Cell Means Model 996 
Factor Effects Model 997 
24.2 Interpretation of Interactions 
in Three-Factor Studies 998 
Learning Time Example 1: Interpretation 
of Three-Factor Interactions 998 
Learning Time Example 2: Interpretation 
of Multiple Two-Factor Interactions 999 
Learning Time Example 3: Interpretation 
of a Single Two-Factor Interaction 1000 
24.3 Fitting of ANOVA Model 1003 
Notation 1003 
Fitting of ANOVA Model 1003 
Evaluation of Appropriateness of ANOVA 
Model 1005 
24.4 Analysis of Variance 1008 
Partitioning of Total Sum of Squares 1008 
Degrees of Freedom and Mean 
Squares 1009 
Tests for Factor Effects 1009 
24.5 Analysis of Factor Effects 1013 
Strategy for Analysis 1013 
Analysis of Factor Effects when Factors Do 
Not Interact 1014 
Analysis of Factor Effects with Multiple 
Two-Factor Interactions or Three-Factor 
Interaction 1016 
Analysis of Factor Effects with Single 
Two-Factor Interaction 1016 
Example—Estimation of Contrasts 
of Treatment Means 1018 
24.6 Unequal Sample Sizes in Multi-Factor 
Studies 1019 
Tests for Factor Effects 1019 
Inferences for Contrasts of Factor Level 
Means 1020 


24.7 Planning of Sample Sizes 1021
Power of F Test for Multi-Factor
Studies 1021
Use of Table B.12 for Multi-Factor
Studies 1021
Cited Reference 1022
Problems 1022
Exercises 1027
Projects 1027
Case Studies 1028


Chapter 25 
Random and Mixed Effects Models 1030 


25.1 Single-Factor Studies—ANOVA
Model II 1031
Random Cell Means Model 1031
Questions of Interest 1034
Test whether σ_μ² = 0 1035
Estimation of μ. 1038
Estimation of σ_μ²/(σ_μ² + σ²) 1040
Estimation of σ² 1041
Point Estimation of σ_μ² 1042
Interval Estimation of σ_μ² 1042
Random Factor Effects Model 1047
25.2 Two-Factor Studies—ANOVA Models II
and III 1047
ANOVA Model II—Random Factor
Effects 1047
ANOVA Model III—Mixed Factor
Effects 1049
25.3 Two-Factor Studies—ANOVA Tests for
Models II and III 1052
Expected Mean Squares 1052
Construction of Test Statistics 1053
25.4 Two-Factor Studies—Estimation
of Factor Effects for Models II
and III 1055
Estimation of Variance Components 1055
Estimation of Fixed Effects in Mixed
Model 1056
25.5 Randomized Complete Block Design:
Random Block Effects 1060
Additive Model 1061
Interaction Model 1064
25.6 Three-Factor Studies—ANOVA
Models II and III 1066
ANOVA Model II—Random Factor
Effects 1066
ANOVA Model III—Mixed Factor
Effects 1066
Appropriate Test Statistics 1067
Estimation of Effects 1069
25.7 ANOVA Models II and III with Unequal
Sample Sizes 1070
Maximum Likelihood Approach 1072
Cited References 1077
Problems 1077
Exercises 1085
Projects 1085


PART SIX
SPECIALIZED STUDY
DESIGNS 1087


Chapter 26
Nested Designs, Subsampling, and
Partially Nested Designs 1088


26.1 Distinction between Nested and Crossed
Factors 1088
26.2 Two-Factor Nested Designs 1091
Development of Model Elements 1091
Nested Design Model 1092
Random Factor Effects 1093
26.3 Analysis of Variance for Two-Factor
Nested Designs 1093
Fitting of Model 1093
Sums of Squares 1094
Degrees of Freedom 1095
Tests for Factor Effects 1097
Random Factor Effects 1099
26.4 Evaluation of Appropriateness of Nested
Design Model 1099
26.5 Analysis of Factor Effects in Two-Factor
Nested Designs 1100
Estimation of Factor Level Means
μi. 1100
Estimation of Treatment Means μij 1102
Estimation of Overall Mean μ.. 1103
Estimation of Variance Components 1103
26.6 Unbalanced Nested Two-Factor
Designs 1104
26.7 Subsampling in Single-Factor Study with
Completely Randomized Design 1106
Model 1107
Analysis of Variance and Tests of
Effects 1108
Estimation of Treatment Effects 1110
Estimation of Variances 1111
26.8 Pure Subsampling in Three Stages 1113
Model 1113
Analysis of Variance 1113
Estimation of μ.. 1113
26.9 Three-Factor Partially Nested
Designs 1114
Development of Model 1114
Analysis of Variance 1115
Cited Reference 1119
Problems 1119
Exercises 1125
Projects 1125


Chapter 27 
Repeated Measures and Related 
Designs 1127 


27.1 Elements of Repeated Measures
Designs 1127
Description of Designs 1127
Advantages and Disadvantages 1128
How to Randomize 1128
27.2 Single-Factor Experiments with Repeated
Measures on All Treatments 1129
Model 1129
Analysis of Variance and Tests 1130
Evaluation of Appropriateness of Repeated
Measures Model 1134
Analysis of Treatment Effects 1137
Ranked Data 1138
Multiple Pairwise Testing
Procedure 1138
27.3 Two-Factor Experiments with Repeated
Measures on One Factor 1140
Description of Design 1140
Model 1141
Analysis of Variance and Tests 1142
Evaluation of Appropriateness of Repeated
Measures Model 1144
Analysis of Factor Effects: Without
Interaction 1145
Analysis of Factor Effects: With
Interaction 1148
Blocking of Subjects in Repeated Measures
Designs 1153
27.4 Two-Factor Experiments with Repeated
Measures on Both Factors 1153
Model 1154
Analysis of Variance and Tests 1155
Evaluation of Appropriateness of Repeated
Measures Model 1157
Analysis of Factor Effects 1157
27.5 Regression Approach to Repeated
Measures Designs 1161
27.6 Split-Plot Designs 1162
Cited References 1164
Problems 1164
Exercise 1171
Projects 1171


Chapter 28 
Balanced Incomplete Block, Latin Square, 
and Related Designs 1173 


28.1 Balanced Incomplete Block
Designs 1173
Advantages and Disadvantages
of BIBDs 1175
28.2 Analysis of Balanced Incomplete Block
Designs 1177
BIBD Model 1177
Regression Approach to Analysis of
Balanced Incomplete Block Designs 1177
Analysis of Treatment Effects 1180
Planning of Sample Sizes with Estimation
Approach 1182
28.3 Latin Square Designs 1183
Basic Ideas 1183
Description of Latin Square
Designs 1184
Advantages and Disadvantages of Latin
Square Designs 1185
Randomization of Latin Square
Design 1185
28.4 Latin Square Model 1187 
28.5 Analysis of Latin Square 
Experiments 1188 
Notation 1188 
Fitting of Model 1188 
Analysis of Variance 1188 
Test for Treatment Effects 1190 
Analysis of Treatment Effects 1190 
Residual Analysis 1191 
Factorial Treatments 1192 
Random Blocking Variable Effects 1193 
Missing Observations 1193 
28.6 Planning Latin Square 
Experiments 1193 
Power of F Test 1193 
Necessary Number of Replications 1193 
Efficiency of Blocking Variables 1193 
28.7 Additional Replications with Latin 
Square Designs 1195 
Replications within Cells 1195 
Additional Latin Squares 1196 
28.8  Replications in Repeated Measures 
Studies 1198 
Latin Square Crossover Designs 1198 
Use of Independent Latin Squares 1200 
Carryover Effects 1201 
Cited References 1202 
Problems 1202 


Chapter 29
Exploratory Experiments: Two-Level
Factorial and Fractional Factorial
Designs 1209


29.1 Two-Level Full Factorial 
Experiments 1210 
Design of Two-Level Studies 1210 
Notation 1210 
Estimation of Factor Effects 1212 
Inferences about Factor Effects 1214 
29.2 Analysis of Unreplicated Two-Level 
Studies 1216 
Pooling of Interactions 1218 
Pareto Plot 1219
Dot Plot 1220
Normal Probability Plot 1221 
Center Point Replications 1222 
29.3 Two-Level Fractional Factorial 
Designs 1223 
Confounding 1224 
Defining Relation 1227 
Half-Fraction Designs 1228 
Quarter-Fraction and Smaller-Fraction 
Designs 1229 
Resolution 1231 
Selecting a Fraction of Highest 
Resolution 1232 
29.4 Screening Experiments 1239 
ge Fractional Factorial Designs 1239
Plackett-Burman Designs 1240 
29.5 Incomplete Block Designs for Two-Level 
Factorial Experiments 1240 
Assignment of Treatments to Blocks 1241 
Use of Center Point Replications 1243 
29.6 Robust Product and Process
Design 1244 
Location and Dispersion Modeling 1246 
Incorporating Noise Factors 1250 
Case Study—Clutch Slave Cylinder 
Experiment 1252 
Cited References 1256 
Problems 1256 
Exercises 1266 


Chapter 30 
Response Surface Methodology 1267 


30.1 Response Surface Experiments 1267 
30.2 Central Composite Response Surface 
Designs 1268 
Structure of Central Composite
Designs 1268
Commonly Used Central Composite
Designs 1270
Rotatable Central Composite
Designs 1271
Other Criteria for Choosing a Central
Composite Design 1273
Blocking Central Composite
Designs 1275
Additional General-Purpose Response
Surface Designs 1276
30.3 Optimal Response Surface
Designs 1276
Purpose of Optimal Designs 1276
Optimal Design Approach 1278
Design Criteria for Optimal Design
Selection 1279
Construction of Optimal Response Surface
Designs 1282
Some Final Cautions 1283
30.4 Analysis of Response Surface
Experiments 1284
Model Interpretation and
Visualization 1284
Response Surface Optimum
Conditions 1286
30.5 Sequential Search for Optimum
Conditions—Method of Steepest
Ascent 1290
Cited References 1292
Problems 1292
Projects 1295


Appendix A
Some Basic Results in Probability
and Statistics 1297


Appendix B
Tables 1315


Appendix C
Data Sets 1348


Appendix D
Rules for Developing ANOVA Models and
Tables for Balanced Designs 1358


Appendix E
Selected Bibliography 1374


Index 1385


PART ONE
SIMPLE LINEAR REGRESSION


Chapter 1
Linear Regression with One
Predictor Variable


Regression analysis is a statistical methodology that utilizes the relation between two or 
more quantitative variables so that a response or outcome variable can be predicted from 
the other, or others. This methodology is widely used in business, the social and behavioral 
sciences, the biological sciences, and many other disciplines. A few examples of applications 
are: 


1. Sales of a product can be predicted by utilizing the relationship between sales and amount 
of advertising expenditures. 

2. The performance of an employee on a job can be predicted by utilizing the relationship 
between performance and a battery of aptitude tests. 

3. The size of the vocabulary of a child can be predicted by utilizing the relationship 
between size of vocabulary and age of the child and amount of education of the parents. 

4. The length of hospital stay of a surgical patient can be predicted by utilizing the rela- 
tionship between the time in the hospital and the severity of the operation. 


In Part I we take up regression analysis when a single predictor variable is used for 
predicting the response or outcome variable of interest. In Parts II and III, we consider 
regression analysis when two or more variables are used for making predictions. In this 
chapter, we consider the basic ideas of regression analysis and discuss the estimation of the 
parameters of regression models containing a single predictor variable. 


1.1 Relations between Variables 


The concept of a relation between two variables, such as between family income and family 
expenditures for housing, is a familiar one. We distinguish between a functional relation 
and a statistical relation, and consider each of these in turn. 


Functional Relation between Two Variables 


A functional relation between two variables is expressed by a mathematical formula. If X
denotes the independent variable and Y the dependent variable, a functional relation is
of the form:

Y = f(X)

Given a particular value of X, the function f indicates the corresponding value of Y.

FIGURE 1.1 Example of Functional Relation. (Axes: units sold X, dollar sales Y; line Y = 2X.)


Example

Consider the relation between dollar sales (Y) of a product sold at a fixed price and number
of units sold (X). If the selling price is $2 per unit, the relation is expressed by the equation:

Y = 2X


This functional relation is shown in Figure 1.1. Number of units sold and dollar sales during 
three recent periods (while the unit price remained constant at $2) were as follows: 


Period   Number of Units Sold   Dollar Sales
1        75                     $150
2        25                     $50
3        130                    $260


These observations are plotted also in Figure 1.1. Note that all fall directly on the line of 
functional relationship. This is characteristic of all functional relations. 
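
The check described above can be sketched in a few lines of Python. This is only an illustration: the names `dollar_sales`, `observations`, and `deviations` are hypothetical and do not come from the text. The sketch evaluates f(X) = 2X for each period and confirms that every observed Y coincides with it.

```python
# Functional relation from the example: dollar sales Y = 2X at $2 per unit.
UNIT_PRICE = 2  # dollars per unit, as stated in the example


def dollar_sales(units_sold):
    """Return dollar sales Y for a given number of units sold X."""
    return UNIT_PRICE * units_sold


# (period, units sold X, dollar sales Y) for the three periods in the example
observations = [(1, 75, 150), (2, 25, 50), (3, 130, 260)]

# Deviation of each observed Y from f(X); every deviation is zero
# when the relation is functional.
deviations = [y - dollar_sales(x) for _, x, y in observations]
all_on_line = all(d == 0 for d in deviations)

print(deviations, all_on_line)
```

For a statistical relation the same deviations would generally be nonzero, which is the contrast drawn in the next subsection.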


Statistical Relation between Two Variables 


A statistical relation, unlike a functional relation, is not a perfect one. In general, the
observations for a statistical relation do not fall directly on the curve of relationship.


Example 1

Performance evaluations for 10 employees were obtained at midyear and at year-end.
These data are plotted in Figure 1.2a. Year-end evaluations are taken as the dependent or
response variable Y, and midyear evaluations as the independent, explanatory, or predictor
variable X. The plotting is done as before. For instance, the midyear and year-end
performance evaluations for the first employee are plotted at X = 90, Y = 94.

FIGURE 1.2 Statistical Relation between Midyear Performance Evaluation and Year-End Evaluation. (a) Scatter Plot. (b) Scatter Plot and Line of Statistical Relationship. (Axes: midyear evaluation X, year-end evaluation Y.)

Figure 1.2a clearly suggests that there is a relation between midyear and year-end
evaluations, in the sense that the higher the midyear evaluation, the higher tends to be the
year-end evaluation. However, the relation is not a perfect one. There is a scattering of points,
suggesting that some of the variation in year-end evaluations is not accounted for by midyear
performance assessments. For instance, two employees had midyear evaluations of X = 80,
yet they received somewhat different year-end evaluations. Because of the scattering of
points in a statistical relation, Figure 1.2a is called a scatter diagram or scatter plot. In
statistical terminology, each point in the scatter diagram represents a trial or a case.

In Figure 1.2b, we have plotted a line of relationship that describes the statistical relation
between midyear and year-end evaluations. It indicates the general tendency by which
year-end evaluations vary with the level of midyear performance evaluation. Note that most of
the points do not fall directly on the line of statistical relationship. This scattering of points
around the line represents variation in year-end evaluations that is not associated with
midyear performance evaluation and that is usually considered to be of a random nature.
Statistical relations can be highly useful, even though they do not have the exactitude of a
functional relation.


Example 2

Figure 1.3 presents data on age and level of a steroid in plasma for 27 healthy females
between 8 and 25 years old. The data strongly suggest that the statistical relationship is
curvilinear (not linear). The curve of relationship has also been drawn in Figure 1.3. It
implies that, as age increases, steroid level increases up to a point and then begins to level
off. Note again the scattering of points around the curve of statistical relationship, typical
of all statistical relations.


FIGURE 1.3 Curvilinear Statistical Relation between Age and Steroid Level in Healthy Females Aged 8 to 25. (Axes: age in years X, steroid level Y.)


1.2 Regression Models and Their Uses 


Historical Origins 

Regression analysis was first developed by Sir Francis Galton in the latter part of the 
19th century. Galton had studied the relation between heights of parents and children and 
noted that the heights of children of both tall and short parents appeared to "revert" or 
"regress" to the mean of the group. He considered this tendency to be a regression to 
“mediocrity.” Galton developed a mathematical description of this regression tendency, the 
precursor of today's regression models. 

The term regression persists to this day to describe statistical relations between variables. 


Basic Concepts 
A regression model is a formal means of expressing the two essential ingredients of a 
statistical relation: 


1. A tendency of the response variable Y to vary with the predictor variable X in a systematic
fashion.
2. A scattering of points around the curve of statistical relationship. 


These two characteristics are embodied in a regression model by postulating that: 


1. There is a probability distribution of Y for each level of X. 
2. The means of these probability distributions vary in some systematic fashion with X. 


Example

Consider again the performance evaluation example in Figure 1.2. The year-end evaluation Y
is treated in a regression model as a random variable. For each level of midyear performance
evaluation, there is postulated a probability distribution of Y. Figure 1.4 shows such a
probability distribution for X = 90, which is the midyear evaluation for the first employee.
The actual year-end evaluation of this employee, Y = 94, is then viewed as a random
selection from this probability distribution.

FIGURE 1.4 Pictorial Representation of Regression Model. (Probability distributions of Y at midyear evaluations X = 50, 70, and 90, together with the regression curve.)

Figure 1.4 also shows probability distributions of Y for midyear evaluation levels X = 50 
and X = 70. Note that the means of the probability distributions have a systematic relation 
to the level of X. This systematic relationship is called the regression function of Y on X. 
The graph of the regression function is called the regression curve. Note that in Figure 1.4 
the regression function is slightly curvilinear. This would imply for our example that the
increase in the expected (mean) year-end evaluation with an increase in midyear performance
evaluation is retarded at higher levels of midyear performance.

Regression models may differ in the form of the regression function (linear, curvilinear), 
in the shape of the probability distributions of Y (symmetrical, skewed), and in other ways. 
Whatever the variation, the concept of a probability distribution of Y for any given X is the 
formal counterpart to the empirical scatter in a statistical relation. Similarly, the regression 
curve, which describes the relation between the means of the probability distributions
of Y and the level of X, is the counterpart to the general tendency of Y to vary with X 
systematically in a statistical relation. 
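
The two postulates can be made concrete with a small simulation. The sketch below is illustrative only. The regression function, the normal shape of the distributions, and the constant spread `ERROR_SD` are all assumptions chosen for the demonstration. None of them is taken from the performance evaluation data. For each level X = 50, 70, 90, it draws many values of Y from the assumed distribution at that level and compares the sample mean with the value of the regression function.

```python
import random
import statistics


def regression_function(x):
    """Hypothetical, slightly curvilinear mean of Y at midyear evaluation x."""
    return 20.0 + 1.2 * x - 0.004 * x ** 2


ERROR_SD = 4.0  # hypothetical spread of each probability distribution of Y


def draw_y(x, rng):
    """One random selection of Y from the assumed normal distribution at level x."""
    return rng.gauss(regression_function(x), ERROR_SD)


rng = random.Random(0)  # fixed seed so the run is repeatable
levels = [50, 70, 90]

# Many draws at each level: the spread of the draws plays the role of the
# scatter, and their mean sits close to the regression curve.
samples = {x: [draw_y(x, rng) for _ in range(5000)] for x in levels}
sample_means = {x: statistics.fmean(ys) for x, ys in samples.items()}

for x in levels:
    print(x, round(regression_function(x), 1), round(sample_means[x], 1))
```

With this assumed function the mean of Y rises by less between X = 70 and X = 90 than between X = 50 and X = 70, which mirrors the retarded increase described for Figure 1.4.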


Regression Models with More than One Predictor Variable. Regression models may 
contain more than one predictor variable. Three examples follow. 


1. In an efficiency study of 67 branch offices of a consumer finance chain, the response 
variable was direct operating cost for the year just ended. There were four predictor variables: 
average size of loan outstanding during the year, average number of loans outstanding, total 
number of new loan applications processed, and an index of office salaries. 

2. In a tractor purchase study, the response variable was volume (in horsepower) of 
tractor purchases in a sales territory of a farm equipment firm. There were nine predictor 
variables, including average age of tractors on farms in the territory, number of farms in the 
territory, and a quantity index of crop production in the territory. 

3. In a medical study of short children, the response variable was the peak plasma growth
hormone level. There were 14 predictor variables, including age, gender, height, weight, 
and 10 skinfold measurements. 


The model features represented in Figure 1.4 must be extended into further dimensions 
when there is more than one predictor variable. With two predictor variables $X_1$ and $X_2$, 
for instance, a probability distribution of Y for each $(X_1, X_2)$ combination is assumed 
by the regression model. The systematic relation between the means of these probability 
distributions and the predictor variables $X_1$ and $X_2$ is then given by a regression surface. 


Construction of Regression Models 

Selection of Predictor Variables. Since reality must be reduced to manageable propor- 
tions whenever we construct models, only a limited number of explanatory or predictor 
variables can—or should—be included in a regression model for any situation of interest. 
A central problem in many exploratory studies is therefore that of choosing, for a regres- 
sion model, a set of predictor variables that is “good” in some sense for the purposes of 
the analysis. A major consideration in making this choice is the extent to which a chosen 
variable contributes to reducing the remaining variation in Y after allowance is made for 
the contributions of other predictor variables that have tentatively been included in the 
regression model. Other considerations include the importance of the variable as a causal 
agent in the process under analysis; the degree to which observations on the variable can 
be obtained more accurately, or quickly, or economically than on competing variables; and 
the degree to which the variable can be controlled. In Chapter 9, we will discuss procedures 
and problems in choosing the predictor variables to be included in the regression model. 


Functional Form of Regression Relation. The choice of the functional form of the 
regression relation is tied to the choice of the predictor variables. Sometimes, relevant theory 
may indicate the appropriate functional form. Learning theory, for instance, may indicate 
that the regression function relating unit production cost to the number of previous times the 
item has been produced should have a specified shape with particular asymptotic properties. 

More frequently, however, the functional form of the regression relation is not known in 
advance and must be decided upon empirically once the data have been collected. Linear 
or quadratic regression functions are often used as satisfactory first approximations to 
regression functions of unknown nature. Indeed, these simple types of regression functions 
may be used even when theory provides the relevant functional form, notably when the 
known form is highly complex but can be reasonably approximated by a linear or quadratic 
regression function. Figure 1.5a illustrates a case where the complex regression function 
may be reasonably approximated by a linear regression function. Figure 1.5b provides an 
example where two linear regression functions may be used “piecewise” to approximate a 
complex regression function. 


FIGURE 1.5 Uses of Linear Regression Functions to Approximate Complex Regression 
Functions: Bold Line Is the True Regression Function and Dotted Line Is the Regression 
Approximation. 

(a) Linear Approximation (b) Piecewise Linear Approximation 


Scope of Model. In formulating a regression model, we usually need to restrict the cov- 
erage of the model to some interval or region of values of the predictor variable(s). The 
scope is determined either by the design of the investigation or by the range of data at hand. 
For instance, a company studying the effect of price on sales volume investigated six price 
levels, ranging from $4.95 to $6.95. Here, the scope of the model is limited to price levels 
ranging from near $5 to near $7. The shape of the regression function substantially outside 
this range would be in serious doubt because the investigation provided no evidence as to 
the nature of the statistical relation below $4.95 or above $6.95. 


Uses of Regression Analysis 

Regression analysis serves three major purposes: (1) description, (2) control, and (3) predic- 
tion. These purposes are illustrated by the three examples cited earlier. The tractor purchase 
study served a descriptive purpose. In the study of branch office operating costs, the main 
purpose was administrative control; by developing a usable statistical relation between cost 
and the predictor variables, management was able to set cost standards for each branch office 
in the company chain. In the medical study of short children, the purpose was prediction. 
Clinicians were able to use the statistical relation to predict growth hormone deficiencies 
in short children by using simple measurements of the children. 

The several purposes of regression analysis frequently overlap in practice. The branch 
office example is a case in point. Knowledge of the relation between operating cost and 
characteristics of the branch office not only enabled management to set cost standards for 
each office but management could also predict costs, and at the end of the fiscal year it 
could compare the actual branch cost against the expected cost. 


Regression and Causality 


The existence of a statistical relation between the response variable Y and the explanatory or 
predictor variable X does not imply in any way that Y depends causally on X. No matter how 
strong is the statistical relation between X and Y, no cause-and-effect pattern is necessarily 
implied by the regression model. For example, data on size of vocabulary (X) and writing 
speed (Y) for a sample of young children aged 5 to 10 will show a positive regression relation. 
This relation does not imply, however, that an increase in vocabulary causes a faster writing 
speed. Here, other explanatory variables, such as age of the child and amount of education, 
affect both the vocabulary (X) and the writing speed (Y). Older children have a larger 
vocabulary and a faster writing speed. 

Even when a strong statistical relationship reflects causal conditions, the causal condi- 
tions may act in the opposite direction, from Y to X. Consider, for instance, the calibration 
of a thermometer. Here, readings of the thermometer are taken at different known tempera- 
tures, and the regression relation is studied so that the accuracy of predictions made by using 
the thermometer readings can be assessed. For this purpose, the thermometer reading is the 
predictor variable X, and the actual temperature is the response variable Y to be predicted. 
However, the causal pattern here does not go from X to Y, but in the opposite direction: the 
actual temperature (Y) affects the thermometer reading (X). 




These examples demonstrate the need for care in drawing conclusions about causal 
relations from regression analysis. Regression analysis by itself provides no information 
about causal patterns and must be supplemented by additional analyses to obtain insights 
about causal relations. 


Use of Computers 

Because regression analysis often entails lengthy and tedious calculations, computers are 
usually utilized to perform the necessary calculations. Almost every statistics package for 
computers contains a regression component. While packages differ in many details, their 
basic regression output tends to be quite similar. 

After an initial explanation of required regression calculations, we shall rely on computer 
calculations for all subsequent examples. We illustrate computer output by presenting output 
and graphics from BMDP (Ref. 1.1), MINITAB (Ref. 1.2), SAS (Ref. 1.3), SPSS (Ref. 1.4), 
SYSTAT (Ref. 1.5), JMP (Ref. 1.6), S-Plus (Ref. 1.7), and MATLAB (Ref. 1.8). 


1.3 Simple Linear Regression Model with Distribution 
of Error Terms Unspecified 


Formal Statement of Model 
In Part I we consider a basic regression model where there is only one predictor variable 
and the regression function is linear. The model can be stated as follows: 
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{1.1}$$

where: 

$Y_i$ is the value of the response variable in the ith trial 

$\beta_0$ and $\beta_1$ are parameters 

$X_i$ is a known constant, namely, the value of the predictor variable in the ith trial 

$\varepsilon_i$ is a random error term with mean $E\{\varepsilon_i\} = 0$ and variance $\sigma^2\{\varepsilon_i\} = \sigma^2$; $\varepsilon_i$ and $\varepsilon_j$ are 
uncorrelated so that their covariance is zero (i.e., $\sigma\{\varepsilon_i, \varepsilon_j\} = 0$ for all $i, j$; $i \neq j$) 

$i = 1, \ldots, n$ 

Regression model (1.1) is said to be simple, linear in the parameters, and linear in the 
predictor variable. It is “simple” in that there is only one predictor variable, “linear in the 
parameters,” because no parameter appears as an exponent or is multiplied or divided by 
another parameter, and “linear in the predictor variable,” because this variable appears only 
in the first power. A model that is linear in the parameters and in the predictor variable is 
also called a first-order model. 


Important Features of Model 


1. The response $Y_i$ in the ith trial is the sum of two components: (1) the constant term 
$\beta_0 + \beta_1 X_i$ and (2) the random term $\varepsilon_i$. Hence, $Y_i$ is a random variable. 


2. Since $E\{\varepsilon_i\} = 0$, it follows from (A.13c) in Appendix A that: 

$$E\{Y_i\} = E\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \beta_0 + \beta_1 X_i + E\{\varepsilon_i\} = \beta_0 + \beta_1 X_i$$

Note that $\beta_0 + \beta_1 X_i$ plays the role of the constant a in (A.13c). 

Thus, the response $Y_i$, when the level of X in the ith trial is $X_i$, comes from a probability 
distribution whose mean is: 

$$E\{Y_i\} = \beta_0 + \beta_1 X_i \tag{1.2}$$

We therefore know that the regression function for model (1.1) is: 

$$E\{Y\} = \beta_0 + \beta_1 X \tag{1.3}$$

since the regression function relates the means of the probability distributions of Y for given 
X to the level of X. 


3. The response $Y_i$ in the ith trial exceeds or falls short of the value of the regression 
function by the error term amount $\varepsilon_i$. 


4. The error terms $\varepsilon_i$ are assumed to have constant variance $\sigma^2$. It therefore follows that 
the responses $Y_i$ have the same constant variance: 

$$\sigma^2\{Y_i\} = \sigma^2 \tag{1.4}$$

since, using (A.16a), we have: 

$$\sigma^2\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \sigma^2\{\varepsilon_i\} = \sigma^2$$

Thus, regression model (1.1) assumes that the probability distributions of Y have the same 
variance $\sigma^2$, regardless of the level of the predictor variable X. 


5. The error terms are assumed to be uncorrelated. Since the error terms $\varepsilon_i$ and $\varepsilon_j$ are 
uncorrelated, so are the responses $Y_i$ and $Y_j$. 


6. In summary, regression model (1.1) implies that the responses $Y_i$ come from probability 
distributions whose means are $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and whose variances are $\sigma^2$, the 
same for all levels of X. Further, any two responses $Y_i$ and $Y_j$ are uncorrelated. 


Example 


A consultant for an electrical distributor is studying the relationship between the number 
of bids requested by construction contractors for basic lighting equipment during a week 
and the time required to prepare the bids. Suppose that regression model (1.1) is applicable 
and is as follows: 

$$Y_i = 9.5 + 2.1 X_i + \varepsilon_i$$

where X is the number of bids prepared in a week and Y is the number of hours required to 
prepare the bids. Figure 1.6 contains a presentation of the regression function: 

$$E\{Y\} = 9.5 + 2.1X$$

Suppose that in the ith week, $X_i = 45$ bids are prepared and the actual number of hours 
required is $Y_i = 108$. In that case, the error term value is $\varepsilon_i = 4$, for we have 

$$E\{Y_i\} = 9.5 + 2.1(45) = 104$$

and 

$$Y_i = 108 = 104 + 4$$

Figure 1.6 displays the probability distribution of Y when X = 45 and indicates from 
where in this distribution the observation $Y_i = 108$ came. Note again that the error term $\varepsilon_i$ 
is simply the deviation of $Y_i$ from its mean value $E\{Y_i\}$. 


FIGURE 1.6 Illustration of Simple Linear Regression Model (1.1). 
[Plot against Number of Bids Prepared (X), with X = 25 and X = 45 marked, showing the regression function $E\{Y\} = 9.5 + 2.1X$ and the mean $E\{Y\} = 104$.] 


FIGURE 1.7 Meaning of Parameters of Simple Linear Regression Model (1.1). 
[Plot of Hours against Number of Bids Prepared (X), showing the regression function $E\{Y\} = 9.5 + 2.1X$.] 


Figure 1.6 also shows the probability distribution of Y when X = 25. Note that this 
distribution exhibits the same variability as the probability distribution when X = 45, in 
conformance with the requirements of regression model (1.1). 
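
The arithmetic of the electrical distributor example can be retraced with a short Python sketch. The coefficients 9.5 and 2.1 and the observation (45, 108) come from the example; the function name `mean_response` and the variable names are illustrative choices, not notation from the text.

```python
# Parameters of the illustrative model Y_i = 9.5 + 2.1 X_i + e_i
BETA_0 = 9.5   # intercept (hours)
BETA_1 = 2.1   # slope (hours per bid)


def mean_response(x):
    """Value of the regression function E{Y} = beta_0 + beta_1 * X at level x."""
    return BETA_0 + BETA_1 * x


# Observation from the example: 45 bids prepared, 108 hours required
x_i, y_i = 45, 108

expected_hours = mean_response(x_i)   # E{Y_i}, the mean of Y at X = 45
error_term = y_i - expected_hours     # e_i = Y_i - E{Y_i}

print(f"E{{Y_i}} at X = {x_i}: {expected_hours:.1f}")
print(f"error term e_i: {error_term:.1f}")
```

The same function evaluated at X = 25 gives the mean of the second probability distribution shown in Figure 1.6; under model (1.1) only the mean changes with X, not the variance.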


Meaning of Regression Parameters 


The parameters $\beta_0$ and $\beta_1$ in regression model (1.1) are called regression coefficients. $\beta_1$ 
is the slope of the regression line. It indicates the change in the mean of the probability 
distribution of Y per unit increase in X. The parameter $\beta_0$ is the Y intercept of the regression 
line. When the scope of the model includes X = 0, $\beta_0$ gives the mean of the probability 
distribution of Y at X = 0. When the scope of the model does not cover X = 0, $\beta_0$ does 
not have any particular meaning as a separate term in the regression model. 


Example 


Figure 1.7 shows the regression function: 

$$E\{Y\} = 9.5 + 2.1X$$

for the electrical distributor example. The slope $\beta_1 = 2.1$ indicates that the preparation of 
one additional bid in a week leads to an increase in the mean of the probability distribution 
of Y of 2.1 hours. 

The intercept $\beta_0 = 9.5$ indicates the value of the regression function at X = 0. However, 
since the linear regression model was formulated to apply to weeks where the number of 
bids prepared ranges from 20 to 80, $\beta_0$ does not have any intrinsic meaning of its own 
here. If the scope of the model were to be extended to X levels near zero, a model with 
a curvilinear regression function and some value of $\beta_0$ different from that for the linear 
regression function might well be required. 


Alternative Versions of Regression Model 


Sometimes it is convenient to write the simple linear regression model (1.1) in somewhat 
different, though equivalent, forms. Let $X_0$ be a constant identically equal to 1. Then, we 
can write (1.1) as follows: 

$$Y_i = \beta_0 X_0 + \beta_1 X_i + \varepsilon_i \quad \text{where } X_0 \equiv 1 \tag{1.5}$$

This version of the model associates an X variable with each regression coefficient. 
An alternative modification is to use for the predictor variable the deviation $X_i - \bar{X}$ 
rather than $X_i$. To leave model (1.1) unchanged, we need to write: 

$$\begin{aligned}
Y_i &= \beta_0 + \beta_1 (X_i - \bar{X}) + \beta_1 \bar{X} + \varepsilon_i \\
    &= (\beta_0 + \beta_1 \bar{X}) + \beta_1 (X_i - \bar{X}) + \varepsilon_i \\
    &= \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i
\end{aligned}$$

Thus, this alternative model version is: 

$$Y_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i \tag{1.6}$$

where: 

$$\beta_0^* = \beta_0 + \beta_1 \bar{X} \tag{1.6a}$$

We use models (1.1), (1.5), and (1.6) interchangeably as convenience dictates. 
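
The equivalence of versions (1.1) and (1.6) can be checked numerically. The following Python sketch assumes illustrative parameter values and X levels (they are hypothetical, chosen only for the check) and confirms that both versions give the same mean response at every X level once the intercept is adjusted according to (1.6a).

```python
# Hypothetical parameter values and X levels, chosen only for illustration
beta_0, beta_1 = 9.5, 2.1
x_levels = [20, 30, 45, 60, 80]

x_bar = sum(x_levels) / len(x_levels)

# Intercept of the centered version, from (1.6a)
beta_0_star = beta_0 + beta_1 * x_bar

# Mean responses under model (1.1) and under the centered model (1.6)
means_original = [beta_0 + beta_1 * x for x in x_levels]
means_centered = [beta_0_star + beta_1 * (x - x_bar) for x in x_levels]

for x, m_original, m_centered in zip(x_levels, means_original, means_centered):
    print(x, round(m_original, 6), round(m_centered, 6))
```

Only the intercept changes between the two versions; the slope and the error terms are untouched, which is why the models can be used interchangeably.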


1.4 Data for Regression Analysis 


Ordinarily, we do not know the values of the regression parameters $\beta_0$ and $\beta_1$ in regression 
model (1.1), and we need to estimate them from relevant data. Indeed, as we noted earlier, we 
frequently do not have adequate a priori knowledge of the appropriate predictor variables 
and of the functional form of the regression relation (e.g., linear or curvilinear), and we 
need to rely on an analysis of the data for developing a suitable regression model. 

Data for regression analysis may be obtained from nonexperimental or experimental 
studies. We consider each of these in turn. 


Observational Data 


Observational data are data obtained from nonexperimental studies. Such studies do not 
control the explanatory or predictor variable(s) of interest. For example, company officials 
wished to study the relation between age of employee (X) and number of days of illness 
last year (Y). The needed data for use in the regression analysis were obtained from per- 
sonnel records. Such data are observational data since the explanatory variable, age, is not 
controlled. 

Regression analyses are frequently based on observational data, since often it is not 
feasible to conduct controlled experimentation. In the company personnel example just 
mentioned, for instance, it would not be possible to control age by assigning ages to persons. 




A major limitation of observational data is that they often do not provide adequate infor- 
mation about cause-and-effect relationships. For example, a positive relation between age of 
employee and number of days of illness in the company personnel example may not imply 
that number of days of illness is the direct result of age. It might be that younger employees 
of the company primarily work indoors while older employees usually work outdoors, and 
that work location is more directly responsible for the number of days of illness than age. 

Whenever a regression analysis is undertaken for purposes of description based on 
observational data, one should investigate whether explanatory variables other than those 
considered in the regression model might more directly explain cause-and-effect relationships. 


Experimental Data 


Frequently, it is possible to conduct a controlled experiment to provide data from which the 
regression parameters can be estimated. Consider, for instance, an insurance company that 
wishes to study the relation between productivity of its analysts in processing claims and 
length of training. Nine analysts are to be used in the study. Three of them will be selected 
at random and trained for two weeks, three for three weeks, and three for five weeks. 
The productivity of the analysts during the next 10 weeks will then be observed. The data 
so obtained will be experimental data because control is exercised over the explanatory 
variable, length of training. 

When control over the explanatory variable(s) is exercised through random assignments, 
as in the productivity study example, the resulting experimental data provide much stronger 
information about cause-and-effect relationships than do observational data. The reason is 
that randomization tends to balance out the effects of any other variables that might affect 
the response variable, such as the effect of aptitude of the employee on productivity. 

In the terminology of experimental design, the length of training assigned to an analyst in 
the productivity study example is called a treatment. The analysts to be included in the study 
are called the experimental units. Control over the explanatory variable(s) then consists of 
assigning a treatment to each of the experimental units by means of randomization. 


Completely Randomized Design 
The most basic type of statistical design for making randomized assignments of treatments to 
experimental units (or vice versa) is the completely randomized design. With this design, the 
assignments are made completely at random. This complete randomization provides that all 
combinations of experimental units assigned to the different treatments are equally likely, 
which implies that every experimental unit has an equal chance to receive any one of the 
treatments. 

A completely randomized design is particularly useful when the experimental units are 
quite homogeneous. This design is very flexible; it accommodates any number of treatments 
and permits different sample sizes for different treatments. Its chief disadvantage is that, 
when the experimental units are heterogeneous, this design is not as efficient as some other 
statistical designs. 
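
A completely randomized assignment for the productivity study can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the analyst labels, the function name `completely_randomized_assignment`, and the seed are assumptions introduced for the example. Shuffling the units and cutting the shuffled list into groups makes every allocation of units to treatments equally likely.

```python
import random


def completely_randomized_assignment(units, treatment_sizes, seed=None):
    """Randomly assign experimental units to treatments.

    units           -- list of experimental unit labels
    treatment_sizes -- dict mapping each treatment to its number of units
    seed            -- optional seed so the assignment can be reproduced
    """
    if sum(treatment_sizes.values()) != len(units):
        raise ValueError("treatment sizes must add up to the number of units")

    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)   # every ordering of the units is equally likely

    assignment = {}
    start = 0
    for treatment, size in treatment_sizes.items():
        # The next `size` units in the shuffled order receive this treatment
        for unit in shuffled[start:start + size]:
            assignment[unit] = treatment
        start += size
    return assignment


# Nine analysts (hypothetical labels); three each trained for 2, 3, and 5 weeks
analysts = [f"analyst_{k}" for k in range(1, 10)]
weeks_of_training = {2: 3, 3: 3, 5: 3}

plan = completely_randomized_assignment(analysts, weeks_of_training, seed=1)
for analyst in analysts:
    print(analyst, "->", plan[analyst], "weeks of training")
```

Because the group sizes are passed in explicitly, the same function handles unequal sample sizes for different treatments, which the design permits.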




1.5 Overview of Steps in Regression Analysis 


The regression models considered in this and subsequent chapters can be utilized either 
for observational data or for experimental data from a completely randomized design. 
(Regression analysis can also utilize data from other types of experimental designs, but 
the regression models presented here will need to be modified.) Whether the data are 
observational or experimental, it is essential that the conditions of the regression model be 
appropriate for the data at hand for the model to be applicable. 

We begin our discussion of regression analysis by considering inferences about the 
regression parameters for the simple linear regression model (1.1). For the rare occasion 
where prior knowledge or theory alone enables us to determine the appropriate regression 
model, inferences based on the regression model are the first step in the regression analysis. 
In the usual situation, however, where we do not have adequate knowledge to specify the 
appropriate regression model in advance, the first step is an exploratory study of the data, 
as shown in the flowchart in Figure 1.8. On the basis of this initial exploratory analysis, 
one or more preliminary regression models are developed. These regression models are 
then examined for their appropriateness for the data at hand and revised, or new models 
are developed, until the investigator is satisfied with the suitability of a particular 
regression model. Only then are inferences made on the basis of this regression model, such as 
inferences about the regression parameters of the model or predictions of new observations. 


FIGURE 1.8 Typical Strategy for Regression Analysis. 
[Flowchart: exploratory data analysis; develop one or more tentative regression models; 
check whether one or more of the regression models is suitable for the data at hand; 
if not, revise the regression models and/or develop new ones and check again; if so, 
identify the most suitable model and make inferences on the basis of the regression model.] 

We begin, for pedagogic reasons, with inferences based on the regression model that is 
finally considered to be appropriate. One must have an understanding of regression models 
and how they can be utilized before the issues involved in the development of an appropriate 
regression model can be fully explained. 


1.6 Estimation of Regression Function 


The observational or experimental data to be used for estimating the parameters of the 
regression function consist of observations on the explanatory or predictor variable X and 
the corresponding observations on the response variable Y. For each trial, there is an X 
observation and a Y observation. We denote the (X, Y) observations for the first trial as 
$(X_1, Y_1)$, for the second trial as $(X_2, Y_2)$, and in general for the ith trial as $(X_i, Y_i)$, where 
$i = 1, \ldots, n$. 


Example 


In a small-scale study of persistence, an experimenter gave three subjects a very difficult 
task. Data on the age of the subject (X) and on the number of attempts to accomplish the 
task before giving up (Y) follow: 


Subject i:                   1    2    3 
Age X_i:                    20   55   30 
Number of attempts Y_i:      5   12   10 


In terms of the notation to be employed, there were n = 3 subjects in this study, the 
observations for the first subject were $(X_1, Y_1) = (20, 5)$, and similarly for the other 
subjects. 


Method of Least Squares 


To find “good” estimators of the regression parameters $\beta_0$ and $\beta_1$, we employ the method 
of least squares. For the observations $(X_i, Y_i)$ for each case, the method of least squares 
considers the deviation of $Y_i$ from its expected value: 

$$Y_i - (\beta_0 + \beta_1 X_i) \tag{1.7}$$

In particular, the method of least squares requires that we consider the sum of the n squared 
deviations. This criterion is denoted by Q: 

$$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \tag{1.8}$$

According to the method of least squares, the estimators of $\beta_0$ and $\beta_1$ are those values 
$b_0$ and $b_1$, respectively, that minimize the criterion Q for the given sample observations 
$(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. 


FIGURE 1.9 Illustration of Least Squares Criterion Q for Fit of a Regression Line: Persistence Study 
Example. 
[Two scatter plots of the persistence study data, vertical axis Attempts: (a) with the line $\hat{Y} = 9.0 + 0(X)$; (b) with the line $\hat{Y} = 2.81 + .177X$.] 


Example 


Figure 1.9a presents the scatter plot of the data for the persistence study example and the 
regression line that results when we use the mean of the responses (9.0) as the predictor 
and ignore X: 

$$\hat{Y} = 9.0 + 0(X)$$

Note that this regression line uses estimates $b_0 = 9.0$ and $b_1 = 0$, and that $\hat{Y}$ denotes 
the ordinate of the estimated regression line. Clearly, this regression line is not a good 
fit, as evidenced by the large vertical deviations of two of the Y observations from the 
corresponding ordinates $\hat{Y}$ of the regression line. The deviation for the first subject, for 
which $(X_1, Y_1) = (20, 5)$, is: 

$$Y_1 - (b_0 + b_1 X_1) = 5 - [9.0 + 0(20)] = 5 - 9.0 = -4$$

The sum of the squared deviations for the three cases is: 

$$Q = (5 - 9.0)^2 + (12 - 9.0)^2 + (10 - 9.0)^2 = 26.0$$

Figure 1.9b shows the same data with the regression line: 

$$\hat{Y} = 2.81 + .177X$$

The fit of this regression line is clearly much better. The vertical deviation for the first case 
now is: 

$$Y_1 - (b_0 + b_1 X_1) = 5 - [2.81 + .177(20)] = 5 - 6.35 = -1.35$$

and the criterion Q is much reduced: 

$$Q = (5 - 6.35)^2 + (12 - 12.55)^2 + (10 - 8.12)^2 = 5.7$$

Thus, a better fit of the regression line to the data corresponds to a smaller sum Q. 
The objective of the method of least squares is to find estimates $b_0$ and $b_1$ for $\beta_0$ and $\beta_1$, 
respectively, for which Q is a minimum. In a certain sense, to be discussed shortly, these 
estimates will provide a “good” fit of the linear regression function. The regression line in 
Figure 1.9b is, in fact, the least squares regression line. 
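
The two values of the criterion Q in this example can be recomputed with a few lines of Python. The data and the two candidate lines are those of the persistence study example; the function name `least_squares_criterion` is an illustrative choice.

```python
# Persistence study data from the example
ages = [20, 55, 30]        # X_i
attempts = [5, 12, 10]     # Y_i


def least_squares_criterion(b0, b1, xs, ys):
    """Sum of squared deviations Q = sum of (Y_i - b0 - b1 * X_i)^2."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Line (a): use the mean response 9.0 and ignore X
q_mean_line = least_squares_criterion(9.0, 0.0, ages, attempts)

# Line (b): the line with the rounded coefficients reported in Figure 1.9b
q_fitted_line = least_squares_criterion(2.81, 0.177, ages, attempts)

print(f"Q for line (a): {q_mean_line:.1f}")
print(f"Q for line (b): {q_fitted_line:.1f}")
```

Small differences from the hand calculation in the second case can arise because the text rounds each fitted ordinate to two decimals before squaring.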


Least Squares Estimators. The estimators $b_0$ and $b_1$ that satisfy the least squares criterion 
can be found in two basic ways: 


1. Numerical search procedures can be used that evaluate in a systematic fashion the least 
squares criterion Q for different estimates $b_0$ and $b_1$ until the ones that minimize Q are 
found. This approach was illustrated in Figure 1.9 for the persistence study example. 

2. Analytical procedures can often be used to find the values of $b_0$ and $b_1$ that minimize 
Q. The analytical approach is feasible when the regression model is not mathematically 
complex. 
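
The first approach can be illustrated by a crude grid search over candidate values of the two estimates. The sketch below is only one simple stand-in for a numerical search procedure and is not the method of any particular statistical package; the grid ranges and step sizes are assumptions chosen for the persistence study data.

```python
# Persistence study data from the example
ages = [20, 55, 30]        # X_i
attempts = [5, 12, 10]     # Y_i


def least_squares_criterion(b0, b1, xs, ys):
    """Sum of squared deviations Q for the candidate line b0 + b1 * X."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


def grid_search(xs, ys, b0_grid, b1_grid):
    """Evaluate Q at every (b0, b1) pair on the grid and keep the smallest."""
    best = None
    for b0 in b0_grid:
        for b1 in b1_grid:
            q = least_squares_criterion(b0, b1, xs, ys)
            if best is None or q < best[0]:
                best = (q, b0, b1)
    return best


# Hypothetical search grids: intercepts 0.00 to 6.00, slopes 0.000 to 0.400
b0_grid = [k / 100 for k in range(0, 601)]
b1_grid = [k / 1000 for k in range(0, 401)]

q_min, b0_best, b1_best = grid_search(ages, attempts, b0_grid, b1_grid)
print(f"smallest Q on the grid: {q_min:.3f} at b0 = {b0_best}, b1 = {b1_best}")
```

The result is only as precise as the grid spacing, and the cost grows with the number of grid points, which is one reason the analytical approach is preferred when it is available.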


Using the analytical approach, it can be shown for regression model (1.1) that the values 
$b_0$ and $b_1$ that minimize Q for any particular set of sample data are given by the following 
simultaneous equations: 

$$\sum Y_i = n b_0 + b_1 \sum X_i \tag{1.9a}$$

$$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2 \tag{1.9b}$$

Equations (1.9a) and (1.9b) are called normal equations; $b_0$ and $b_1$ are called point 
estimators of $\beta_0$ and $\beta_1$, respectively. 
The normal equations (1.9) can be solved simultaneously for $b_0$ and $b_1$: 

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \tag{1.10a}$$

$$b_0 = \frac{1}{n}\left(\sum Y_i - b_1 \sum X_i\right) = \bar{Y} - b_1 \bar{X} \tag{1.10b}$$

where $\bar{X}$ and $\bar{Y}$ are the means of the $X_i$ and the $Y_i$ observations, respectively. Computer 
calculations generally are based on many digits to obtain accurate values for $b_0$ and $b_1$. 
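
Formulas (1.10a) and (1.10b) translate directly into code. The following Python sketch applies them to the persistence study data and then checks that the resulting estimates satisfy the normal equations (1.9a) and (1.9b). The function name `least_squares_estimates` is an illustrative choice.

```python
import math


def least_squares_estimates(xs, ys):
    """Return (b0, b1) computed from formulas (1.10b) and (1.10a)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx            # (1.10a)
    b0 = y_bar - b1 * x_bar     # (1.10b)
    return b0, b1


# Persistence study data from the example
ages = [20, 55, 30]        # X_i
attempts = [5, 12, 10]     # Y_i

b0, b1 = least_squares_estimates(ages, attempts)
print(round(b0, 2), round(b1, 3))   # 2.81 0.177

# Check the normal equations (1.9a) and (1.9b)
n = len(ages)
lhs_a = sum(attempts)
rhs_a = n * b0 + b1 * sum(ages)
lhs_b = sum(x * y for x, y in zip(ages, attempts))
rhs_b = b0 * sum(ages) + b1 * sum(x * x for x in ages)

print(math.isclose(lhs_a, rhs_a), math.isclose(lhs_b, rhs_b))
```

The estimates agree with the line shown in Figure 1.9b, which the text identifies as the least squares regression line.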


Comment 

The normal equations (1.9) can be derived by calculus. For given sample observations $(X_i, Y_i)$, the 
quantity Q in (1.8) is a function of $\beta_0$ and $\beta_1$. The values of $\beta_0$ and $\beta_1$ that minimize Q can be derived 
by differentiating (1.8) with respect to $\beta_0$ and $\beta_1$. We obtain: 

$$\frac{\partial Q}{\partial \beta_0} = -2 \sum (Y_i - \beta_0 - \beta_1 X_i)$$

$$\frac{\partial Q}{\partial \beta_1} = -2 \sum X_i (Y_i - \beta_0 - \beta_1 X_i)$$

We then set these partial derivatives equal to zero, using $b_0$ and $b_1$ to denote the particular values of 
$\beta_0$ and $\beta_1$ that minimize Q: 

$$-2 \sum (Y_i - b_0 - b_1 X_i) = 0$$

$$-2 \sum X_i (Y_i - b_0 - b_1 X_i) = 0$$

Simplifying, we obtain: 

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$$

$$\sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$$

Expanding, we have: 

$$\sum Y_i - n b_0 - b_1 \sum X_i = 0$$

$$\sum X_i Y_i - b_0 \sum X_i - b_1 \sum X_i^2 = 0$$

from which the normal equations (1.9) are obtained by rearranging terms. 
A test of the second partial derivatives will show that a minimum is obtained with the least squares 
estimators $b_0$ and $b_1$. 
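
The second-derivative test mentioned at the end of the Comment can be written out as follows. This is a sketch of the standard argument rather than a derivation taken from the text, and it assumes that the $X_i$ are not all equal.

```latex
% Second partial derivatives of Q in (1.8)
\frac{\partial^2 Q}{\partial \beta_0^2} = 2n, \qquad
\frac{\partial^2 Q}{\partial \beta_1^2} = 2 \sum X_i^2, \qquad
\frac{\partial^2 Q}{\partial \beta_0 \, \partial \beta_1} = 2 \sum X_i

% Determinant of the matrix of second partial derivatives
(2n)\Bigl(2 \sum X_i^2\Bigr) - \Bigl(2 \sum X_i\Bigr)^2
  = 4\Bigl[n \sum X_i^2 - \Bigl(\sum X_i\Bigr)^2\Bigr]
  = 4n \sum (X_i - \bar{X})^2 > 0
```

Since $2n > 0$ and the determinant is positive, the stationary point given by the normal equations is a minimum of Q.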


Properties of Least Squares Estimators. An important theorem, called the 
Gauss-Markov theorem, states: 


Under the conditions of regression model (1.1), the least squares 
estimators $b_0$ and $b_1$ in (1.10) are unbiased and have minimum 
variance among all unbiased linear estimators. (1.11) 


This theorem, proven in the next chapter, states first that $b_0$ and $b_1$ are unbiased estimators. 
Hence: 

$$E\{b_0\} = \beta_0 \qquad E\{b_1\} = \beta_1$$

so that neither estimator tends to overestimate or underestimate systematically. 
Second, the theorem states that the estimators $b_0$ and $b_1$ are more precise (i.e., their 
sampling distributions are less variable) than any other estimators belonging to the class of 
unbiased estimators that are linear functions of the observations $Y_1, \ldots, Y_n$. The estimators 
$b_0$ and $b_1$ are such linear functions of the $Y_i$. Consider, for instance, $b_1$. We have from (1.10a): 

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

It will be shown in Chapter 2 that this expression is equal to: 

$$b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \sum k_i Y_i$$

where: 

$$k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$$

Since the $k_i$ are known constants (because the $X_i$ are known constants), $b_1$ is a linear 
combination of the $Y_i$ and hence is a linear estimator. 

In the same fashion, it can be shown that $b_0$ is a linear estimator. Among all linear 
estimators that are unbiased then, $b_0$ and $b_1$ have the smallest variability in repeated samples 
in which the X levels remain unchanged. 
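
That the slope estimator is a linear combination of the responses can be verified numerically. The Python sketch below uses the persistence study data introduced earlier in the chapter; the variable names are illustrative. It computes the constants $k_i$ and compares the weighted sum of the $Y_i$ with formula (1.10a).

```python
import math

# Persistence study data from the earlier example
ages = [20, 55, 30]        # X_i
attempts = [5, 12, 10]     # Y_i

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(attempts) / n
s_xx = sum((x - x_bar) ** 2 for x in ages)

# Constants k_i depend only on the X levels
k = [(x - x_bar) / s_xx for x in ages]

# b1 as a linear combination of the Y_i
b1_linear = sum(k_i * y for k_i, y in zip(k, attempts))

# b1 from formula (1.10a)
b1_formula = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, attempts)) / s_xx

print(math.isclose(b1_linear, b1_formula))
print(abs(sum(k)) < 1e-12)   # the k_i sum to zero
```

The two forms agree because the $k_i$ sum to zero, so the term involving the mean of the responses drops out of (1.10a).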


Example 


The Toluca Company manufactures refrigeration equipment as well as many replacement 
parts. In the past, one of the replacement parts has been produced periodically in lots of 
varying sizes. When a cost improvement program was undertaken, company officials wished 
to determine the optimum lot size for producing this part. The production of this part involves 
setting up the production process (which must be done no matter what is the lot size) and 
machining and assembly operations. One key input for the model to ascertain the optimum 
lot size was the relationship between lot size and labor hours required to produce the lot. 
To determine this relationship, data on lot size and work hours for 25 recent production 
runs were utilized. The production conditions were stable during the six-month period in 
which the 25 runs were made and were expected to continue to be the same during the 
next three years, the planning period for which the cost improvement program was being 
conducted. 

Table 1.1 contains a portion of the data on lot size and work hours in columns 1 and 
2. Note that all lot sizes are multiples of 10, a result of company policy to facilitate the 
administration of the parts production. Figure 1.10a shows a SYSTAT scatter plot of the 
data. We see that the lot sizes ranged from 20 to 120 units and that none of the production 
runs was outlying in the sense of being either unusually small or large. The scatter plot also 
indicates that the relationship between lot size and work hours is reasonably linear. We also 
see that no observations on work hours are unusually small or large, with reference to the 
relationship between lot size and work hours. 

To calculate the least squares estimates $b_0$ and $b_1$ in (1.10), we require the deviations 
$X_i - \bar{X}$ and $Y_i - \bar{Y}$. These are given in columns 3 and 4 of Table 1.1. We also require 
the cross-product terms $(X_i - \bar{X})(Y_i - \bar{Y})$ and the squared deviations $(X_i - \bar{X})^2$; these 
are shown in columns 5 and 6. The squared deviations $(Y_i - \bar{Y})^2$ in column 7 are for 
later use. 


TABLE 1.1 Data on Lot Size and Work Hours and Needed Calculations for Least Squares Estimates: Toluca 
Company Example. 

          (1)      (2)       (3)        (4)             (5)               (6)           (7) 
          Lot      Work 
 Run      Size     Hours 
  i       X_i      Y_i     X_i - X̄    Y_i - Ȳ   (X_i - X̄)(Y_i - Ȳ)   (X_i - X̄)^2   (Y_i - Ȳ)^2 
  1        80      399        10       86.72           867.2              100        7,520.4 
  2        30      121       -40     -191.28         7,651.2            1,600       36,588.0 
  3        50      221       -20      -91.28         1,825.6              400        8,332.0 
 ...      ...      ...       ...         ...             ...              ...            ... 
 23        40      244       -30      -68.28         2,048.4              900        4,662.2 
 24        80      342        10       29.72           297.2              100          883.3 
 25        70      323         0       10.72             0.0                0          114.9 
 Total  1,750    7,807         0        0             70,690           19,800        307,203 
 Mean    70.0   312.28 


FIGURE 1.10 SYSTAT Scatter Plot and Fitted Regression Line: Toluca Company Example. 
[(a) Scatter Plot; (b) Fitted Regression Line. Horizontal axis in both panels: Lot Size.] 


FIGURE 1.11 Portion of MINITAB Regression Output: Toluca Company Example. 

The regression equation is 
Y = 62.4 + 3.57 X 

Predictor       Coef     Stdev   t-ratio       p 
Constant       62.37     26.18      2.38   0.026 
X             3.5702    0.3470     10.29   0.000 

s = 48.82     R-sq = 82.2%     R-sq(adj) = 81.4% 


We see from Table 1.1 that the basic quantities needed to calculate the least squares
estimates are as follows:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 70{,}690$$
$$\sum (X_i - \bar{X})^2 = 19{,}800$$
$$\bar{X} = 70.0$$
$$\bar{Y} = 312.28$$

Using (1.10) we obtain:

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{70{,}690}{19{,}800} = 3.5702$$

$$b_0 = \bar{Y} - b_1 \bar{X} = 312.28 - 3.5702(70.0) = 62.37$$


Thus, we estimate that the mean number of work hours increases by 3.57 hours for each 
additional unit produced in the lot. This estimate applies to the range of lot sizes in the 
data from which the estimates were derived, namely to lot sizes ranging from about 20 to 
about 120. 
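The arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are ours and are not part of the text, and the inputs are the column totals and means reported in Table 1.1.

```python
# Column totals and means taken from Table 1.1 (Toluca Company example).
sum_cross_products = 70_690.0   # sum of (X_i - X_bar)(Y_i - Y_bar)
sum_sq_dev_x = 19_800.0         # sum of (X_i - X_bar)^2
x_bar = 70.0
y_bar = 312.28

# Least squares estimates from (1.10).
b1 = sum_cross_products / sum_sq_dev_x   # slope
b0 = y_bar - b1 * x_bar                  # intercept

print(f"b1 = {b1:.4f}")
print(f"b0 = {b0:.2f}")
```

Because the slope is carried at full precision here rather than rounded to 3.5702 before computing the intercept, later digits may differ slightly from a hand calculation.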

Figure 1.11 contains a portion of the MINITAB regression output for the Toluca Company
example. The estimates $b_0$ and $b_1$ are shown in the column labeled Coef, corresponding to
the lines Constant and X, respectively. The additional information shown in Figure 1.11
will be explained later.


Point Estimation of Mean Response 


Example 


Estimated Regression Function. Given sample estimators $b_0$ and $b_1$ of the parameters
in the regression function (1.3):

$$E\{Y\} = \beta_0 + \beta_1 X$$

we estimate the regression function as follows:

$$\hat{Y} = b_0 + b_1 X \tag{1.12}$$

where $\hat{Y}$ (read Y hat) is the value of the estimated regression function at the level X of the
predictor variable.

We call a value of the response variable a response and $E\{Y\}$ the mean response. Thus,
the mean response stands for the mean of the probability distribution of Y corresponding
to the level X of the predictor variable. $\hat{Y}$ then is a point estimator of the mean response
when the level of the predictor variable is X. It can be shown as an extension of the
Gauss-Markov theorem (1.11) that $\hat{Y}$ is an unbiased estimator of $E\{Y\}$, with minimum variance
in the class of unbiased linear estimators.

For the cases in the study, we will call $\hat{Y}_i$:

$$\hat{Y}_i = b_0 + b_1 X_i \qquad i = 1, \ldots, n \tag{1.13}$$

the fitted value for the ith case. Thus, the fitted value $\hat{Y}_i$ is to be viewed in distinction to the
observed value $Y_i$.


For the Toluca Company example, we found that the least squares estimates of the regression 
coefficients are: 


$$b_0 = 62.37 \qquad b_1 = 3.5702$$

Hence, the estimated regression function is:

$$\hat{Y} = 62.37 + 3.5702X$$


This estimated regression function is plotted in Figure 1.10b. It appears to be a good 
description of the statistical relationship between lot size and work hours. 

To estimate the mean response for any level X of the predictor variable, we simply
substitute that value of X in the estimated regression function. Suppose that we are interested
in the mean number of work hours required when the lot size is X = 65; our point estimate is:

$$\hat{Y} = 62.37 + 3.5702(65) = 294.4$$

Thus, we estimate that the mean number of work hours required for production runs of
X = 65 units is 294.4 hours. We interpret this to mean that if many lots of 65 units are
produced under the conditions of the 25 runs on which the estimated regression function is
based, the mean labor time for these lots is about 294 hours. Of course, the labor time for
any one lot of size 65 is likely to fall above or below the mean response because of inherent
variability in the production system, as represented by the error term in the model.
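The substitution step can be written as a small function. The sketch below is illustrative only; the function name `estimated_mean_response` is our own, and the coefficients are recomputed from the Table 1.1 totals rather than typed in rounded form.

```python
def estimated_mean_response(x, b0, b1):
    """Point estimate Y-hat = b0 + b1 * x, as in (1.12)."""
    return b0 + b1 * x

# Least squares estimates for the Toluca Company example (from Table 1.1 totals).
b1 = 70_690.0 / 19_800.0
b0 = 312.28 - b1 * 70.0

# Estimated mean work hours for lots of 65 units.
y_hat_65 = estimated_mean_response(65, b0, b1)
print(round(y_hat_65, 1))
```

The estimate should be applied only to lot sizes within the range of the data, roughly 20 to 120 units.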




TABLE 1.2 Fitted Values, Residuals, and Squared Residuals: Toluca Company Example.

         (1)      (2)      (3)           (4)                        (5)
                           Estimated
         Lot      Work     Mean                                     Squared
 Run     Size     Hours    Response      Residual                   Residual
  i      $X_i$    $Y_i$    $\hat{Y}_i$   $Y_i - \hat{Y}_i = e_i$    $(Y_i - \hat{Y}_i)^2 = e_i^2$
  1        80      399      347.98         51.02                      2,603.0
  2        30      121      169.47        -48.47                      2,349.3
  3        50      221      240.88        -19.88                        395.2
 ...      ...      ...         ...           ...                          ...
 23        40      244      205.17         38.83                      1,507.8
 24        80      342      347.98         -5.98                         35.8
 25        70      323      312.28         10.72                        114.9
 Total  1,750    7,807       7,807             0                       54,825


Fitted values for the sample cases are obtained by substituting the appropriate X values
into the estimated regression function. For the first sample case, we have $X_1 = 80$. Hence,
the fitted value for the first case is:

$$\hat{Y}_1 = 62.37 + 3.5702(80) = 347.98$$

This compares with the observed work hours of $Y_1 = 399$. Table 1.2 contains the observed
and fitted values for a portion of the Toluca Company data in columns 2 and 3, respectively.


Alternative Model (1.6). When the alternative regression model (1.6):

$$Y_i = \beta_0^* + \beta_1(X_i - \bar{X}) + \varepsilon_i$$

is to be utilized, the least squares estimator $b_1$ of $\beta_1$ remains the same as before. The least
squares estimator of $\beta_0^* = \beta_0 + \beta_1\bar{X}$ becomes, from (1.10b):

$$b_0^* = b_0 + b_1\bar{X} = (\bar{Y} - b_1\bar{X}) + b_1\bar{X} = \bar{Y} \tag{1.14}$$

Hence, the estimated regression function for alternative model (1.6) is:

$$\hat{Y} = \bar{Y} + b_1(X - \bar{X}) \tag{1.15}$$

In the Toluca Company example, $\bar{Y} = 312.28$ and $\bar{X} = 70.0$ (Table 1.1). Hence, the
estimated regression function in alternative form is:

$$\hat{Y} = 312.28 + 3.5702(X - 70.0)$$

For the first lot in our example, $X_1 = 80$; hence, we estimate the mean response to be:

$$\hat{Y}_1 = 312.28 + 3.5702(80 - 70.0) = 347.98$$


which, of course, is identical to our earlier result. 
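The equivalence of the two forms can also be confirmed numerically. The following sketch, with function names of our own choosing, evaluates both the original form (1.12) and the centered form (1.15) at a few lot sizes from Table 1.1.

```python
x_bar, y_bar = 70.0, 312.28          # means from Table 1.1
b1 = 70_690.0 / 19_800.0             # slope, same in both forms
b0 = y_bar - b1 * x_bar              # intercept of the original form

def fitted_original(x):
    """Y-hat = b0 + b1 * X, form (1.12)."""
    return b0 + b1 * x

def fitted_centered(x):
    """Y-hat = Y-bar + b1 * (X - X-bar), alternative form (1.15)."""
    return y_bar + b1 * (x - x_bar)

for x in (30, 50, 80):
    print(x, round(fitted_original(x), 2), round(fitted_centered(x), 2))
```

The two columns agree for every X, since (1.15) is only an algebraic rearrangement of (1.12).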


Residuals

The ith residual is the difference between the observed value $Y_i$ and the corresponding fitted
value $\hat{Y}_i$. This residual is denoted by $e_i$ and is defined in general as follows:

$$e_i = Y_i - \hat{Y}_i \tag{1.16}$$


FIGURE 1.12 Illustration of Residuals: Toluca Company Example (not drawn to scale). Horizontal axis: Lot Size; vertical axis: Hours; labeled observation: $Y_1 = 399$.


For regression model (1.1), the residual $e_i$ becomes:

$$e_i = Y_i - (b_0 + b_1 X_i) = Y_i - b_0 - b_1 X_i \tag{1.16a}$$


The calculation of the residuals for the Toluca Company example is shown for a portion
of the data in Table 1.2. We see that the residual for the first case is:

$$e_1 = Y_1 - \hat{Y}_1 = 399 - 347.98 = 51.02$$

The residuals for the first two cases are illustrated graphically in Figure 1.12. Note in
this figure that the magnitude of a residual is represented by the vertical deviation of the $Y_i$
observation from the corresponding point on the estimated regression function (i.e., from
the corresponding fitted value $\hat{Y}_i$).

We need to distinguish between the model error term value $\varepsilon_i = Y_i - E\{Y_i\}$ and the
residual $e_i = Y_i - \hat{Y}_i$. The former involves the vertical deviation of $Y_i$ from the unknown
true regression line and hence is unknown. On the other hand, the residual is the vertical
deviation of $Y_i$ from the fitted value $\hat{Y}_i$ on the estimated regression line, and it is known.

Residuals are highly useful for studying whether a given regression model is appropriate 
for the data at hand. We discuss this use in Chapter 3. 
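As an illustration, the residuals shown in Table 1.2 can be reproduced for the six displayed runs. This is a sketch under the assumption that the coefficients are recomputed from the Table 1.1 totals; the dictionary layout and names are ours.

```python
# (X_i, Y_i) for the six runs displayed in Table 1.2, keyed by run number.
runs = {1: (80, 399), 2: (30, 121), 3: (50, 221),
        23: (40, 244), 24: (80, 342), 25: (70, 323)}

# Least squares estimates from the Table 1.1 totals (all 25 runs).
b1 = 70_690.0 / 19_800.0
b0 = 312.28 - b1 * 70.0

residuals = {}
for run, (x, y) in runs.items():
    fitted = b0 + b1 * x            # fitted value, (1.13)
    residuals[run] = y - fitted     # residual, (1.16)
    print(f"run {run:2d}: fitted = {fitted:7.2f}, residual = {residuals[run]:7.2f}")
```

Only six of the 25 residuals are computed here, so their sum is not expected to be zero.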


Properties of Fitted Regression Line 


The estimated regression line (1.12) fitted by the method of least squares has a number of 
properties worth noting. These properties of the least squares estimated regression function 
do not apply to all regression models, as we shall see in Chapter 4. 


1. The sum of the residuals is zero:

$$\sum_{i=1}^{n} e_i = 0 \tag{1.17}$$

Table 1.2, column 4, illustrates this property for the Toluca Company example. Rounding
errors may, of course, be present in any particular case, resulting in a sum of the residuals
that does not equal zero exactly.

2. The sum of the squared residuals, $\sum e_i^2$, is a minimum. This was the requirement to
be satisfied in deriving the least squares estimators of the regression parameters since the
criterion Q in (1.8) to be minimized equals $\sum e_i^2$ when the least squares estimators $b_0$ and
$b_1$ are used for estimating $\beta_0$ and $\beta_1$.

3. The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:

$$\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i \tag{1.18}$$

This property is illustrated in Table 1.2, columns 2 and 3, for the Toluca Company example.
It follows that the mean of the fitted values $\hat{Y}_i$ is the same as the mean of the observed
values $Y_i$, namely, $\bar{Y}$.

4. The sum of the weighted residuals is zero when the residual in the ith trial is weighted
by the level of the predictor variable in the ith trial:

$$\sum_{i=1}^{n} X_i e_i = 0 \tag{1.19}$$

5. A consequence of properties (1.17) and (1.19) is that the sum of the weighted residuals
is zero when the residual in the ith trial is weighted by the fitted value of the response variable
for the ith trial:

$$\sum_{i=1}^{n} \hat{Y}_i e_i = 0 \tag{1.20}$$

6. The regression line always goes through the point $(\bar{X}, \bar{Y})$.


Comment 


The six properties of the fitted regression line follow directly from the least squares normal
equations (1.9). For example, property 1 in (1.17) is proven as follows:

$$\sum e_i = \sum (Y_i - b_0 - b_1 X_i) = \sum Y_i - n b_0 - b_1 \sum X_i = 0 \quad \text{by the first normal equation (1.9a)}$$

Property 6, that the regression line always goes through the point $(\bar{X}, \bar{Y})$, can be demonstrated
easily from the alternative form (1.15) of the estimated regression line. When $X = \bar{X}$, we have:

$$\hat{Y} = \bar{Y} + b_1(X - \bar{X}) = \bar{Y} + b_1(\bar{X} - \bar{X}) = \bar{Y}$$ ■
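The properties can be illustrated numerically on any data set fitted by least squares. The sketch below is an assumption-laden illustration: it fits a line to only the six runs displayed in Table 1.1, so its estimates differ from the 25-run estimates $b_0 = 62.37$ and $b_1 = 3.5702$, but the properties hold all the same. All names are ours.

```python
# Illustrative subset: the six runs displayed in Table 1.1.
x = [80, 30, 50, 40, 80, 70]
y = [399, 121, 221, 244, 342, 323]
n = len(x)

# Least squares fit to this subset, using (1.10).
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

def sse(a, b):
    """Criterion Q: sum of squared deviations around the line a + b*X."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# Each quantity below should be zero apart from floating-point rounding.
checks = {
    "(1.17) sum of residuals": sum(resid),
    "(1.18) sum(Y) - sum(Y-hat)": sum(y) - sum(fitted),
    "(1.19) sum of X_i * e_i": sum(xi * ei for xi, ei in zip(x, resid)),
    "(1.20) sum of Y-hat_i * e_i": sum(fi * ei for fi, ei in zip(fitted, resid)),
    "property 6: Y-hat at X-bar minus Y-bar": (b0 + b1 * x_bar) - y_bar,
}
for label, value in checks.items():
    print(f"{label}: {value:.2e}")
```

Property 2 can be probed with `sse`: moving either coefficient away from the least squares values, for instance `sse(b0 + 1, b1)`, gives a larger sum of squares than `sse(b0, b1)`.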


1.7 Estimation of Error Terms Variance $\sigma^2$


The variance $\sigma^2$ of the error terms $\varepsilon_i$ in regression model (1.1) needs to be estimated to
obtain an indication of the variability of the probability distributions of Y. In addition, as
we shall see in the next chapter, a variety of inferences concerning the regression function
and the prediction of Y require an estimate of $\sigma^2$.


Point Estimator of $\sigma^2$
To lay the basis for developing an estimator of $\sigma^2$ for regression model (1.1), we first
consider the simpler problem of sampling from a single population.


Single Population. We know that the variance $\sigma^2$ of a single population is estimated by
the sample variance $s^2$. In obtaining the sample variance $s^2$, we consider the deviation of
an observation $Y_i$ from the estimated mean $\bar{Y}$, square it, and then sum all such squared
deviations:

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

Such a sum is called a sum of squares. The sum of squares is then divided by the degrees
of freedom associated with it. This number is n - 1 here, because one degree of freedom is
lost by using $\bar{Y}$ as an estimate of the unknown population mean $\mu$. The resulting estimator
is the usual sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$$

which is an unbiased estimator of the variance $\sigma^2$ of an infinite population. The sample
variance is often called a mean square, because a sum of squares has been divided by the
appropriate number of degrees of freedom.


Regression Model. The logic of developing an estimator of $\sigma^2$ for the regression model is
the same as for sampling from a single population. Recall in this connection from (1.4) that
the variance of each observation $Y_i$ for regression model (1.1) is $\sigma^2$, the same as that of each
error term $\varepsilon_i$. We again need to calculate a sum of squared deviations, but must recognize
that the $Y_i$ now come from different probability distributions with different means that
depend upon the level $X_i$. Thus, the deviation of an observation $Y_i$ must be calculated
around its own estimated mean $\hat{Y}_i$. Hence, the deviations are the residuals:

$$Y_i - \hat{Y}_i = e_i$$

and the appropriate sum of squares, denoted by SSE, is:

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2 \tag{1.21}$$

where SSE stands for error sum of squares or residual sum of squares.

The sum of squares SSE has n - 2 degrees of freedom associated with it. Two degrees
of freedom are lost because both $\beta_0$ and $\beta_1$ had to be estimated in obtaining the estimated
means $\hat{Y}_i$. Hence, the appropriate mean square, denoted by MSE or $s^2$, is:

$$s^2 = MSE = \frac{SSE}{n - 2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum e_i^2}{n - 2} \tag{1.22}$$

where MSE stands for error mean square or residual mean square.
It can be shown that MSE is an unbiased estimator of $\sigma^2$ for regression model (1.1):

$$E\{MSE\} = \sigma^2 \tag{1.23}$$

An estimator of the standard deviation $\sigma$ is simply $s = \sqrt{MSE}$, the positive square root of
MSE.


Example

We will calculate SSE for the Toluca Company example by (1.21). The residuals were
obtained earlier in Table 1.2, column 4. This table also shows the squared residuals in
column 5. From these results, we obtain:

$$SSE = 54{,}825$$

Since 25 - 2 = 23 degrees of freedom are associated with SSE, we find:

$$s^2 = MSE = \frac{54{,}825}{23} = 2{,}384$$

Finally, a point estimate of $\sigma$, the standard deviation of the probability distribution of Y for
any X, is $s = \sqrt{2{,}384} = 48.8$ hours.

Consider again the case where the lot size is X = 65 units. We found earlier that the 
mean of the probability distribution of Y for this lot size is estimated to be 294.4 hours. 
Now, we have the additional information that the standard deviation of this distribution is 
estimated to be 48.8 hours. This estimate is shown in the MINITAB output in Figure 1.11, 
labeled as s. We see that the variation in work hours from lot to lot for lots of 65 units is 
quite substantial (49 hours) compared to the mean of the distribution (294 hours). 
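The calculation of MSE and s from SSE is short enough to sketch directly. The figures below are those reported for the Toluca Company example; the variable names are our own.

```python
import math

sse = 54_825.0      # error sum of squares, Table 1.2 column 5 total
n = 25              # number of production runs

df_error = n - 2            # two parameters estimated: beta_0 and beta_1
mse = sse / df_error        # error mean square, (1.22)
s = math.sqrt(mse)          # point estimate of sigma

print(f"MSE = {mse:.0f}, s = {s:.1f} hours")
```

The value of s obtained this way agrees with the entry labeled s in the MINITAB output of Figure 1.11.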


1.8 Normal Error Regression Model 


Model 


No matter what may be the form of the distribution of the error terms $\varepsilon_i$ (and hence of the
$Y_i$), the least squares method provides unbiased point estimators of $\beta_0$ and $\beta_1$ that have
minimum variance among all unbiased linear estimators. To set up interval estimates and
make tests, however, we need to make an assumption about the form of the distribution of
the $\varepsilon_i$. The standard assumption is that the error terms $\varepsilon_i$ are normally distributed, and we
will adopt it here. A normal error term greatly simplifies the theory of regression analysis
and, as we shall explain shortly, is justifiable in many real-world situations where regression
analysis is applied.


The normal error regression model is as follows:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{1.24}$$

where:

$Y_i$ is the observed response in the ith trial
$X_i$ is a known constant, the level of the predictor variable in the ith trial
$\beta_0$ and $\beta_1$ are parameters
$\varepsilon_i$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, n$


Comments 

1. The symbol $N(0, \sigma^2)$ stands for normally distributed, with mean 0 and variance $\sigma^2$.

2. The normal error model (1.24) is the same as regression model (1.1) with unspecified error
distribution, except that model (1.24) assumes that the errors $\varepsilon_i$ are normally distributed.

3. Because regression model (1.24) assumes that the errors are normally distributed, the
assumption of uncorrelatedness of the $\varepsilon_i$ in regression model (1.1) becomes one of independence in the
normal error model. Hence, the outcome in any one trial has no effect on the error term for any other
trial, as to whether it is positive or negative, small or large.




4. Regression model (1.24) implies that the $Y_i$ are independent normal random variables, with
mean $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and variance $\sigma^2$. Figure 1.6 pictures this normal error model. Each of the
probability distributions of Y in Figure 1.6 is normally distributed, with constant variability, and the
regression function is linear.

5. The normality assumption for the error terms is justifiable in many situations because the error
terms frequently represent the effects of factors omitted from the model that affect the response to
some extent and that vary at random without reference to the variable X. For instance, in the Toluca
Company example, the effects of such factors as time lapse since the last production run, particular
machines used, season of the year, and personnel employed could vary more or less at random from
run to run, independent of lot size. Also, there might be random measurement errors in the recording
of Y, the hours required. Insofar as these random effects have a degree of mutual independence, the
composite error term $\varepsilon_i$ representing all these factors would tend to comply with the central limit
theorem and the error term distribution would approach normality as the number of factor effects
becomes large.

A second reason why the normality assumption of the error terms is frequently justifiable is that
the estimation and testing procedures to be discussed in the next chapter are based on the t distribution
and are usually only sensitive to large departures from normality. Thus, unless the departures from
normality are serious, particularly with respect to skewness, the actual confidence coefficients and
risks of errors will be close to the levels for exact normality. ■


Estimation of Parameters by Method of Maximum Likelihood 


FIGURE 1.13 Densities for Sample Observations for Two Possible Values of $\mu$: $Y_1 = 250$, $Y_2 = 265$, $Y_3 = 259$.


When the functional form of the probability distribution of the error terms is specified,
estimators of the parameters $\beta_0$, $\beta_1$, and $\sigma^2$ can be obtained by the method of maximum
likelihood. Essentially, the method of maximum likelihood chooses as estimates those values
of the parameters that are most consistent with the sample data. We explain the method of
maximum likelihood first for the simple case when a single population with one parameter
is sampled. Then we explain this method for regression models.


Single Population. Consider a normal population whose standard deviation is known
to be $\sigma = 10$ and whose mean is unknown. A random sample of n = 3 observations is
selected from the population and yields the results $Y_1 = 250$, $Y_2 = 265$, $Y_3 = 259$. We
now wish to ascertain which value of $\mu$ is most consistent with the sample data. Consider
$\mu = 230$. Figure 1.13a shows the normal distribution with $\mu = 230$ and $\sigma = 10$; also shown
there are the locations of the three sample observations. Note that the sample observations
would be in the right tail of the distribution if $\mu$ were equal to 230. Since these are unlikely
occurrences, $\mu = 230$ is not consistent with the sample data.

Figure 1.13b shows the population and the locations of the sample data if $\mu$ were equal
to 259. Now the observations would be in the center of the distribution and much more
likely. Hence, $\mu = 259$ is more consistent with the sample data than $\mu = 230$.

The method of maximum likelihood uses the density of the probability distribution at
$Y_i$ (i.e., the height of the curve at $Y_i$) as a measure of consistency for the observation $Y_i$.
Consider observation $Y_1$ in our example. If $Y_1$ is in the tail, as in Figure 1.13a, the height of
the curve will be small. If $Y_1$ is nearer to the center of the distribution, as in Figure 1.13b,
the height will be larger. Using the density function for a normal probability distribution
in (A.34) in Appendix A, we find the densities for $Y_1$, denoted by $f_1$, for the two cases of
$\mu$ in Figure 1.13 as follows:

$$\mu = 230: \quad f_1 = \frac{1}{\sqrt{2\pi}(10)} \exp\left[-\frac{1}{2}\left(\frac{250 - 230}{10}\right)^2\right] = .005399$$

$$\mu = 259: \quad f_1 = \frac{1}{\sqrt{2\pi}(10)} \exp\left[-\frac{1}{2}\left(\frac{250 - 259}{10}\right)^2\right] = .026609$$


The densities for all three sample observations for the two cases of $\mu$ are as follows:

          $\mu = 230$    $\mu = 259$
$f_1$     .005399        .026609
$f_2$     .000087        .033322
$f_3$     .000595        .039894


The method of maximum likelihood uses the product of the densities (i.e., here, the
product of the three heights) as the measure of consistency of the parameter value with
the sample data. The product is called the likelihood value of the parameter value $\mu$ and
is denoted by $L(\mu)$. If the value of $\mu$ is consistent with the sample data, the densities will
be relatively large and so will be the product (i.e., the likelihood value). If the value of $\mu$
is not consistent with the data, the densities will be small and the product $L(\mu)$ will be
small.

For our simple example, the likelihood values are as follows for the two cases of $\mu$:

$$L(\mu = 230) = .005399(.000087)(.000595) = .279 \times 10^{-9}$$
$$L(\mu = 259) = .026609(.033322)(.039894) = .0000354$$

Since the likelihood value $L(\mu = 230)$ is a very small number, it is shown in scientific
notation, which indicates that there are nine zeros after the decimal place before 279. Note
that $L(\mu = 230)$ is much smaller than $L(\mu = 259)$, indicating that $\mu = 259$ is much more
consistent with the sample data than $\mu = 230$.
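The density and likelihood calculations for the two trial values of the mean can be sketched as follows. The function names are ours, and the normal density is coded directly from its standard formula. Because the code multiplies unrounded densities, the result for a mean of 230 may differ in the last digit from the printed value, which is the product of rounded densities.

```python
import math

def normal_density(y, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(mu, observations, sigma):
    """Likelihood value L(mu): the product of the densities of the observations."""
    return math.prod(normal_density(y, mu, sigma) for y in observations)

observations = [250, 265, 259]   # Y_1, Y_2, Y_3
sigma = 10.0                     # known standard deviation

for mu in (230, 259):
    densities = [normal_density(y, mu, sigma) for y in observations]
    print(mu, [f"{f:.6f}" for f in densities],
          f"L = {likelihood(mu, observations, sigma):.4e}")
```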

The method of maximum likelihood chooses as the maximum likelihood estimate that
value of $\mu$ for which the likelihood value is largest. Just as for the method of least squares,
there are two methods of finding maximum likelihood estimates: by a systematic numerical
search and by use of an analytical solution. For some problems, analytical solutions for the
maximum likelihood estimators are available. For others, a computerized numerical search
must be conducted.

FIGURE 1.14 Likelihood Function for Estimation of Mean of Normal Population: $Y_1 = 250$, ...

For our example, an analytical solution is available. It can be shown that for a normal
population the maximum likelihood estimator of $\mu$ is the sample mean $\bar{Y}$. In our example,
$\bar{Y} = 258$ and the maximum likelihood estimate of $\mu$ therefore is 258. The likelihood value
of $\mu = 258$ is $L(\mu = 258) = .0000359$, which is slightly larger than the likelihood value
of .0000354 for $\mu = 259$ that we had calculated earlier.

The product of the densities viewed as a function of the unknown parameters is called
the likelihood function. For our example, where $\sigma = 10$, the likelihood function is:

$$L(\mu) = \frac{1}{\sqrt{2\pi}(10)} \exp\left[-\frac{1}{2}\left(\frac{250 - \mu}{10}\right)^2\right] \cdot \frac{1}{\sqrt{2\pi}(10)} \exp\left[-\frac{1}{2}\left(\frac{265 - \mu}{10}\right)^2\right] \cdot \frac{1}{\sqrt{2\pi}(10)} \exp\left[-\frac{1}{2}\left(\frac{259 - \mu}{10}\right)^2\right]$$

Figure 1.14 shows a computer plot of the likelihood function for our example. It is based
on the calculation of likelihood values $L(\mu)$ for many values of $\mu$. Note that the likelihood
values at $\mu = 230$ and $\mu = 259$ correspond to the ones we determined earlier. Also note
that the likelihood function reaches a maximum at $\mu = 258$.

The fact that the likelihood function in Figure 1.14 is relatively peaked in the
neighborhood of the maximum likelihood estimate $\bar{Y} = 258$ is of particular interest. Note, for
instance, that for $\mu = 250$ or $\mu = 266$, the likelihood value is already only a little more
than one-third as large as the likelihood value at $\mu = 258$ (the ratio is $\exp(-.96) \approx .38$). This indicates that the
maximum likelihood estimate here is relatively precise because values of $\mu$ not near the
maximum likelihood estimate $\bar{Y} = 258$ are much less consistent with the sample data. When the
likelihood function is relatively flat in a fairly wide region around the maximum likelihood
estimate, many values of the parameter are almost as consistent with the sample data as the
maximum likelihood estimate, and the maximum likelihood estimate would therefore be
relatively imprecise.

[Figure 1.14 plot: horizontal axis $\mu$, from 220 to 300; vertical axis from 0.00000 to 0.00004.]
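A systematic numerical search of the kind mentioned above can be sketched with a simple grid. The grid spacing of 0.5 and the range 220 to 300 are our own choices, made to match the horizontal axis of Figure 1.14; they are not prescribed by the text.

```python
import math

observations = [250, 265, 259]
sigma = 10.0

def likelihood(mu):
    """L(mu) for the three observations with known sigma = 10."""
    return math.prod(
        math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        for y in observations
    )

# Grid of trial values 220.0, 220.5, ..., 300.0 (spacing is an arbitrary choice).
grid = [220 + 0.5 * k for k in range(161)]
mu_best = max(grid, key=likelihood)

sample_mean = sum(observations) / len(observations)
print("grid maximizer:", mu_best, " sample mean:", sample_mean)
print(f"L at maximizer: {likelihood(mu_best):.4e}")

# Peakedness: likelihood at mu = 250 and 266 relative to the maximum.
ratio_250 = likelihood(250) / likelihood(mu_best)
ratio_266 = likelihood(266) / likelihood(mu_best)
print(f"ratio at 250: {ratio_250:.3f}, ratio at 266: {ratio_266:.3f}")
```

A finer grid would locate the maximum more precisely in problems where no analytical solution is available; here the grid happens to contain the sample mean exactly.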


Regression Model. The concepts just presented for maximum likelihood estimation of
a population mean carry over directly to the estimation of the parameters of normal error
regression model (1.24). For this model, each $Y_i$ observation is normally distributed with
mean $\beta_0 + \beta_1 X_i$ and standard deviation $\sigma$. To illustrate the method of maximum likelihood
estimation here, consider the earlier persistence study example on page 15. For simplicity,
let us suppose that we know $\sigma = 2.5$. We wish to determine the likelihood value for the
parameter values $\beta_0 = 0$ and $\beta_1 = .5$. For subject 1, $X_1 = 20$ and hence the mean of the
probability distribution would be $\beta_0 + \beta_1 X_1 = 0 + .5(20) = 10.0$. Figure 1.15a shows
the normal distribution with mean 10.0 and standard deviation 2.5. Note that the observed
value $Y_1 = 5$ is in the left tail of the distribution and that the density there is relatively small.
For the second subject, $X_2 = 55$ and hence $\beta_0 + \beta_1 X_2 = 27.5$. The normal distribution with
mean 27.5 is shown in Figure 1.15b. Note that the observed value $Y_2 = 12$ is most unlikely
for this case and that the density there is extremely small. Finally, note that the observed
value $Y_3 = 10$ is also in the left tail of its distribution if $\beta_0 = 0$ and $\beta_1 = .5$, as shown in
Figure 1.15c, and that the density there is also relatively small.


FIGURE 1.15 Densities for Sample Observations if $\beta_0 = 0$ and $\beta_1 = .5$: Persistence Study Example.
(a) $X_1 = 20$, $Y_1 = 5$, $\beta_0 + \beta_1 X_1 = .5(20) = 10$
(b) $X_2 = 55$, $Y_2 = 12$, $\beta_0 + \beta_1 X_2 = .5(55) = 27.5$
(c) $X_3 = 30$, $Y_3 = 10$, $\beta_0 + \beta_1 X_3 = .5(30) = 15$
(d) Combined Presentation (horizontal axis X, from 10 to 60)




Figure 1.15d combines all of this information, showing the regression function $E\{Y\} =
0 + .5X$, the three sample cases, and the three normal distributions. Note how poorly the
regression line fits the three sample cases, as was also indicated by the three small density
values. Thus, it appears that $\beta_0 = 0$ and $\beta_1 = .5$ are not consistent with the data.

We calculate the densities (i.e., heights of the curve) in the usual way. For $Y_1 = 5$,
$X_1 = 20$, the normal density is as follows when $\beta_0 = 0$ and $\beta_1 = .5$:

$$f_1 = \frac{1}{\sqrt{2\pi}(2.5)} \exp\left[-\frac{1}{2}\left(\frac{5 - 10.0}{2.5}\right)^2\right] = .021596$$

The other densities are $f_2 = .7175 \times 10^{-9}$ and $f_3 = .021596$, and the likelihood value of
$\beta_0 = 0$ and $\beta_1 = .5$ therefore is:

$$L(\beta_0 = 0, \beta_1 = .5) = .021596(.7175 \times 10^{-9})(.021596) = .3346 \times 10^{-12}$$
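The same calculation can be sketched in code for the three persistence study cases. The function names are our own; the data values and the assumed standard deviation of 2.5 are those used in the text.

```python
import math

# Persistence study cases (X_i, Y_i) as given with Figure 1.15.
xs = [20, 55, 30]
ys = [5, 12, 10]
sigma = 2.5   # assumed known, as in the text

def density(y, mean, sd):
    """Normal density at y for the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood(beta0, beta1):
    """Product of the densities of the Y_i around beta0 + beta1 * X_i."""
    return math.prod(density(y, beta0 + beta1 * x, sigma) for x, y in zip(xs, ys))

densities = [density(y, 0.0 + 0.5 * x, sigma) for x, y in zip(xs, ys)]
L_trial = likelihood(0.0, 0.5)
print([f"{f:.4e}" for f in densities])
print(f"L(beta0 = 0, beta1 = .5) = {L_trial:.4e}")
```

Trying other pairs of parameter values with `likelihood` shows how the likelihood value changes as the trial line moves closer to or farther from the three cases.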


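In general, the density of an observation $Y_i$ for the normal error regression model (1.24)
is as follows, utilizing the fact that $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and $\sigma^2\{Y_i\} = \sigma^2$:

$$f_i = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)^2\right] \tag{1.25}$$

The likelihood function for n observations $Y_1, Y_2, \ldots, Y_n$ is the product of the individual
densities in (1.25). Since the variance $\sigma^2$ of the error terms is usually unknown, the likelihood
function is a function of three parameters, $\beta_0$, $\beta_1$, and $\sigma^2$:

$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{1}{2\sigma^2}(Y_i - \beta_0 - \beta_1 X_i)^2\right]$$

$$= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2\right] \tag{1.26}$$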


The values of $\beta_0$, $\beta_1$, and $\sigma^2$ that maximize this likelihood function are the maximum
likelihood estimators and are denoted by $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\sigma}^2$, respectively. These estimators can
be found analytically, and they are as follows:

Parameter     Maximum Likelihood Estimator
$\beta_0$     $\hat{\beta}_0 = b_0$, same as (1.10b)
$\beta_1$     $\hat{\beta}_1 = b_1$, same as (1.10a)                          (1.27)
$\sigma^2$    $\hat{\sigma}^2 = \dfrac{\sum (Y_i - \hat{Y}_i)^2}{n}$

Thus, the maximum likelihood estimators of $\beta_0$ and $\beta_1$ are the same estimators as those
provided by the method of least squares. The maximum likelihood estimator $\hat{\sigma}^2$ is biased,
and ordinarily the unbiased estimator MSE as given in (1.22) is used. Note that the
unbiased estimator MSE or $s^2$ differs but slightly from the maximum likelihood estimator $\hat{\sigma}^2$,
especially if n is not small:

$$s^2 = MSE = \frac{n}{n - 2}\,\hat{\sigma}^2 \tag{1.28}$$


Example


For the persistence study example, we know now that the maximum likelihood estimates of
$\beta_0$ and $\beta_1$ are $b_0 = 2.81$ and $b_1 = .177$, the same as the least squares estimates in Figure 1.9b.
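As a hedged illustration of (1.27) and (1.28), the sketch below computes the estimates for the three persistence study cases shown with Figure 1.15, using the least squares formulas (1.10) for the coefficients. The variable names are ours. With only n = 3 cases the two variance estimators differ by a factor of 3, which shows why the distinction matters when n is small.

```python
# Persistence study cases (X_i, Y_i) as given with Figure 1.15.
xs = [20, 55, 30]
ys = [5, 12, 10]
n = len(xs)

# Maximum likelihood (= least squares) estimates of beta_0 and beta_1.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Two estimators of the error variance.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sigma2_hat = sse / n        # maximum likelihood estimator in (1.27), biased
mse = sse / (n - 2)         # unbiased estimator MSE in (1.22)

print(f"b0 = {b0:.2f}, b1 = {b1:.3f}")
print(f"sigma2_hat = {sigma2_hat:.4f}, MSE = {mse:.4f}")
```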


Comments 


1. Since the maximum likelihood estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are the same as the least squares estimators
$b_0$ and $b_1$, they have the properties of all least squares estimators:

a. They are unbiased.

b. They have minimum variance among all unbiased linear estimators.

In addition, the maximum likelihood estimators $b_0$ and $b_1$ for the normal error regression model
(1.24) have other desirable properties:

c. They are consistent, as defined in (A.52).

d. They are sufficient, as defined in (A.53).

e. They are minimum variance unbiased; that is, they have minimum variance in the class of all
unbiased estimators (linear or otherwise).

Thus, for the normal error model, the estimators $b_0$ and $b_1$ have many desirable properties.

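2. We find the values of $\beta_0$, $\beta_1$, and $\sigma^2$ that maximize the likelihood function L in (1.26) by taking
partial derivatives of L with respect to $\beta_0$, $\beta_1$, and $\sigma^2$, equating each of the partials to zero, and
solving the system of equations thus obtained. We can work with $\log_e L$, rather than L, because
both L and $\log_e L$ are maximized for the same values of $\beta_0$, $\beta_1$, and $\sigma^2$:

$$\log_e L = -\frac{n}{2}\log_e 2\pi - \frac{n}{2}\log_e \sigma^2 - \frac{1}{2\sigma^2}\sum (Y_i - \beta_0 - \beta_1 X_i)^2 \tag{1.29}$$

Partial differentiation of the logarithm of the likelihood function is much easier; it yields:

$$\frac{\partial(\log_e L)}{\partial \beta_0} = \frac{1}{\sigma^2}\sum (Y_i - \beta_0 - \beta_1 X_i)$$

$$\frac{\partial(\log_e L)}{\partial \beta_1} = \frac{1}{\sigma^2}\sum X_i(Y_i - \beta_0 - \beta_1 X_i)$$

$$\frac{\partial(\log_e L)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum (Y_i - \beta_0 - \beta_1 X_i)^2$$

We now set these partial derivatives equal to zero, replacing $\beta_0$, $\beta_1$, and $\sigma^2$ by the estimators $\hat{\beta}_0$,
$\hat{\beta}_1$, and $\hat{\sigma}^2$. We obtain, after some simplification:

$$\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \tag{1.30a}$$

$$\sum X_i(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \tag{1.30b}$$

$$\frac{\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2}{n} = \hat{\sigma}^2 \tag{1.30c}$$

Formulas (1.30a) and (1.30b) are identical to the earlier least squares normal equations (1.9), and
formula (1.30c) is the biased estimator of $\sigma^2$ given earlier in (1.27). ■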
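The derivation can be checked numerically on a small data set. The sketch below, which uses the three persistence study cases shown with Figure 1.15 and function names of our own choosing, evaluates the three partial derivatives of the log-likelihood at the closed-form estimators and confirms that they vanish there, apart from floating-point rounding.

```python
import math

xs = [20, 55, 30]   # persistence study cases, as given with Figure 1.15
ys = [5, 12, 10]
n = len(xs)

def log_likelihood(beta0, beta1, sigma2):
    """log_e L as in (1.29)."""
    ss = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - ss / (2 * sigma2))

def partials(beta0, beta1, sigma2):
    """Partial derivatives of log_e L with respect to beta0, beta1, sigma^2."""
    errs = [y - beta0 - beta1 * x for x, y in zip(xs, ys)]
    d_beta0 = sum(errs) / sigma2
    d_beta1 = sum(x * e for x, e in zip(xs, errs)) / sigma2
    d_sigma2 = -n / (2 * sigma2) + sum(e * e for e in errs) / (2 * sigma2 ** 2)
    return d_beta0, d_beta1, d_sigma2

# Closed-form solutions of (1.30a)-(1.30c).
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
b0_hat = y_bar - b1_hat * x_bar
sigma2_hat = sum((y - b0_hat - b1_hat * x) ** 2 for x, y in zip(xs, ys)) / n

grad_at_mle = partials(b0_hat, b1_hat, sigma2_hat)
print("partials at the estimators:", grad_at_mle)
print("log-likelihood at the estimators:", log_likelihood(b0_hat, b1_hat, sigma2_hat))
```

Evaluating `log_likelihood` at nearby parameter values gives smaller values, consistent with the estimators being the maximizing values.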


1.1. BMDP New System 2.0. Statistical Solutions, Inc.
1.2. MINITAB Release 13. Minitab Inc.
1.3. SAS/STAT Release 8.2. SAS Institute, Inc.
1.4. SPSS 11.5 for Windows. SPSS Inc.
1.5. SYSTAT 10.2. SYSTAT Software, Inc.
1.6. JMP Version 5. SAS Institute, Inc.
1.7. S-Plus 6 for Windows. Insightful Corporation.
1.8. MATLAB 6.5. The MathWorks, Inc.


1.1. Refer to the sales volume example on page 3. Suppose that the number of units sold is measured
accurately, but clerical errors are frequently made in determining the dollar sales. Would the
relation between the number of units sold and dollar sales still be a functional one? Discuss.

1.2. The members of a health spa pay annual membership dues of $300 plus a charge of $2 for each
visit to the spa. Let Y denote the dollar cost for the year for a member and X the number of
visits by the member during the year. Express the relation between X and Y mathematically.
Is it a functional relation or a statistical relation?

1.3. Experience with a certain type of plastic indicates that a relation exists between the hardness
(measured in Brinell units) of items molded from the plastic (Y) and the elapsed time since
termination of the molding process (X). It is proposed to study this relation by means of regression
analysis. A participant in the discussion objects, pointing out that the hardening of the plastic
"is the result of a natural chemical process that doesn't leave anything to chance, so the relation
must be mathematical and regression analysis is not appropriate." Evaluate this objection.

1.4. In Table 1.1, the lot size X is the same in production runs 1 and 24 but the work hours Y differ.
What feature of regression model (1.1) is illustrated by this?

1.5. When asked to state the simple linear regression model, a student wrote it as follows:
$E\{Y_i\} = \beta_0 + \beta_1 X_i + \varepsilon_i$. Do you agree?

1.6. Consider the normal error regression model (1.24). Suppose that the parameter values are
$\beta_0 = 200$, $\beta_1 = 5.0$, and $\sigma = 4$.

a. Plot this normal error regression model in the fashion of Figure 1.6. Show the distributions
of Y for X = 10, 20, and 40.

b. Explain the meaning of the parameters $\beta_0$ and $\beta_1$. Assume that the scope of the model
includes X = 0.

1.7. In a simulation exercise, regression model (1.1) applies with $\beta_0 = 100$, $\beta_1 = 20$, and $\sigma^2 = 25$.
An observation on Y will be made for X = 5.

a. Can you state the exact probability that Y will fall between 195 and 205? Explain.

b. If the normal error regression model (1.24) is applicable, can you now state the exact
probability that Y will fall between 195 and 205? If so, state it.

1.8. In Figure 1.6, suppose another Y observation is obtained at X = 45. Would $E\{Y\}$ for this new
observation still be 104? Would the Y value for this new case again be 108?

1.9. A student in accounting enthusiastically declared: "Regression is a very powerful tool. We can
isolate fixed and variable costs by fitting a linear regression model, even when we have no data
for small lots." Discuss.


M 
4 
Д 


34 PartOne Simple Linear Regression 


1.10. 


1.12. 


1.13. 


1.14. 


1.15. 


1.16. 


1.17. 


1.18. 


An analyst in a large corporation studied the relation between current annual salary (Y) and 
age (X) for the 46 computer programmers presently employed in the company. The analyst 
concluded that the relation is curvilinear, reaching a maximum at 47 years. Does this imply 
that the salary for a programmer increases until age 47 and then decreases? Explain. 


1.11. The regression function relating production output by an employee after taking a training
program (Y) to the production output before the training program (X) is $E\{Y\} = 20 + .95X$,
where X ranges from 40 to 100. An observer concludes that the training program does not raise
production output on the average because $\beta_1$ is not greater than 1.0. Comment.

In a study of the relationship for senior citizens between physical activity and frequency of 

colds, participants were asked to monitor their weekly time spent in exercise over a five-year 

period and the frequency of colds. The study demonstrated that a negative statistical relation 

exists between time spent in exercise and frequency of colds. The investigator concluded that 

increasing the time spent in exercise is an effective strategy for reducing the frequency of colds 

for senior citizens. 

a. Were the data obtained in the study observational or experimental data? 

b. Comment on the validity of the conclusions reached by the investigator. 

c. Identify two or three other explanatory variables that might affect both the time spent in 
exercise and the frequency of colds for senior citizens simultaneously. 

d. How might the study be changed so that a valid conclusion about causal relationship between 
amount of exercise and frequency of colds can be reached? 

Computer programmers employed by a software developer were asked to participate in a month- 

long training seminar. During the seminar, each employee was asked to record the number of 

hours spent in class preparation each week. After completing the seminar, the productivity level 

of each participant was measured. A positive linear statistical relationship between participants’ 

productivity levels and time spent in class preparation was found. The seminar leader concluded 

that increases in employee productivity are caused by increased class preparation time. 

а. Were the data used by the seminar leader observational or experimental data? 

b. Comment on the validity of the conclusion reached by the seminar leader. 

c. Identify two or three alternative variables that might cause both the employee productivity 
scores and the employee class participation times to increase (decrease) simultaneously.

d. How might the study be changed so that a valid conclusion about causal relationship between 
class preparation time and employee productivity can be reached? 

Refer to Problem 1.3. Four different elapsed times since termination of the molding process 

(treatments) are to be studied to see how they affect the hardness of a plastic. Sixteen batches 

(experimental units) are available for the study. Each treatment is to be assigned to four exper- 

imental units selected at random. Use a table of random digits or a random number generator 

to make an appropriate randomization of assignments. 

The effects of five dose levels are to be studied in a completely randomized design, and 20 

experimental units are available. Each dose level is to be assigned to four experimental units 

selected at random. Use a table of random digits or a random number generator to make an 

appropriate randomization of assignments. 
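
The randomization called for in the two problems above can be done with any random number generator. The following is a minimal sketch for the five-dose design, assuming the experimental units are labeled 1 to 20; the labels, the dose names, and the seed are illustrative choices and are not part of the problem statement.

```python
import random

# Hypothetical labels: experimental units 1..20 and five dose levels.
units = list(range(1, 21))
doses = ["dose 1", "dose 2", "dose 3", "dose 4", "dose 5"]

rng = random.Random(2005)   # arbitrary seed, used only to make the run repeatable
rng.shuffle(units)          # random permutation of the 20 units

# The first four shuffled units receive dose 1, the next four dose 2, and so on.
assignment = {dose: sorted(units[4 * j: 4 * j + 4]) for j, dose in enumerate(doses)}

for dose, assigned in assignment.items():
    print(dose, assigned)
```

Shuffling the whole list once and cutting it into consecutive blocks gives every possible split into groups of four the same chance, which is what a completely randomized design requires.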

Evaluate the following statement: “For the least squares method to be fully valid, it is required 

that the distribution of Y be normal.” 

A person states that $b_0$ and $b_1$ in the fitted regression function (1.13) can be estimated by the
method of least squares. Comment.

According to (1.17), $\sum e_i = 0$ when regression model (1.1) is fitted to a set of n cases by the
method of least squares. Is it also true that $\sum \varepsilon_i = 0$? Comment.


1.19. 


*1.20. 


*1.21. 


Chapter 1 Linear Regression with One Predictor Variable 35 


Grade point average. The director of admissions of a small college selected 120 students at 
random from the new freshman class in a study to determine whether a student's grade point 
average (GPA) at the end of the freshman year (У) can be predicted from the ACT test score (X). 
The results of the study follow. Assume that first-order regression model (1.1) is appropriate. 


i:        1      2      3    ...    118    119    120
X_i:     21     14     28    ...     28     16     28
Y_i:  3.897  3.885  3.778    ...  3.914  1.860  2.948


a. Obtain the least squares estimates of $\beta_0$ and $\beta_1$, and state the estimated regression function.


b. Plot the estimated regression function and the data. Does the estimated regression function
appear to fit the data well?


c. Obtain a point estimate of the mean freshman GPA for students with ACT test score X = 30. 


d. What is the point estimate of the change in the mean response when the entrance test score 
increases by one point? 

Copier maintenance. The Tri-City Office Equipment Corporation sells an imported copier on 
a franchise basis and performs preventive maintenance and repair service on this copier. The 
data below have been collected from 45 recent calls on users to perform routine preventive 
maintenance service; for each call, X is the number of copiers serviced and Y is the total 
number of minutes spent by the service person. Assume that first-order regression model (1.1) 
is appropriate. 


i:     1   2   3   ...   43   44   45
X_i:   2   4   3   ...    2    4    5
Y_i:  20  60  46   ...   27   61   77


a. Obtain the estimated regression function.

b. Plot the estimated regression function and the data. How well does the estimated regression
function fit the data?


c. Interpret $b_0$ in your estimated regression function. Does $b_0$ provide any relevant information
here? Explain.


d. Obtain a point estimate of the mean service time when X = 5 copiers are serviced. 


Airfreight breakage. A substance used in biological and medical research is shipped by air- 
freight to users in cartons of 1,000 ampules. The data below, involving 10 shipments, were 
collected on the number of times the carton was transferred from one aircraft to another over 
the shipment route ( X) and the number of ampules found to be broken upon arrival (Y). Assume 
that first-order regression model (1.1) is appropriate. 


i:     1   2   3   4   5   6   7   8   9  10
X_i:   1   0   2   0   3   1   0   1   2   0
Y_i:  16   9  17  12  22  13   8  15  19  11


a. Obtain the estimated regression function. Plot the estimated regression function and the 
data. Does a linear regression function appear to give a good fit here? 

b. Obtain a point estimate of the expected number of broken ampules when X = 1 transfer is
made.




1.22. 


1.23. 


*1.24. 


*1.25. 


1.26. 


*1.27. 


c. Estimate the increase in the expected number of ampules broken when there аге 2 transfers 
as compared to 1 transfer. 

d. Verify that your fitted regression line goes through the point $(\bar{X}, \bar{Y})$.

Plastic hardness. Refer to Problems 1.3 and 1.14. Sixteen batches of the plastic were made, 
and from each batch one test item was molded. Each test item was randomly assigned to one of 
the four predetermined time levels, and the hardness was measured after the assigned elapsed 
time. The results are shown below; X is the elapsed time in hours, and Y is hardness in Brinell 
units. Assume that first-order regression model (1.1) is appropriate. 


i:      1    2    3   ...   14   15   16
X_i:   16   16   16   ...   40   40   40
Y_i:  199  205  196   ...  248  253  246


a. Obtain the estimated regression function. Plot the estimated regression function and the 
data. Does a linear regression function appear to give a good fit here? 


b. Obtain a point estimate of the mean hardness when X = 40 hours.

c. Obtain a point estimate of the change in mean hardness when X increases by 1 hour. 

Refer to Grade point average Problem 1.19.

a. Obtain the residuals $e_i$. Do they sum to zero in accord with (1.17)?

b. Estimate $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?

Refer to Copier maintenance Problem 1.20.

a. Obtain the residuals $e_i$ and the sum of the squared residuals $\sum e_i^2$. What is the relation
between the sum of the squared residuals here and the quantity Q in (1.8)?

b. Obtain point estimates of $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?

Refer to Airfreight breakage Problem 1.21.

a. Obtain the residual for the first case. What is its relation to $\varepsilon_1$?

b. Compute $\sum e_i^2$ and MSE. What is estimated by MSE?

Refer to Plastic hardness Problem 1.22.

a. Obtain the residuals $e_i$. Do they sum to zero in accord with (1.17)?

b. Estimate $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?


Muscle mass. A person's muscle mass is expected to decrease with age. To explore this rela- 
tionship in women, a nutritionist randomly selected 15 women from each 10-year age group, 
beginning with age 40 and ending with age 79. The results follow; X is age, and Y is a measure 
of muscle mass. Assume that first-order regression model (1.1) is appropriate. 


i:      1    2    3   ...   58   59   60
X_i:   43   41   47   ...   76   72   76
Y_i:  106  106   97   ...   56   70   74


a. Obtain the estimated regression function. Plot the estimated regression function and the data.
Does a linear regression function appear to give a good fit here? Does your plot support the
anticipation that muscle mass decreases with age?

b. Obtain the following: (1) a point estimate of the difference in the mean muscle mass for
women differing in age by one year, (2) a point estimate of the mean muscle mass for women
aged X = 60 years, (3) the value of the residual for the eighth case, (4) a point estimate of $\sigma^2$.


Exercises 


1.28. 


1.29. 


1.30. 


1.31. 


1.32. 
1.33. 


1.34. 
1.35. 


1.36. 
1.37. 


1.38. 


1.39. 




Crime rate. A criminologist studying the relationship between level of education and crime
rate in medium-sized U.S. counties collected the following data for a random sample of 84 coun- 
ties; X is the percentage of individuals in the county having at least a high-school diploma, and 
Y is the crime rate (crimes reported per 100,000 residents) last year. Assume that first-order 
regression model (1.1) is appropriate. 


i:        1      2      3   ...     82     83     84
X_i:     74     82     81   ...     88     83     76
Y_i:  8,487  8,179  8,362   ...  8,040  6,981  7,582


a. Obtain the estimated regression function. Plot the estimated regression function and the 
data. Does the linear regression function appear to give a good fit here? Discuss. 

b. Obtain point estimates of the following: (1) the difference in the mean crime rate for two
counties whose high-school graduation rates differ by one percentage point, (2) the mean
crime rate last year in counties with high school graduation percentage X = 80, (3) $\varepsilon_{10}$,
(4) $\sigma^2$.


Refer to regression model (1.1). Assume that X = 0 is within the scope of the model. What is
the implication for the regression function if $\beta_0 = 0$ so that the model is $Y_i = \beta_1 X_i + \varepsilon_i$? How
would the regression function plot on a graph?

Refer to regression model (1.1). What is the implication for the regression function if $\beta_1 = 0$
so that the model is $Y_i = \beta_0 + \varepsilon_i$? How would the regression function plot on a graph?

Refer to Plastic hardness Problem 1.22. Suppose one test item was molded from a single 
batch of plastic and the hardness of this one item was measured at 16 different points in time. 
Would the error term in the regression model for this case still reflect the same effects as for 
the experiment initially described? Would you expect the error terms for the different points in 
time to be uncorrelated? Discuss.

Derive the expression for $b_1$ in (1.10a) from the normal equations in (1.9).

(Calculus needed.) Refer to the regression model $Y_i = \beta_0 + \varepsilon_i$ in Exercise 1.30. Derive the
least squares estimator of $\beta_0$ for this model.

Prove that the least squares estimator of $\beta_0$ obtained in Exercise 1.33 is unbiased.

Prove the result in (1.18) — that the sum of the Y observations is the same as the sum of the 
fitted values. 

Prove the result in (1.20) — that the sum of the residuals weighted by the fitted values is zero. 
Refer to Table 1.1 for the Toluca Company example. When asked to present a point estimate 
of the expected work hours for lot sizes of 30 pieces, a person gave the estimate 202 because
this is the mean number of work hours in the three runs of size 30 in the study. A critic states 
that this person's approach “throws away” most of the data in the study because cases with lot 
sizes other than 30 are ignored. Comment. 

In Airfreight breakage Problem 1.21, the least squares estimates are $b_0 = 10.20$ and $b_1 = 4.00$,
and $\sum e_i^2 = 17.60$. Evaluate the least squares criterion Q in (1.8) for the estimates (1) $b_0 = 9$,
$b_1 = 3$; (2) $b_0 = 11$, $b_1 = 5$. Is the criterion Q larger for these estimates than for the least squares
estimates?


Two observations on Y were obtained at each of three X levels, namely, at X = 5, X = 10, and
X = 15.

a. Show that the least squares regression line fitted to the three points $(5, \bar{Y}_1)$, $(10, \bar{Y}_2)$, and
$(15, \bar{Y}_3)$, where $\bar{Y}_1$, $\bar{Y}_2$, and $\bar{Y}_3$ denote the means of the Y observations at the three X levels,
is identical to the least squares regression line fitted to the original six cases.




1.40. 


1.41. 


1.42. 


b. In this study, could the error term variance $\sigma^2$ be estimated without fitting a regression line?
Explain. 

In fitting regression model (1.1), it was found that observation $Y_i$ fell directly on the fitted
regression line (i.e., $Y_i = \hat{Y}_i$). If this case were deleted, would the least squares regression line
fitted to the remaining n - 1 cases be changed? [Hint: What is the contribution of case i to the
least squares criterion Q in (1.8)?]

(Calculus needed.) Refer to the regression model $Y_i = \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$, in Exercise 1.29.

a. Find the least squares estimator of $\beta_1$.

b. Assume that the error terms $\varepsilon_i$ are independent $N(0, \sigma^2)$ and that $\sigma^2$ is known. State the
likelihood function for the n sample observations on Y and obtain the maximum likelihood
estimator of $\beta_1$. Is it the same as the least squares estimator?

c. Show that the maximum likelihood estimator of $\beta_1$ is unbiased.




Typographical errors. Shown below are the number of galleys for a manuscript (X) and the
dollar cost of correcting typographical errors (Y) in a random sample of recent orders handled by
a firm specializing in technical manuscripts. Assume that the regression model $Y_i = \beta_1 X_i + \varepsilon_i$
is appropriate, with normally distributed independent error terms whose variance is $\sigma^2 = 16$.


i:      1    2    3    4    5    6
X_i:    7   12    4   14   25   30
Y_i:  128  213   75  250  446  540


a. State the likelihood function for the six Y observations, for $\sigma^2 = 16$.

b. Evaluate the likelihood function for $\beta_1 = 17$, 18, and 19. For which of these $\beta_1$ values is
the likelihood function largest?

c. The maximum likelihood estimator is $b_1 = \sum X_i Y_i / \sum X_i^2$. Find the maximum likelihood
estimate. Are your results in part (b) consistent with this estimate?

d. Using a computer graphics or statistics package, evaluate the likelihood function for values
of $\beta_1$ between $\beta_1 = 17$ and $\beta_1 = 19$ and plot the function. Does the point at which the
likelihood function is maximized correspond to the maximum likelihood estimate found in
part (c)?


Projects 


1.43. 


Refer to the CDI data set in Appendix C.2. The number of active physicians in a CDI (Y) is 

expected to be related to total population, number of hospital beds, and total personal income.

Assume that first-order regression model (1.1) is appropriate for each of the three predictor 

variables. 

a. Regress the number of active physicians in turn on each of the three predictor variables. 
State the estimated regression functions. 

b. Plot the three estimated regression functions and data on separate graphs. Does a linear 
regression relation appear to provide a good fit for each of the three predictor variables? 

c. Calculate MSE for each of the three predictor variables. Which predictor variable leads to 
the smallest variability around the fitted regression line? 


1.44. Refer to the CDI data set in Appendix C.2.

a. For each geographic region, regress per capita income in a CDI (Y) against the percentage
of individuals in a county having at least a bachelor's degree (X). Assume that
first-order regression model (1.1) is appropriate for each region. State the estimated
regression functions.

b. Are the estimated regression functions similar for the four regions? Discuss.

c. Calculate MSE for each region. Is the variability around the fitted regression line
approximately the same for the four regions? Discuss.


1.45. Refer to the SENIC data set in Appendix C.1. The average length of stay in a hospital (Y) is
anticipated to be related to infection risk, available facilities and services, and routine chest
X-ray ratio. Assume that first-order regression model (1.1) is appropriate for each of the three
predictor variables.

a. Regress average length of stay on each of the three predictor variables. State the estimated
regression functions.

b. Plot the three estimated regression functions and data on separate graphs. Does a linear
relation appear to provide a good fit for each of the three predictor variables?

c. Calculate MSE for each of the three predictor variables. Which predictor variable leads to
the smallest variability around the fitted regression line?


1.46. Refer to the SENIC data set in Appendix C.1.

a. For each geographic region, regress average length of stay in hospital (Y) against infection
risk (X). Assume that first-order regression model (1.1) is appropriate for each region. State
the estimated regression functions.

b. Are the estimated regression functions similar for the four regions? Discuss.

c. Calculate MSE for each region. Is the variability around the fitted regression line
approximately the same for the four regions? Discuss.


1.47. Refer to Typographical errors Problem 1.42. Assume that first-order regression model (1.1)
is appropriate, with normally distributed independent error terms whose variance is $\sigma^2 = 16$.

a. State the likelihood function for the six observations, for $\sigma^2 = 16$.

b. Obtain the maximum likelihood estimates of $\beta_0$ and $\beta_1$, using (1.27).

c. Using a computer graphics or statistics package, obtain a three-dimensional plot of the
likelihood function for values of $\beta_0$ between $\beta_0 = -10$ and $\beta_0 = 10$ and for values of
$\beta_1$ between $\beta_1 = 17$ and $\beta_1 = 19$. Does the likelihood appear to be maximized by the
maximum likelihood estimates found in part (b)?


Chapter 2


Inferences in Regression
and Correlation Analysis


In this chapter, we first take up inferences concerning the regression parameters $\beta_0$ and
$\beta_1$, considering both interval estimation of these parameters and tests about them. We then
discuss interval estimation of the mean E{Y} of the probability distribution of Y, for given 
X, prediction intervals for a new observation Y, confidence bands for the regression line, 
the analysis of variance approach to regression analysis, the general linear test approach, 
and descriptive measures of association. Finally, we take up the correlation coefficient, a 
measure of association between X and Y when both X and Y are random variables. 

Throughout this chapter (excluding Section 2.11), and in the remainder of Part I unless 
otherwise stated, we assume that the normal error regression model (1.24) is applicable. 
This model is:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (2.1)$$

where:

$\beta_0$ and $\beta_1$ are parameters
$X_i$ are known constants
$\varepsilon_i$ are independent $N(0, \sigma^2)$


2.1 Inferences Concerning $\beta_1$


Frequently, we are interested in drawing inferences about $\beta_1$, the slope of the regression
line in model (2.1). For instance, a market research analyst studying the relation between
sales (Y) and advertising expenditures (X) may wish to obtain an interval estimate of $\beta_1$
because it will provide information as to how many additional sales dollars, on the average,
are generated by an additional dollar of advertising expenditure.

At times, tests concerning $\beta_1$ are of interest, particularly one of the form:

$$H_0\colon \beta_1 = 0$$
$$H_a\colon \beta_1 \neq 0$$

FIGURE 2.1 Regression Model (2.1) when $\beta_1 = 0$.

The reason for interest in testing whether or not $\beta_1 = 0$ is that, when $\beta_1 = 0$, there is no
linear association between Y and X. Figure 2.1 illustrates the case when $\beta_1 = 0$. Note that
the regression line is horizontal and that the means of the probability distributions of Y are
therefore all equal, namely:

$$E\{Y\} = \beta_0 + (0)X = \beta_0$$

For normal error regression model (2.1), the condition $\beta_1 = 0$ implies even more than
no linear association between Y and X. Since for this model all probability distributions of
Y are normal with constant variance, and since the means are equal when $\beta_1 = 0$, it follows
that the probability distributions of Y are identical when $\beta_1 = 0$. This is shown in Figure 2.1.
Thus, $\beta_1 = 0$ for the normal error regression model (2.1) implies not only that there is no
linear association between Y and X but also that there is no relation of any type between
Y and X, since the probability distributions of Y are then identical at all levels of X.

Before discussing inferences concerning $\beta_1$ further, we need to consider the sampling
distribution of $b_1$, the point estimator of $\beta_1$.


Sampling Distribution of $b_1$


The point estimator $b_1$ was given in (1.10a) as follows:

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \qquad (2.2)$$

The sampling distribution of $b_1$ refers to the different values of $b_1$ that would be obtained
with repeated sampling when the levels of the predictor variable X are held constant from
sample to sample.

For normal error regression model (2.1), the sampling distribution
of $b_1$ is normal, with mean and variance: (2.3)

$$E\{b_1\} = \beta_1 \qquad (2.3a)$$

$$\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \qquad (2.3b)$$

To show this, we need to recognize that $b_1$ is a linear combination of the observations $Y_i$.




$b_1$ as Linear Combination of the $Y_i$. It can be shown that $b_1$, as defined in (2.2), can be
expressed as follows:

$$b_1 = \sum k_i Y_i \qquad (2.4)$$

where:

$$k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} \qquad (2.4a)$$

Observe that the $k_i$ are a function of the $X_i$ and therefore are fixed quantities since the $X_i$
are fixed. Hence, $b_1$ is a linear combination of the $Y_i$ where the coefficients are solely a
function of the fixed $X_i$.

The coefficients $k_i$ have a number of interesting properties that will be used later:

$$\sum k_i = 0 \qquad (2.5)$$

$$\sum k_i X_i = 1 \qquad (2.6)$$

$$\sum k_i^2 = \frac{1}{\sum (X_i - \bar{X})^2} \qquad (2.7)$$
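
Properties (2.5) through (2.7), and the equivalence of (2.2) and (2.4), are easy to check numerically. The sketch below uses the ten airfreight observations listed in Problem 1.21; the variable names are illustrative only.

```python
# Airfreight breakage data from Problem 1.21 (X = transfers, Y = broken ampules).
X = [1, 0, 2, 0, 3, 1, 0, 1, 2, 0]
Y = [16, 9, 17, 12, 22, 13, 8, 15, 19, 11]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)      # sum of (X_i - X_bar)^2

# Weights k_i from (2.4a); they depend only on the fixed X values.
k = [(x - x_bar) / sxx for x in X]

# Slope computed two ways: definition (2.2) and linear combination (2.4).
b1_def = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
b1_lin = sum(ki * yi for ki, yi in zip(k, Y))

# Properties (2.5), (2.6), and (2.7).
sum_k = sum(k)
sum_kx = sum(ki * xi for ki, xi in zip(k, X))
sum_k_sq = sum(ki ** 2 for ki in k)

print(b1_def, b1_lin)                       # both equal the slope 4.00 up to rounding
print(sum_k, sum_kx, sum_k_sq, 1 / sxx)     # approximately 0, 1, 0.1, 0.1
```

Both slope calculations reproduce the estimate $b_1 = 4.00$ quoted for these data in Exercise 1.38, apart from floating-point rounding.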
Comments 
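1. To show that $b_1$ is a linear combination of the $Y_i$ with coefficients $k_i$, we first prove:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X}) Y_i \qquad (2.8)$$

This follows since:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X}) Y_i - \sum (X_i - \bar{X}) \bar{Y}$$

But $\sum (X_i - \bar{X}) \bar{Y} = \bar{Y} \sum (X_i - \bar{X}) = 0$ since $\sum (X_i - \bar{X}) = 0$. Hence, (2.8) holds.
We now express $b_1$ using (2.8) and (2.4a):

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \sum k_i Y_i$$

2. The proofs of the properties of the $k_i$ are direct. For example, property (2.5) follows because:

$$\sum k_i = \sum \left[ \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} \right] = \frac{1}{\sum (X_i - \bar{X})^2} \sum (X_i - \bar{X}) = \frac{0}{\sum (X_i - \bar{X})^2} = 0$$

Similarly, property (2.7) follows because:

$$\sum k_i^2 = \sum \left[ \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} \right]^2 = \frac{1}{\left[ \sum (X_i - \bar{X})^2 \right]^2} \sum (X_i - \bar{X})^2 = \frac{1}{\sum (X_i - \bar{X})^2}$$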

Normality. We return now to the sampling distribution of $b_1$ for the normal error regression
model (2.1). The normality of the sampling distribution of $b_1$ follows at once from the
fact that $b_1$ is a linear combination of the $Y_i$. The $Y_i$ are independently, normally distributed
according to model (2.1), and (A.40) in Appendix A states that a linear combination of
independent normal random variables is normally distributed.


Mean. The unbiasedness of the point estimator $b_1$, stated earlier in the Gauss-Markov
theorem (1.11), is easy to show:

$$E\{b_1\} = E\left\{\sum k_i Y_i\right\} = \sum k_i E\{Y_i\} = \sum k_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum k_i + \beta_1 \sum k_i X_i$$

By (2.5) and (2.6), we then obtain $E\{b_1\} = \beta_1$.


Variance. The variance of $b_1$ can be derived readily. We need only remember that the
$Y_i$ are independent random variables, each with variance $\sigma^2$, and that the $k_i$ are constants.
Hence, we obtain by (A.31):

$$\sigma^2\{b_1\} = \sigma^2\left\{\sum k_i Y_i\right\} = \sum k_i^2 \sigma^2\{Y_i\} = \sum k_i^2 \sigma^2 = \sigma^2 \sum k_i^2 = \sigma^2 \frac{1}{\sum (X_i - \bar{X})^2}$$

The last step follows from (2.7).
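
The mean and variance results can also be illustrated by simulation: hold the X levels fixed, draw many samples from model (2.1), and look at the resulting values of $b_1$. The following sketch assumes hypothetical parameter values ($\beta_0 = 10$, $\beta_1 = 4$, $\sigma = 1.5$), an arbitrary seed, and the X levels of Problem 1.21; none of these choices comes from the text's own examples.

```python
import random

random.seed(12345)                          # arbitrary seed for repeatability

X = [1, 0, 2, 0, 3, 1, 0, 1, 2, 0]          # fixed X levels, reused in every sample
beta0, beta1, sigma = 10.0, 4.0, 1.5        # hypothetical true parameter values

n = len(X)
x_bar = sum(X) / n
sxx = sum((x - x_bar) ** 2 for x in X)
k = [(x - x_bar) / sxx for x in X]          # weights k_i from (2.4a)

def simulate_b1():
    """Draw one sample of Y values from model (2.1) and return b1 = sum k_i Y_i."""
    ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in X]
    return sum(ki * yi for ki, yi in zip(k, ys))

reps = 20000
draws = [simulate_b1() for _ in range(reps)]

mean_b1 = sum(draws) / reps
var_b1 = sum((b - mean_b1) ** 2 for b in draws) / reps
theory_var = sigma ** 2 / sxx               # sigma^2 / sum (X_i - X_bar)^2

print(mean_b1, beta1)        # simulated mean of b1 versus beta1
print(var_b1, theory_var)    # simulated variance versus the theoretical variance
```

With this many replications the simulated mean and variance should land close to $\beta_1$ and $\sigma^2 / \sum (X_i - \bar{X})^2$, though the agreement is only approximate because of sampling error in the simulation.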


Estimated Variance. We can estimate the variance of the sampling distribution of $b_1$:

$$\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$

by replacing the parameter $\sigma^2$ with MSE, the unbiased estimator of $\sigma^2$:

$$s^2\{b_1\} = \frac{MSE}{\sum (X_i - \bar{X})^2} \qquad (2.9)$$

The point estimator $s^2\{b_1\}$ is an unbiased estimator of $\sigma^2\{b_1\}$. Taking the positive square
root, we obtain $s\{b_1\}$, the point estimator of $\sigma\{b_1\}$.


Comment 


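We stated in theorem (1.11) that $b_1$ has minimum variance among all unbiased linear estimators of
the form:

$$\hat{\beta}_1 = \sum c_i Y_i$$

where the $c_i$ are arbitrary constants. We now prove this. Since $\hat{\beta}_1$ is required to be unbiased, the
following must hold:

$$E\{\hat{\beta}_1\} = E\left\{\sum c_i Y_i\right\} = \sum c_i E\{Y_i\} = \beta_1$$

Now $E\{Y_i\} = \beta_0 + \beta_1 X_i$ by (1.2), so the above condition becomes:

$$E\{\hat{\beta}_1\} = \sum c_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum c_i + \beta_1 \sum c_i X_i = \beta_1$$

For the unbiasedness condition to hold, the $c_i$ must follow the restrictions:

$$\sum c_i = 0 \qquad \sum c_i X_i = 1$$

Now the variance of $\hat{\beta}_1$ is, by (A.31):

$$\sigma^2\{\hat{\beta}_1\} = \sum c_i^2 \sigma^2\{Y_i\} = \sigma^2 \sum c_i^2$$

Let us define $c_i = k_i + d_i$, where the $k_i$ are the least squares constants in (2.4a) and the $d_i$ are arbitrary
constants. We can then write:

$$\sigma^2\{\hat{\beta}_1\} = \sigma^2 \sum c_i^2 = \sigma^2 \sum (k_i + d_i)^2 = \sigma^2 \left( \sum k_i^2 + \sum d_i^2 + 2 \sum k_i d_i \right)$$

We know that $\sigma^2 \sum k_i^2 = \sigma^2\{b_1\}$ from our proof above. Further, $\sum k_i d_i = 0$ because of the restrictions
on the $k_i$ and $c_i$ above:

$$\sum k_i d_i = \sum k_i (c_i - k_i) = \sum c_i k_i - \sum k_i^2$$

$$= \sum c_i \left[ \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} \right] - \frac{1}{\sum (X_i - \bar{X})^2}$$

$$= \frac{\sum c_i X_i - \bar{X} \sum c_i}{\sum (X_i - \bar{X})^2} - \frac{1}{\sum (X_i - \bar{X})^2} = 0$$

Hence, we have:

$$\sigma^2\{\hat{\beta}_1\} = \sigma^2\{b_1\} + \sigma^2 \sum d_i^2$$

Note that the smallest value of $\sum d_i^2$ is zero. Hence, the variance of $\hat{\beta}_1$ is at a minimum when
$\sum d_i^2 = 0$. But this can only occur if all $d_i = 0$, which implies $c_i = k_i$. Thus, the least squares
estimator $b_1$ has minimum variance among all unbiased linear estimators.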
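
A small numerical illustration of this argument: take the least squares weights $k_i$, add a perturbation $d_i$ that keeps the estimator unbiased, and compare $\sum c_i^2$ with $\sum k_i^2$. In the sketch below the X levels are those of Problem 1.21 and the particular perturbation (plus and minus 0.1 on two cases that share the same X value) is a hypothetical choice made only for illustration.

```python
X = [1, 0, 2, 0, 3, 1, 0, 1, 2, 0]          # X levels from Problem 1.21

n = len(X)
x_bar = sum(X) / n
sxx = sum((x - x_bar) ** 2 for x in X)
k = [(x - x_bar) / sxx for x in X]          # least squares weights (2.4a)

# Hypothetical perturbation: +0.1 and -0.1 on two cases with the same X (both X = 0),
# so that sum d_i = 0 and sum d_i X_i = 0 and the estimator stays unbiased.
d = [0.0] * n
d[1] = 0.1
d[3] = -0.1
c = [ki + di for ki, di in zip(k, d)]

sum_c = sum(c)                                   # restriction: must be 0
sum_cx = sum(ci * xi for ci, xi in zip(c, X))    # restriction: must be 1

sum_k_sq = sum(ki ** 2 for ki in k)         # variance of b1 divided by sigma^2
sum_c_sq = sum(ci ** 2 for ci in c)         # variance of the alternative, divided by sigma^2
sum_d_sq = sum(di ** 2 for di in d)
cross = sum(ki * di for ki, di in zip(k, d))     # cross-product term, should vanish

print(sum_c, sum_cx, cross)
print(sum_k_sq, sum_c_sq, sum_k_sq + sum_d_sq)
```

The alternative estimator satisfies both unbiasedness restrictions, yet its variance multiplier exceeds that of $b_1$ by exactly $\sum d_i^2$, as the decomposition above predicts.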


Sampling Distribution of $(b_1 - \beta_1)/s\{b_1\}$
Since $b_1$ is normally distributed, we know that the standardized statistic $(b_1 - \beta_1)/\sigma\{b_1\}$
is a standard normal variable. Ordinarily, of course, we need to estimate $\sigma\{b_1\}$ by $s\{b_1\}$,
and hence are interested in the distribution of the statistic $(b_1 - \beta_1)/s\{b_1\}$. When a statistic
is standardized but the denominator is an estimated standard deviation rather than the true
standard deviation, it is called a studentized statistic. An important theorem in statistics
states the following about the studentized statistic $(b_1 - \beta_1)/s\{b_1\}$:

$$\frac{b_1 - \beta_1}{s\{b_1\}} \text{ is distributed as } t(n - 2) \text{ for regression model (2.1)} \qquad (2.10)$$

Intuitively, this result should not be unexpected. We know that if the observations $Y_i$
come from the same normal population, $(\bar{Y} - \mu)/s\{\bar{Y}\}$ follows the t distribution with n - 1
degrees of freedom. The estimator $b_1$, like $\bar{Y}$, is a linear combination of the observations $Y_i$.
The reason for the difference in the degrees of freedom is that two parameters ($\beta_0$ and $\beta_1$)
need to be estimated for the regression model; hence, two degrees of freedom are lost here.




Comment 
We can show that the studentized statistic $(b_1 - \beta_1)/s\{b_1\}$ is distributed as t with n - 2 degrees of
freedom by relying on the following theorem:

For regression model (2.1), $SSE/\sigma^2$ is distributed as $\chi^2$ with n - 2
degrees of freedom and is independent of $b_0$ and $b_1$. (2.11)

First, let us rewrite $(b_1 - \beta_1)/s\{b_1\}$ as follows:

$$\frac{b_1 - \beta_1}{\sigma\{b_1\}} \div \frac{s\{b_1\}}{\sigma\{b_1\}}$$

The numerator is a standard normal variable z. The nature of the denominator can be seen by first
considering:

$$\frac{s^2\{b_1\}}{\sigma^2\{b_1\}} = \frac{MSE / \sum (X_i - \bar{X})^2}{\sigma^2 / \sum (X_i - \bar{X})^2} = \frac{MSE}{\sigma^2} = \frac{SSE/(n - 2)}{\sigma^2} = \frac{SSE}{\sigma^2 (n - 2)} \sim \frac{\chi^2(n - 2)}{n - 2}$$

where the symbol $\sim$ stands for "is distributed as." The last step follows from (2.11). Hence, we have:

$$\frac{b_1 - \beta_1}{s\{b_1\}} \sim \frac{z}{\sqrt{\dfrac{\chi^2(n - 2)}{n - 2}}}$$

But by theorem (2.11), z and $\chi^2$ are independent since z is a function of $b_1$ and $b_1$ is independent of
$SSE/\sigma^2 \sim \chi^2$. Hence, by (A.44), it follows that:

$$\frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n - 2)$$

This result places us in a position to readily make inferences concerning $\beta_1$.
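
Result (2.10) can be checked empirically: if the studentized statistic really follows $t(n - 2)$, then about 5 percent of simulated values should exceed $t(.975; n - 2)$ in absolute value. The sketch below assumes n = 10 fixed X levels (those of Problem 1.21), hypothetical parameter values, an arbitrary seed, and the standard tabled value $t(.975; 8) = 2.306$.

```python
import math
import random

random.seed(2024)                           # arbitrary seed for repeatability

X = [1, 0, 2, 0, 3, 1, 0, 1, 2, 0]          # fixed X levels (Problem 1.21 values)
beta0, beta1, sigma = 10.0, 4.0, 1.5        # hypothetical true parameters
t_crit = 2.306                              # tabled t(.975; 8), since n - 2 = 8

n = len(X)
x_bar = sum(X) / n
sxx = sum((x - x_bar) ** 2 for x in X)

def studentized_slope():
    """Draw one sample from model (2.1) and return (b1 - beta1) / s{b1}."""
    ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in X]
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, ys))
    s_b1 = math.sqrt(sse / (n - 2) / sxx)   # (2.9) with MSE = SSE / (n - 2)
    return (b1 - beta1) / s_b1

reps = 5000
stats = [studentized_slope() for _ in range(reps)]

# Share of simulated statistics beyond the two-sided 5 percent critical value.
tail_share = sum(abs(t) > t_crit for t in stats) / reps
print(tail_share)
```

A tail share near .05 is consistent with the t distribution on 8 degrees of freedom; using the normal value 1.960 in place of 2.306 would give a noticeably larger share for a sample this small.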


Confidence Interval for $\beta_1$
Since $(b_1 - \beta_1)/s\{b_1\}$ follows a t distribution, we can make the following probability
statement:

$$P\{t(\alpha/2; n - 2) \le (b_1 - \beta_1)/s\{b_1\} \le t(1 - \alpha/2; n - 2)\} = 1 - \alpha \qquad (2.12)$$

Here, $t(\alpha/2; n - 2)$ denotes the $(\alpha/2)100$ percentile of the t distribution with n - 2 degrees
of freedom. Because of the symmetry of the t distribution around its mean 0, it follows that:

$$t(\alpha/2; n - 2) = -t(1 - \alpha/2; n - 2) \qquad (2.13)$$

Rearranging the inequalities in (2.12) and using (2.13), we obtain:

$$P\{b_1 - t(1 - \alpha/2; n - 2)s\{b_1\} \le \beta_1 \le b_1 + t(1 - \alpha/2; n - 2)s\{b_1\}\} = 1 - \alpha \qquad (2.14)$$

Since (2.14) holds for all possible values of $\beta_1$, the $1 - \alpha$ confidence limits for $\beta_1$ are:

$$b_1 \pm t(1 - \alpha/2; n - 2)s\{b_1\} \qquad (2.15)$$
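
Once $b_1$, MSE, and $\sum (X_i - \bar{X})^2$ are in hand, the limits (2.15) are a short calculation. The sketch below uses the Toluca Company summary figures quoted in this chapter ($b_1 = 3.5702$, MSE = 2,384, $\sum (X_i - \bar{X})^2 = 19{,}800$) and the tabled value $t(.975; 23) = 2.069$; the variable names are illustrative only.

```python
import math

# Toluca Company summary values as quoted in the text.
b1 = 3.5702          # estimated slope
mse = 2384.0         # error mean square
sxx = 19800.0        # sum of (X_i - X_bar)^2
t_crit = 2.069       # t(.975; 23) read from a t table, n - 2 = 23

s_b1 = math.sqrt(mse / sxx)              # estimated standard deviation of b1, from (2.9)
half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width   # limits (2.15)

print(round(s_b1, 4), round(lower, 2), round(upper, 2))
```

Rounded to two decimals, the limits are 2.85 and 4.29, matching the interval reported for this example.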


46 PartOne Simple Linear Regression 


Example 


TABLE 2.1 
Results for 
Toluca 
Company 
Example 
Obtained in 
Chapter 1. 


FIGURE 2.2 
Portion of 
MINITAB 
Regression 
Output— 
Toluca 
Company 
Example. 


Consider the Toluca Company example of Chapter 1. Management wishes an estimate of 
Pi with 95 percent confidence coefficient. We summarize in Table 2.1 the needed results 
obtained earlier. First, we need to obtain s{b,}: 


MSE 2,384 
?(b,) = — = = .12040 
s (b) = ұу уу 19800 

s{b,} = .3470 


This estimated standard deviation is shown in the MINITAB output in Figure 2.2 in the 
column labeled Stdev corresponding to the row labeled X. Figure 2.2 repeats the MINITAB 
output presented earlier in Chapter 1 and contains some additional results that we will utilize 
shortly. » 

For a 95 percent confidence coefficient, we require 1 (.975; 23). From Table B.Z in Ap- 
pendix B, we find £(.975; 23) = 2.069. The 95 percent confidence interval, by (2.15), then is: 


3.5702 — 2.069(.3470) < B, < 3.5702 + 2.069(.3470) 
2.85 < By < 4.29 


Thus, with confidence coefficient .95, we estimate that the mean number of work hours 
increases by somewhere between 2.85 and 4.29 hours for each additional unit in the lot. 


Comment 


In Chapter 1, we noted that the scope of a regression model is restricted ordinarily to some range of 
values of the predictor variable. This is particularly important to keep in mind in using estimates of 
the slope £;. In our Toluca Company example, a linear regression model appeared appropriate for 
lot sizes between 20 and 120, the range of the predictor variable in the recent past. It may not be 


п= 25 Х = 70.00 
bo = 62.37 bı-= 3.5702 
Y.— 62.37 + 3.5702X SSE = 54,825 
УХХ; — Xy. = 19,800 MSE = 2,384 


PY: Ӯ) = 307,203 


The regression equation is 
Y = 62.4 + 3.57 X 


Predictor Coef Stdev t-ratio р 
Constant 62.37 26.18 2.38 0.026 
X 3.5702 0.3470 10.29 0.000 
в = 48.82 R-sq = 82.24 R-sq(adj) = 81.4% 


Analysis of Variance 


SOURCE DF SS MS F р 
Regression 1 252378 252378 105.88 0.000 
Error 23 54825 2384 


Total 24 307203 


Tests Concerning £f; И 


Ехатріе 1 


Example 2 


Chapter 2  Inferences in Regression and Correlation Analysis 47 


reasonable to use the estimate of the slope to infer the effect of lot size on number of work hours far 
outside this range since the regression relation may not be linear there. " 


Y 


Since (b, — Pi)/s(bi) is distributed as t with  — 2 degrees of freedom, tests concerning 
Bı сап be set up in ordinary fashion using the г distribution. 


Two-Sided Test  A cost analyst in the Toluca Company is interested in testing, using
regression model (2.1), whether or not there is a linear association between work hours and
lot size, i.e., whether or not $\beta_1 = 0$. The two alternatives then are:

$H_0\colon \beta_1 = 0$
$H_a\colon \beta_1 \neq 0$                                                    (2.16)

The analyst wishes to control the risk of a Type I error at $\alpha = .05$. The conclusion $H_a$ could
be reached at once by referring to the 95 percent confidence interval for $\beta_1$ constructed
earlier, since this interval does not include 0.

An explicit test of the alternatives (2.16) is based on the test statistic:

$$t^* = \frac{b_1}{s\{b_1\}} \tag{2.17}$$

The decision rule with this test statistic for controlling the level of significance at $\alpha$ is:

If $|t^*| \le t(1 - \alpha/2;\, n - 2)$, conclude $H_0$
If $|t^*| > t(1 - \alpha/2;\, n - 2)$, conclude $H_a$                         (2.18)

For the Toluca Company example, where $\alpha = .05$, $b_1 = 3.5702$, and $s\{b_1\} = .3470$, we
require $t(.975; 23) = 2.069$. Thus, the decision rule for testing alternatives (2.16) is:

If $|t^*| \le 2.069$, conclude $H_0$
If $|t^*| > 2.069$, conclude $H_a$

Since $|t^*| = |3.5702/.3470| = 10.29 > 2.069$, we conclude $H_a$, that $\beta_1 \neq 0$ or that
there is a linear association between work hours and lot size. The value of the test statistic,
$t^* = 10.29$, is shown in the MINITAB output in Figure 2.2 in the column labeled t-ratio
and the row labeled X.

The two-sided P-value for the sample outcome is obtained by first finding the one-sided
P-value, $P\{t(23) > t^* = 10.29\}$. We see from Table B.2 that this probability is
less than .0005. Many statistical calculators and computer packages will provide the actual
probability; it is almost 0, denoted by 0+. Thus, the two-sided P-value is 2(0+) = 0+.
Since the two-sided P-value is less than the specified level of significance $\alpha = .05$, we
could conclude $H_a$ directly. The MINITAB output in Figure 2.2 shows the P-value in the
column labeled p, corresponding to the row labeled X. It is shown as 0.000.


Comment
When the test of whether or not $\beta_1 = 0$ leads to the conclusion that $\beta_1 \neq 0$, the association between
Y and X is sometimes described to be a linear statistical association.


Example 2

One-Sided Test  Suppose the analyst had wished to test whether or not $\beta_1$ is positive,
controlling the level of significance at $\alpha = .05$. The alternatives then would be:

$H_0\colon \beta_1 \le 0$
$H_a\colon \beta_1 > 0$


48 PartOne Simple Linear Regression 


and the decision rule based on test statistic (2.17) would be:

If $t^* \le t(1 - \alpha;\, n - 2)$, conclude $H_0$
If $t^* > t(1 - \alpha;\, n - 2)$, conclude $H_a$

For $\alpha = .05$, we require $t(.95; 23) = 1.714$. Since $t^* = 10.29 > 1.714$, we would conclude
$H_a$, that $\beta_1$ is positive.

This same conclusion could be reached directly from the one-sided P-value, which was
noted in Example 1 to be 0+. Since this P-value is less than .05, we would conclude $H_a$.


Comments

1. The P-value is sometimes called the observed level of significance.

2. Many scientific publications commonly report the P-value together with the value of the test
statistic. In this way, one can conduct a test at any desired level of significance $\alpha$ by comparing the
P-value with the specified level $\alpha$.

3. Users of statistical calculators and computer packages need to be careful to ascertain whether 
one-sided or two-sided P-values are reported. Many commonly used labels, such as PROB or P, do 
not reveal whether the P-value is one- or two-sided. 

4. Occasionally, it is desired to test whether or not $\beta_1$ equals some specified nonzero value $\beta_{10}$,
which may be a historical norm, the value for a comparable process, or an engineering specification.
The alternatives now are:

$H_0\colon \beta_1 = \beta_{10}$
$H_a\colon \beta_1 \neq \beta_{10}$                                           (2.19)

and the appropriate test statistic is:

$$t^* = \frac{b_1 - \beta_{10}}{s\{b_1\}} \tag{2.20}$$

The decision rule to be employed here still is (2.18), but it is now based on $t^*$ defined in (2.20).
Note that test statistic (2.20) simplifies to test statistic (2.17) when the test involves
$H_0\colon \beta_1 = \beta_{10} = 0$.
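
As a rough illustration of test statistic (2.20) and of decision rules of the form (2.18), the following Python sketch reuses the Toluca Company estimates quoted above. The function name `slope_t_test`, its arguments, and the norm $\beta_{10} = 3.0$ are hypothetical choices made only for this example; the critical values are the Table B.2 entries cited in the text and are not computed here.

```python
def slope_t_test(b1, s_b1, t_crit, beta10=0.0, alternative="two-sided"):
    """Return (t_star, conclusion) for a t test on the slope.

    t_crit must match the alternative: t(1 - alpha/2; n - 2) for a
    two-sided test, t(1 - alpha; n - 2) for an upper-tail test.
    """
    t_star = (b1 - beta10) / s_b1            # test statistic (2.20)
    if alternative == "two-sided":
        reject = abs(t_star) > t_crit
    elif alternative == "greater":
        reject = t_star > t_crit
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return t_star, ("Ha" if reject else "H0")


b1, s_b1 = 3.5702, 0.3470                    # Toluca estimates from the text

# Example 1: H0: beta1 = 0 against Ha: beta1 != 0, with t(.975; 23) = 2.069
t_two, concl_two = slope_t_test(b1, s_b1, t_crit=2.069)

# Example 2: H0: beta1 <= 0 against Ha: beta1 > 0, with t(.95; 23) = 1.714
t_one, concl_one = slope_t_test(b1, s_b1, t_crit=1.714, alternative="greater")

# Hypothetical nonzero norm beta10 = 3.0 (an assumption, not from the text)
t_norm, concl_norm = slope_t_test(b1, s_b1, t_crit=2.069, beta10=3.0)

print(f"two-sided: t* = {t_two:.2f}, conclude {concl_two}")
print(f"one-sided: t* = {t_one:.2f}, conclude {concl_one}")
print(f"beta10 = 3.0: t* = {t_norm:.2f}, conclude {concl_norm}")
```

With `beta10=0.0` the function reduces to test statistic (2.17), mirroring the remark that (2.20) simplifies to (2.17) in that case.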


2.2 Inferences Concerning $\beta_0$


As noted in Chapter 1, there are only infrequent occasions when we wish to make inferences
concerning $\beta_0$, the intercept of the regression line. These occur when the scope of the model
includes $X = 0$.


Sampling Distribution of $b_0$

The point estimator $b_0$ was given in (1.10b) as follows:

$$b_0 = \bar{Y} - b_1\bar{X} \tag{2.21}$$

The sampling distribution of $b_0$ refers to the different values of $b_0$ that would be obtained
with repeated sampling when the levels of the predictor variable X are held constant from
sample to sample.

For regression model (2.1), the sampling distribution of $b_0$
is normal, with mean and variance:                                            (2.22)

$$E\{b_0\} = \beta_0 \tag{2.22a}$$

$$\sigma^2\{b_0\} = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right] \tag{2.22b}$$

The normality of the sampling distribution of $b_0$ follows because $b_0$, like $b_1$, is a linear
combination of the observations $Y_i$. The results for the mean and variance of the sampling
distribution of $b_0$ can be obtained in similar fashion as those for $b_1$.

An estimator of $\sigma^2\{b_0\}$ is obtained by replacing $\sigma^2$ by its point estimator MSE:

$$s^2\{b_0\} = MSE\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right] \tag{2.23}$$


The positive square root, $s\{b_0\}$, is an estimator of $\sigma\{b_0\}$.

Sampling Distribution of $(b_0 - \beta_0)/s\{b_0\}$

Analogous to theorem (2.10) for $b_1$, a theorem for $b_0$ states:

$$\frac{b_0 - \beta_0}{s\{b_0\}} \text{ is distributed as } t(n - 2) \text{ for regression model (2.1)} \tag{2.24}$$

Hence, confidence intervals for $\beta_0$ and tests concerning $\beta_0$ can be set up in ordinary fashion,
using the $t$ distribution.


Confidence Interval for $\beta_0$


The $1 - \alpha$ confidence limits for $\beta_0$ are obtained in the same manner as those for $\beta_1$ derived
earlier. They are:

$$b_0 \pm t(1 - \alpha/2;\, n - 2)\, s\{b_0\} \tag{2.25}$$


Example

As noted earlier, the scope of the model for the Toluca Company example does not extend to
lot sizes of $X = 0$. Hence, the regression parameter $\beta_0$ may not have intrinsic meaning here.
If, nevertheless, a 90 percent confidence interval for $\beta_0$ were desired, we would proceed by
finding $t(.95; 23)$ and $s\{b_0\}$. From Table B.2, we find $t(.95; 23) = 1.714$. Using the earlier
results summarized in Table 2.1, we obtain by (2.23):

$$s^2\{b_0\} = MSE\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right] = 2{,}384\left[\frac{1}{25} + \frac{(70.00)^2}{19{,}800}\right] = 685.34$$

or:

$$s\{b_0\} = 26.18$$

The MINITAB output in Figure 2.2 shows this estimated standard deviation in the column
labeled Stdev and the row labeled Constant.

The 90 percent confidence interval for $\beta_0$ is:

$$62.37 - 1.714(26.18) \le \beta_0 \le 62.37 + 1.714(26.18)$$
$$17.5 \le \beta_0 \le 107.2$$




We caution again that this confidence interval does not necessarily provide meaningful 
information. For instance, it does not necessarily provide information about the "setup" 
cost (the cost incurred in setting up the production process for the part) since we are not 
certain whether a linear regression model is appropriate when the scope of the model is 
extended to $X = 0$.
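
The arithmetic behind this interval can be retraced with a few lines of Python. This is only a sketch of formulas (2.23) and (2.25) using the Table 2.1 summary values; the variable names are illustrative, and the $t$ value is the Table B.2 entry quoted in the text, not something the code derives.

```python
import math

# Summary results for the Toluca Company example (Table 2.1)
n = 25
x_bar = 70.00
sxx = 19_800          # sum of (X_i - X_bar)^2
mse = 2_384
b0 = 62.37
t_95_23 = 1.714       # t(.95; 23) from Table B.2, as quoted in the text

# Estimated variance and standard deviation of b0, formula (2.23)
s2_b0 = mse * (1 / n + x_bar**2 / sxx)
s_b0 = math.sqrt(s2_b0)

# 90 percent confidence limits for beta0, formula (2.25)
lower = b0 - t_95_23 * s_b0
upper = b0 + t_95_23 * s_b0

print(f"s2(b0) = {s2_b0:.2f}, s(b0) = {s_b0:.2f}")
print(f"{lower:.1f} <= beta0 <= {upper:.1f}")
```

Most of $s^2\{b_0\}$ comes from the $\bar{X}^2/\sum (X_i - \bar{X})^2$ term, which reflects how far $X = 0$ lies from the center of the observed lot sizes.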


2.3 Some Considerations on Making Inferences Concerning $\beta_0$ and $\beta_1$


Effects of Departures from Normality

If the probability distributions of Y are not exactly normal but do not depart seriously,
the sampling distributions of $b_0$ and $b_1$ will be approximately normal, and the use of the
$t$ distribution will provide approximately the specified confidence coefficient or level of
significance. Even if the distributions of Y are far from normal, the estimators $b_0$ and $b_1$
generally have the property of asymptotic normality: their distributions approach normality
under very general conditions as the sample size increases. Thus, with sufficiently large
samples, the confidence intervals and decision rules given earlier still apply even if the
probability distributions of Y depart far from normality. For large samples, the $t$ value is,
of course, replaced by the $z$ value for the standard normal distribution.


Interpretation of Confidence Coefficient and Risks of Errors 

Since regression model (2.1) assumes that the $X_i$ are known constants, the confidence
coefficient and risks of errors are interpreted with respect to taking repeated samples in
which the X observations are kept at the same levels as in the observed sample. For instance,
we constructed a confidence interval for $\beta_1$ with confidence coefficient .95 in the Toluca
Company example. This coefficient is interpreted to mean that if many independent samples
are taken where the levels of X (the lot sizes) are the same as in the data set and a 95 percent
confidence interval is constructed for each sample, 95 percent of the intervals will contain
the true value of $\beta_1$.
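
The repeated-sampling interpretation can be made concrete with a small simulation. The sketch below is not taken from the text: the X levels, the true parameter values, the seed, and the number of replications are all hypothetical. It holds the X levels fixed, draws many samples from a normal error model, builds a 95 percent interval for the slope from each, and records how often the interval covers the true slope.

```python
import math
import random

random.seed(0)

# Hypothetical fixed X levels and true parameters (assumptions for illustration)
x = [20 + 5 * i for i in range(25)]          # n = 25 fixed levels
beta0, beta1, sigma = 62.0, 3.5, 50.0
t_crit = 2.069                               # t(.975; 23), as quoted in the text

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)


def slope_interval(y):
    """95 percent confidence limits for the slope from one sample."""
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2) / sxx)    # s{b1} = sqrt(MSE / Sxx)
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1


reps = 2000
hits = 0
for _ in range(reps):
    # New Y observations at the SAME X levels in every replication
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    lo, hi = slope_interval(y)
    if lo <= beta1 <= hi:
        hits += 1

coverage = hits / reps
print(f"proportion of intervals containing beta1: {coverage:.3f}")
```

The observed proportion should come out close to .95, though it varies a little from one simulation run to another.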


Spacing of the X Levels 
Inspection of formulas (2.3b) and (2.22b) for the variances of $b_1$ and $b_0$, respectively,
indicates that for given $n$ and $\sigma^2$ these variances are affected by the spacing of the X
levels in the observed data. For example, the greater is the spread in the X levels, the larger
is the quantity $\sum (X_i - \bar{X})^2$ and the smaller is the variance of $b_1$. We discuss in Chapter 4
how the X observations should be spaced in experiments where spacing can be controlled.
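
A small numerical comparison may help here. The two sets of X levels below are hypothetical designs invented for illustration, each with five observations and the same mean; the error variance is an assumed value. The functions simply evaluate $\sigma^2/\sum (X_i - \bar{X})^2$ for the slope and formula (2.22b) for the intercept.

```python
def sum_sq_dev(x_levels):
    """Sum of squared deviations of the X levels about their mean."""
    x_bar = sum(x_levels) / len(x_levels)
    return sum((x - x_bar) ** 2 for x in x_levels)


def slope_variance(x_levels, sigma2):
    """Variance of b1: sigma^2 / sum((X_i - X_bar)^2)."""
    return sigma2 / sum_sq_dev(x_levels)


def intercept_variance(x_levels, sigma2):
    """Variance of b0 from (2.22b)."""
    n = len(x_levels)
    x_bar = sum(x_levels) / n
    return sigma2 * (1 / n + x_bar**2 / sum_sq_dev(x_levels))


sigma2 = 2500.0                      # assumed error variance
narrow = [40, 45, 50, 55, 60]        # hypothetical design, small spread
wide = [10, 30, 50, 70, 90]          # hypothetical design, large spread

var_b1_narrow = slope_variance(narrow, sigma2)
var_b1_wide = slope_variance(wide, sigma2)
var_b0_narrow = intercept_variance(narrow, sigma2)
var_b0_wide = intercept_variance(wide, sigma2)

print(var_b1_narrow, var_b1_wide)
print(var_b0_narrow, var_b0_wide)
```

Both variances are smaller for the widely spaced design, even though $n$, $\bar{X}$, and $\sigma^2$ are identical in the two cases.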


Power of Tests 


The power of tests on $\beta_0$ and $\beta_1$ can be obtained from Appendix Table B.5. Consider, for
example, the general test concerning $\beta_1$ in (2.19):

$H_0\colon \beta_1 = \beta_{10}$
$H_a\colon \beta_1 \neq \beta_{10}$

for which test statistic (2.20) is employed:

$$t^* = \frac{b_1 - \beta_{10}}{s\{b_1\}}$$

and the decision rule for level of significance $\alpha$ is given in (2.18):

If $|t^*| \le t(1 - \alpha/2;\, n - 2)$, conclude $H_0$
If $|t^*| > t(1 - \alpha/2;\, n - 2)$, conclude $H_a$

The power of this test is the probability that the decision rule will lead to conclusion $H_a$
when $H_a$ in fact holds. Specifically, the power is given by:

$$\text{Power} = P\{|t^*| > t(1 - \alpha/2;\, n - 2) \mid \delta\} \tag{2.26}$$

where $\delta$ is the noncentrality measure, i.e., a measure of how far the true value of $\beta_1$ is from
$\beta_{10}$:

$$\delta = \frac{|\beta_1 - \beta_{10}|}{\sigma\{b_1\}} \tag{2.27}$$

Table B.5 presents the power of the two-sided $t$ test for $\alpha = .05$ and $\alpha = .01$, for various
degrees of freedom df. To illustrate the use of this table, let us return to the Toluca Company
example where we tested:

$H_0\colon \beta_1 = \beta_{10} = 0$
$H_a\colon \beta_1 \neq \beta_{10} = 0$

Suppose we wish to know the power of the test when $\beta_1 = 1.5$. To ascertain this, we need
to know $\sigma^2$, the variance of the error terms. Assume, based on prior information or pilot
data, that a reasonable planning value for the unknown variance is $\sigma^2 = 2{,}500$, so $\sigma^2\{b_1\}$
for our example would be:

$$\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} = \frac{2{,}500}{19{,}800} = .1263$$

or $\sigma\{b_1\} = .3553$. Then $\delta = |1.5 - 0| \div .3553 = 4.22$. We enter Table B.5 for $\alpha = .05$ (the
level of significance used in the test) and 23 degrees of freedom and interpolate linearly
between $\delta = 4.00$ and $\delta = 5.00$. We obtain:

$$.97 + \frac{4.22 - 4.00}{5.00 - 4.00}(1.00 - .97) = .9766$$

Thus, if $\beta_1 = 1.5$, the probability would be about .98 that we would be led to conclude
$H_a$ ($\beta_1 \neq 0$). In other words, if $\beta_1 = 1.5$, we would be almost certain to conclude that there
is a linear relation between work hours and lot size.

The power of tests concerning $\beta_0$ can be obtained from Table B.5 in completely analogous
fashion. For one-sided tests, Table B.5 should be entered so that one-half the level of
significance shown there is the level of significance of the one-sided test.
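
The power calculation above amounts to evaluating (2.27) and interpolating linearly between two table entries. The Python sketch below retraces it. The two power values (.97 at $\delta = 4.00$ and 1.00 at $\delta = 5.00$) are the Table B.5 entries quoted in the text; the variable names are illustrative, and the code does not compute the noncentral $t$ distribution itself.

```python
import math

sigma2_planning = 2_500     # planning value for the error variance (from the text)
sxx = 19_800                # sum of (X_i - X_bar)^2 for the Toluca data
beta1_true = 1.5            # slope at which power is wanted
beta10 = 0.0                # value of the slope under H0

# Noncentrality measure (2.27)
sigma_b1 = math.sqrt(sigma2_planning / sxx)
delta = abs(beta1_true - beta10) / sigma_b1

# Table B.5 entries for alpha = .05 and 23 degrees of freedom, as quoted
delta_lo, power_lo = 4.00, 0.97
delta_hi, power_hi = 5.00, 1.00

# Linear interpolation between the two tabled values
power = power_lo + (delta - delta_lo) / (delta_hi - delta_lo) * (power_hi - power_lo)

print(f"sigma(b1) = {sigma_b1:.4f}, delta = {delta:.2f}, power = {power:.4f}")
```

Because the code carries unrounded intermediate values, its result can differ from the hand calculation in the fourth decimal place.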


2.4 Interval Estimation of $E\{Y_h\}$


A common objective in regression analysis is to estimate the mean for one or more probability
distributions of Y. Consider, for example, a study of the relation between level of
piecework pay (X) and worker productivity (Y). The mean productivity at high and medium
levels of piecework pay may be of particular interest for purposes of analyzing the benefits
obtained from an increase in the pay. As another example, the Toluca Company was
interested in the mean response (mean number of work hours) for a range of lot sizes for
purposes of finding the optimum lot size.

Let $X_h$ denote the level of X for which we wish to estimate the mean response. $X_h$ may
be a value which occurred in the sample, or it may be some other value of the predictor
variable within the scope of the model. The mean response when $X = X_h$ is denoted by
$E\{Y_h\}$. Formula (1.12) gives us the point estimator $\hat{Y}_h$ of $E\{Y_h\}$:

$$\hat{Y}_h = b_0 + b_1 X_h \tag{2.28}$$

We consider now the sampling distribution of $\hat{Y}_h$.


Sampling Distribution of $\hat{Y}_h$


The sampling distribution of $\hat{Y}_h$, like the earlier sampling distributions discussed, refers to
the different values of $\hat{Y}_h$ that would be obtained if repeated samples were selected, each
holding the levels of the predictor variable X constant, and calculating $\hat{Y}_h$ for each sample.

For normal error regression model (2.1), the sampling distribution of
$\hat{Y}_h$ is normal, with mean and variance:                                 (2.29)

$$E\{\hat{Y}_h\} = E\{Y_h\} \tag{2.29a}$$

$$\sigma^2\{\hat{Y}_h\} = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] \tag{2.29b}$$

Normality. The normality of the sampling distribution of $\hat{Y}_h$ follows directly from the
fact that $\hat{Y}_h$, like $b_0$ and $b_1$, is a linear combination of the observations $Y_i$.

Mean. Note from (2.29a) that $\hat{Y}_h$ is an unbiased estimator of $E\{Y_h\}$. To prove this, we
proceed as follows:

$$E\{\hat{Y}_h\} = E\{b_0 + b_1 X_h\} = E\{b_0\} + X_h E\{b_1\} = \beta_0 + \beta_1 X_h$$

by (2.3a) and (2.22a).


Variance. Note from (2.29b) that the variability of the sampling distribution of $\hat{Y}_h$ is
affected by how far $X_h$ is from $\bar{X}$, through the term $(X_h - \bar{X})^2$. The further from $\bar{X}$ is
$X_h$, the greater is the quantity $(X_h - \bar{X})^2$ and the larger is the variance of $\hat{Y}_h$. An intuitive
explanation of this effect is found in Figure 2.3. Shown there are two sample regression
lines, based on two samples for the same set of X values. The two regression lines are
assumed to go through the same $(\bar{X}, \bar{Y})$ point to isolate the effect of interest, namely, the
effect of variation in the estimated slope $b_1$ from sample to sample. Note that at $X_1$, near
$\bar{X}$, the fitted values $\hat{Y}_1$ for the two sample regression lines are close to each other. At $X_2$,
which is far from $\bar{X}$, the situation is different. Here, the fitted values $\hat{Y}_2$ differ substantially.
Thus, variation in the slope $b_1$ from sample to sample has a much more pronounced effect
on $\hat{Y}_h$ for X levels far from the mean $\bar{X}$ than for X levels near $\bar{X}$. Hence, the variation in the
$\hat{Y}_h$ values from sample to sample will be greater when $X_h$ is far from the mean than when
$X_h$ is near the mean.

FIGURE 2.3 Effect on $\hat{Y}_h$ of Variation in $b_1$ from Sample to Sample in Two Samples with
Same Means $\bar{Y}$ and $\bar{X}$.
[Figure: estimated regression lines from Sample 1 and Sample 2 plotted against X, with the
levels $X_1$, $\bar{X}$, and $X_2$ marked on the X axis.]

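When MSE is substituted for $\sigma^2$ in (2.29b), we obtain $s^2\{\hat{Y}_h\}$, the estimated variance
of $\hat{Y}_h$:

$$s^2\{\hat{Y}_h\} = MSE\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] \tag{2.30}$$

The estimated standard deviation of $\hat{Y}_h$ is then $s\{\hat{Y}_h\}$, the positive square root of $s^2\{\hat{Y}_h\}$.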


Comments
1. When $X_h = 0$, the variance of $\hat{Y}_h$ in (2.29b) reduces to the variance of $b_0$ in (2.22b). Similarly,
$s^2\{\hat{Y}_h\}$ in (2.30) reduces to $s^2\{b_0\}$ in (2.23). The reason is that $\hat{Y}_h = b_0$ when $X_h = 0$ since
$\hat{Y}_h = b_0 + b_1 X_h$.

2. To derive $\sigma^2\{\hat{Y}_h\}$, we first show that $b_1$ and $\bar{Y}$ are uncorrelated and, hence, for regression model
(2.1), independent:

$$\sigma\{\bar{Y}, b_1\} = 0 \tag{2.31}$$

where $\sigma\{\bar{Y}, b_1\}$ denotes the covariance between $\bar{Y}$ and $b_1$. We begin with the definitions:

$$\bar{Y} = \sum \left(\frac{1}{n}\right) Y_i \qquad b_1 = \sum k_i Y_i$$

where $k_i$ is as defined in (2.4a). We now use (A.32), with $a_i = 1/n$ and $c_i = k_i$; remember that the
$Y_i$ are independent random variables:

$$\sigma\{\bar{Y}, b_1\} = \sum \left(\frac{1}{n}\right) k_i \sigma^2\{Y_i\} = \frac{\sigma^2}{n} \sum k_i$$

But we know from (2.5) that $\sum k_i = 0$. Hence, the covariance is 0.

Now we are ready to find the variance of $\hat{Y}_h$. We shall use the estimator in the alternative form (1.15):

$$\sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y} + b_1 (X_h - \bar{X})\}$$

Since $\bar{Y}$ and $b_1$ are independent and $X_h$ and $\bar{X}$ are constants, we obtain:

$$\sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y}\} + (X_h - \bar{X})^2 \sigma^2\{b_1\}$$

Now $\sigma^2\{b_1\}$ is given in (2.3b), and:

$$\sigma^2\{\bar{Y}\} = \frac{\sigma^2\{Y_i\}}{n} = \frac{\sigma^2}{n}$$

Hence:

$$\sigma^2\{\hat{Y}_h\} = \frac{\sigma^2}{n} + (X_h - \bar{X})^2 \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$

which, upon a slight rearrangement of terms, yields (2.29b).

Sampling Distribution of $(\hat{Y}_h - E\{Y_h\})/s\{\hat{Y}_h\}$


Since we have encountered the $t$ distribution in each type of inference for regression
model (2.1) up to this point, it should not be surprising that:

$$\frac{\hat{Y}_h - E\{Y_h\}}{s\{\hat{Y}_h\}} \text{ is distributed as } t(n - 2) \text{ for regression model (2.1)} \tag{2.32}$$

Hence, all inferences concerning $E\{Y_h\}$ are carried out in the usual fashion with the $t$
distribution. We illustrate the construction of confidence intervals, since in practice these
are used more frequently than tests.


Confidence Interval for $E\{Y_h\}$
A confidence interval for $E\{Y_h\}$ is constructed in the standard fashion, making use of the $t$
distribution as indicated by theorem (2.32). The $1 - \alpha$ confidence limits are:

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - 2)\, s\{\hat{Y}_h\} \tag{2.33}$$

Example 1

Returning to the Toluca Company example, let us find a 90 percent confidence interval for
$E\{Y_h\}$ when the lot size is $X_h = 65$ units. Using the earlier results in Table 2.1, we find the
point estimate $\hat{Y}_h$:

$$\hat{Y}_h = 62.37 + 3.5702(65) = 294.4$$

Next, we need to find the estimated standard deviation $s\{\hat{Y}_h\}$. We obtain, using (2.30):

$$s^2\{\hat{Y}_h\} = 2{,}384\left[\frac{1}{25} + \frac{(65 - 70.00)^2}{19{,}800}\right] = 98.37$$

$$s\{\hat{Y}_h\} = 9.918$$

For a 90 percent confidence coefficient, we require $t(.95; 23) = 1.714$. Hence, our confidence
interval with confidence coefficient .90 is by (2.33):

$$294.4 - 1.714(9.918) \le E\{Y_h\} \le 294.4 + 1.714(9.918)$$
$$277.4 \le E\{Y_h\} \le 311.4$$

We conclude with confidence coefficient .90 that the mean number of work hours required
when lots of 65 units are produced is somewhere between 277.4 and 311.4 hours. We see
that our estimate of the mean number of work hours is moderately precise.


Example 2

Suppose the Toluca Company wishes to estimate $E\{Y_h\}$ for lots with $X_h = 100$ units with
a 90 percent confidence interval. We require:

$$\hat{Y}_h = 62.37 + 3.5702(100) = 419.4$$

$$s^2\{\hat{Y}_h\} = 2{,}384\left[\frac{1}{25} + \frac{(100 - 70.00)^2}{19{,}800}\right] = 203.72$$

$$s\{\hat{Y}_h\} = 14.27$$

$$t(.95; 23) = 1.714$$

Hence, the 90 percent confidence interval is:

$$419.4 - 1.714(14.27) \le E\{Y_h\} \le 419.4 + 1.714(14.27)$$
$$394.9 \le E\{Y_h\} \le 443.9$$

Note that this confidence interval is somewhat wider than that for Example 1, since the
$X_h$ level here ($X_h = 100$) is substantially farther from the mean $\bar{X} = 70.0$ than the $X_h$
level for Example 1 ($X_h = 65$).
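
Both examples apply the same two formulas, (2.30) and (2.33), at different levels $X_h$. A minimal Python sketch of that calculation follows; the function name `mean_response_interval` is hypothetical, the inputs are the Table 2.1 summary values, and the $t$ value is the quoted Table B.2 entry.

```python
import math

# Toluca Company summary results (Table 2.1)
N, X_BAR, SXX = 25, 70.00, 19_800
B0, B1, MSE = 62.37, 3.5702, 2_384
T_95_23 = 1.714                      # t(.95; 23) from Table B.2


def mean_response_interval(x_h):
    """90 percent confidence limits for E{Y_h} at level x_h, via (2.30) and (2.33)."""
    y_hat = B0 + B1 * x_h
    s2_y_hat = MSE * (1 / N + (x_h - X_BAR) ** 2 / SXX)
    half_width = T_95_23 * math.sqrt(s2_y_hat)
    return y_hat - half_width, y_hat + half_width


lo_65, hi_65 = mean_response_interval(65)
lo_100, hi_100 = mean_response_interval(100)

print(f"X_h = 65:  {lo_65:.1f} to {hi_65:.1f}")
print(f"X_h = 100: {lo_100:.1f} to {hi_100:.1f}")
```

The interval at $X_h = 100$ is the wider of the two because $(X_h - \bar{X})^2$ is 900 there, against 25 at $X_h = 65$.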


Comments

1. Since the $X_i$ are known constants in regression model (2.1), the interpretation of confidence
intervals and risks of errors in inferences on the mean response is in terms of taking repeated
samples in which the X observations are at the same levels as in the actual study. We noted this
same point in connection with inferences on $\beta_0$ and $\beta_1$.

2. We see from formula (2.29b) that, for given sample results, the variance of $\hat{Y}_h$ is smallest when
$X_h = \bar{X}$. Thus, in an experiment to estimate the mean response at a particular level $X_h$ of the
predictor variable, the precision of the estimate will be greatest if (everything else remaining equal)
the observations on X are spaced so that $\bar{X} = X_h$.

3. The usual relationship between confidence intervals and tests applies in inferences concerning the
mean response. Thus, the two-sided confidence limits (2.33) can be utilized for two-sided tests
concerning the mean response at $X_h$. Alternatively, a regular decision rule can be set up.

4. The confidence limits (2.33) for a mean response $E\{Y_h\}$ are not sensitive to moderate departures
from the assumption that the error terms are normally distributed. Indeed, the limits are not sensitive
to substantial departures from normality if the sample size is large. This robustness in estimating
the mean response is related to the robustness of the confidence limits for $\beta_0$ and $\beta_1$, noted earlier.

5. Confidence limits (2.33) apply when a single mean response is to be estimated from the study. We
discuss in Chapter 4 how to proceed when several mean responses are to be estimated from the
same data.


2.5 Prediction of New Observation 


We consider now the prediction of a new observation Y corresponding to a given level X of
the predictor variable. Three illustrations where prediction of a new observation is needed
follow.

1. In the Toluca Company example, the next lot to be produced consists of 100 units and
management wishes to predict the number of work hours for this particular lot.

2. An economist has estimated the regression relation between company sales and number
of persons 16 or more years old from data for the past 10 years. Using a reliable demographic
projection of the number of persons 16 or more years old for next year, the
economist wishes to predict next year's company sales.

3. An admissions officer at a university has estimated the regression relation between
the high school grade point average (GPA) of admitted students and the first-year college
GPA. The officer wishes to predict the first-year college GPA for an applicant whose
high school GPA is 3.5 as part of the information on which an admissions decision will
be based.


The new observation on Y to be predicted is viewed as the result of a new trial, independent
of the trials on which the regression analysis is based. We denote the level of X
for the new trial as $X_h$ and the new observation on Y as $Y_{h(\text{new})}$. Of course, we assume
that the underlying regression model applicable for the basic sample data continues to be
appropriate for the new observation.

The distinction between estimation of the mean response $E\{Y_h\}$, discussed in the preceding
section, and prediction of a new response $Y_{h(\text{new})}$, discussed now, is basic. In the
former case, we estimate the mean of the distribution of Y. In the present case, we predict
an individual outcome drawn from the distribution of Y. Of course, the great majority of
individual outcomes deviate from the mean response, and this must be taken into account
by the procedure for predicting $Y_{h(\text{new})}$.


Prediction Interval for $Y_{h(\text{new})}$ when Parameters Known


To illustrate the nature of a prediction interval for a new observation $Y_{h(\text{new})}$ in as simple a
fashion as possible, we shall first assume that all regression parameters are known. Later
we drop this assumption and make appropriate modifications.

Suppose that in the college admissions example the relevant parameters of the regression
model are known to be:

$$\beta_0 = .10 \qquad \beta_1 = .95$$
$$E\{Y\} = .10 + .95X$$
$$\sigma = .12$$

The admissions officer is considering an applicant whose high school GPA is $X_h = 3.5$.
The mean college GPA for students whose high school average is 3.5 is:

$$E\{Y_h\} = .10 + .95(3.5) = 3.425$$

Figure 2.4 shows the probability distribution of Y for $X_h = 3.5$. Its mean is $E\{Y_h\} = 3.425$,
and its standard deviation is $\sigma = .12$. Further, the distribution is normal in accord with
regression model (2.1).

Suppose we were to predict that the college GPA of the applicant whose high school
GPA is $X_h = 3.5$ will be between:

$$E\{Y_h\} \pm 3\sigma$$
$$3.425 \pm 3(.12)$$

so that the prediction interval would be:

$$3.065 \le Y_{h(\text{new})} \le 3.785$$


FIGURE 2.4 Prediction of $Y_{h(\text{new})}$ when Parameters Known.
[Figure: probability distribution of Y when $X_h = 3.5$, centered at $E\{Y_h\} = 3.425$, with
prediction limits marked at $3.425 - 3\sigma$ and $3.425 + 3\sigma$.]


Since 99.7 percent of the area in a normal probability distribution falls within three standard
deviations from the mean, the probability is .997 that this prediction interval will give a
correct prediction for the applicant with high school GPA of 3.5. While the prediction limits
here are rather wide, so that the prediction is not too precise, the prediction interval does
indicate to the admissions officer that the applicant is expected to attain at least a 3.0 GPA
in the first year of college.

The basic idea of a prediction interval is thus to choose a range in the distribution of Y
wherein most of the observations will fall, and then to declare that the next observation will
fall in this range. The usefulness of the prediction interval depends, as always, on the width
of the interval and the needs for precision by the user.

In general, when the regression parameters of normal error regression model (2.1) are
known, the $1 - \alpha$ prediction limits for $Y_{h(\text{new})}$ are:

$$E\{Y_h\} \pm z(1 - \alpha/2)\,\sigma \tag{2.34}$$

In centering the limits around $E\{Y_h\}$, we obtain the narrowest interval consistent with the
specified probability of a correct prediction.
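
For the known-parameter case, limits (2.34) need only a standard normal percentile. The sketch below uses `statistics.NormalDist` from the Python standard library to obtain $z(1 - \alpha/2)$ and applies it to the admissions example; the function name `known_parameter_limits` and the choice $\alpha = .05$ are assumptions made for illustration, not part of the text.

```python
from statistics import NormalDist

# Known parameters for the college admissions example (from the text)
beta0, beta1, sigma = 0.10, 0.95, 0.12
x_h = 3.5
mean_h = beta0 + beta1 * x_h                 # E{Y_h}

# The three-standard-deviation limits used in the text
three_sigma_limits = (mean_h - 3 * sigma, mean_h + 3 * sigma)

# Probability content of +/- 3 standard deviations under normality
std_normal = NormalDist()
coverage_3sigma = std_normal.cdf(3) - std_normal.cdf(-3)


def known_parameter_limits(mean, sd, alpha):
    """1 - alpha prediction limits (2.34) when all parameters are known."""
    z = std_normal.inv_cdf(1 - alpha / 2)    # z(1 - alpha/2)
    return mean - z * sd, mean + z * sd


# Hypothetical choice alpha = .05, for comparison with the 3-sigma limits
lo_95, hi_95 = known_parameter_limits(mean_h, sigma, alpha=0.05)

print(f"3-sigma limits: {three_sigma_limits[0]:.3f} to {three_sigma_limits[1]:.3f}")
print(f"coverage of 3-sigma limits: {coverage_3sigma:.4f}")
print(f"95 percent limits: {lo_95:.3f} to {hi_95:.3f}")
```

Lowering the required probability of a correct prediction from .997 to .95 narrows the limits, which is the usual trade between assurance and precision.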


Prediction Interval for $Y_{h(\text{new})}$ when Parameters Unknown


When the regression parameters are unknown, they must be estimated. The mean of the
distribution of Y is estimated by $\hat{Y}_h$, as usual, and the variance of the distribution of Y
is estimated by MSE. We cannot, however, simply use the prediction limits (2.34) with
the parameters replaced by the corresponding point estimators. The reason is illustrated
intuitively in Figure 2.5. Shown there are two probability distributions of Y, corresponding to
the upper and lower limits of a confidence interval for $E\{Y_h\}$. In other words, the distribution
of Y could be located as far left as the one shown, as far right as the other one shown, or
anywhere in between. Since we do not know the mean $E\{Y_h\}$ and only estimate it by a
confidence interval, we cannot be certain of the location of the distribution of Y.

FIGURE 2.5 Prediction of $Y_{h(\text{new})}$ when Parameters Unknown.
[Figure: two probability distributions of Y, one located at each confidence limit for $E\{Y_h\}$,
each with its own prediction limits ("Prediction Limits if $E\{Y_h\}$ Here"); $\hat{Y}_h$ is marked
between the confidence limits for $E\{Y_h\}$.]

Figure 2.5 also shows the prediction limits for each of the two probability distributions
of Y presented there. Since we cannot be certain of the location of the distribution
of Y, prediction limits for $Y_{h(\text{new})}$ clearly must take account of two elements, as shown in
Figure 2.5:

1. Variation in possible location of the distribution of Y.
2. Variation within the probability distribution of Y.


Prediction limits for a new observation $Y_{h(\text{new})}$ at a given level $X_h$ are obtained by means
of the following theorem:

$$\frac{Y_{h(\text{new})} - \hat{Y}_h}{s\{\text{pred}\}} \text{ is distributed as } t(n - 2) \text{ for normal error regression model (2.1)} \tag{2.35}$$

Note that the studentized statistic (2.35) uses the point estimator $\hat{Y}_h$ in the numerator rather
than the true mean $E\{Y_h\}$ because the true mean is unknown and cannot be used in making a
prediction. The estimated standard deviation of the prediction, $s\{\text{pred}\}$, in the denominator
of the studentized statistic will be defined shortly.

From theorem (2.35), it follows in the usual fashion that the $1 - \alpha$ prediction limits for
a new observation $Y_{h(\text{new})}$ are (for instance, compare (2.35) to (2.10) and relate $\hat{Y}_h$ to $b_1$ and
$Y_{h(\text{new})}$ to $\beta_1$):

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - 2)\, s\{\text{pred}\} \tag{2.36}$$


Note that the numerator of the studentized statistic (2.35) represents how far the new
observation $Y_{h(\text{new})}$ will deviate from the estimated mean $\hat{Y}_h$ based on the original $n$ cases in
the study. This difference may be viewed as the prediction error, with $\hat{Y}_h$ serving as the best
point estimate of the value of the new observation $Y_{h(\text{new})}$. The variance of this prediction
error can be readily obtained by utilizing the independence of the new observation $Y_{h(\text{new})}$ and
the original $n$ sample cases on which $\hat{Y}_h$ is based. We denote the variance of the prediction
error by $\sigma^2\{\text{pred}\}$, and we obtain by (A.31b):

$$\sigma^2\{\text{pred}\} = \sigma^2\{Y_{h(\text{new})} - \hat{Y}_h\} = \sigma^2\{Y_{h(\text{new})}\} + \sigma^2\{\hat{Y}_h\} = \sigma^2 + \sigma^2\{\hat{Y}_h\} \tag{2.37}$$

Note that $\sigma^2\{\text{pred}\}$ has two components:

1. The variance of the distribution of Y at $X = X_h$, namely $\sigma^2$.
2. The variance of the sampling distribution of $\hat{Y}_h$, namely $\sigma^2\{\hat{Y}_h\}$.


An unbiased estimator of $\sigma^2\{\text{pred}\}$ is:

$$s^2\{\text{pred}\} = MSE + s^2\{\hat{Y}_h\} \tag{2.38}$$

which can be expressed as follows, using (2.30):

$$s^2\{\text{pred}\} = MSE\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] \tag{2.38a}$$

Example


The Toluca Company studied the relationship between lot size and work hours primarily
to obtain information on the mean work hours required for different lot sizes for use in
determining the optimum lot size. The company was also interested, however, to see whether
the regression relationship is useful for predicting the required work hours for individual
lots. Suppose that the next lot to be produced consists of $X_h = 100$ units and that a 90 percent
prediction interval is desired. We require $t(.95; 23) = 1.714$. From earlier work, we have:

$$\hat{Y}_h = 419.4 \qquad s^2\{\hat{Y}_h\} = 203.72 \qquad MSE = 2{,}384$$

Using (2.38), we obtain:

$$s^2\{\text{pred}\} = 2{,}384 + 203.72 = 2{,}587.72$$
$$s\{\text{pred}\} = 50.87$$

Hence, the 90 percent prediction interval for $Y_{h(\text{new})}$ is by (2.36):

$$419.4 - 1.714(50.87) \le Y_{h(\text{new})} \le 419.4 + 1.714(50.87)$$
$$332.2 \le Y_{h(\text{new})} \le 506.6$$

With confidence coefficient .90, we predict that the number of work hours for the next
production run of 100 units will be somewhere between 332 and 507 hours.

This prediction interval is rather wide and may not be too useful for planning worker
requirements for the next lot. The interval can still be useful for control purposes, though.
For instance, suppose that the actual work hours on the next lot of 100 units were 550 hours.
Since the actual work hours fall outside the prediction limits, management would have an
indication that a change in the production process may have occurred and would be alerted
to the possible need for remedial action.

Note that the primary reason for the wide prediction interval is the large lot-to-lot variability
in work hours for any given lot size; $MSE = 2{,}384$ accounts for 92 percent of
the estimated prediction variance $s^2\{\text{pred}\} = 2{,}587.72$. It may be that the large lot-to-lot
variability reflects other factors that affect the required number of work hours besides lot
size, such as the amount of experience of employees assigned to the lot production. If so, a
multiple regression model incorporating these other factors might lead to much more precise
predictions. Alternatively, a designed experiment could be conducted to determine the
main factors leading to the large lot-to-lot variation. A quality improvement program would
then use these findings to achieve more uniform performance, for example, by additional
training of employees if inadequate training accounted for much of the variability.
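
The prediction interval, the share of the prediction variance due to MSE, and the control check against 550 hours can all be retraced in a few lines. The Python sketch below takes the quantities quoted in the example as given inputs; the variable names are illustrative only.

```python
import math

# Quantities from the Toluca Company example at X_h = 100
y_hat = 419.4           # point estimate of the mean response
s2_y_hat = 203.72       # estimated variance of Y_hat_h
mse = 2_384
t_95_23 = 1.714         # t(.95; 23) from Table B.2

# Estimated prediction variance (2.38) and 90 percent limits (2.36)
s2_pred = mse + s2_y_hat
s_pred = math.sqrt(s2_pred)
lower = y_hat - t_95_23 * s_pred
upper = y_hat + t_95_23 * s_pred

# Share of the prediction variance attributable to lot-to-lot variability
mse_share = mse / s2_pred

# Control use: does an observed 550 hours fall outside the limits?
actual_hours = 550
outside_limits = not (lower <= actual_hours <= upper)

print(f"s(pred) = {s_pred:.2f}; limits {lower:.1f} to {upper:.1f}")
print(f"MSE share of s2(pred) = {mse_share:.2f}; 550 hours outside: {outside_limits}")
```

Since $s^2\{\hat{Y}_h\}$ contributes only a small part of $s^2\{\text{pred}\}$, a larger sample would narrow this interval only slightly; reducing the error variance itself is what would help.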


Comments

1. The 90 percent prediction interval for $Y_{h(\text{new})}$ obtained in the Toluca Company example is wider
than the 90 percent confidence interval for $E\{Y_h\}$ obtained in Example 2 on page 55. The reason is
that when predicting the work hours required for a new lot, we encounter both the variability in $\hat{Y}_h$
from sample to sample as well as the lot-to-lot variation within the probability distribution of Y.

2. Formula (2.38a) indicates that the prediction interval is wider the further $X_h$ is from $\bar{X}$. The
reason for this is that the estimate of the mean $\hat{Y}_h$, as noted earlier, is less precise as $X_h$ is located
farther away from $\bar{X}$.

3. The prediction limits (2.36), unlike the confidence limits (2.33) for a mean response $E\{Y_h\}$,
are sensitive to departures from normality of the error terms distribution. In Chapter 3, we discuss
diagnostic procedures for examining the nature of the probability distribution of the error terms, and
we describe remedial measures if the departure from normality is serious.

4. The confidence coefficient for the prediction limits (2.36) refers to the taking of repeated
samples based on the same set of X values, and calculating prediction limits for $Y_{h(\text{new})}$ for each
sample.

5. Prediction limits (2.36) apply for a single prediction based on the sample data. Next, we discuss
how to predict the mean of several new observations at a given $X_h$, and in Chapter 4 we take up how
to make several predictions at different $X_h$ levels.

6. Prediction intervals resemble confidence intervals. However, they differ conceptually. A confidence
interval represents an inference on a parameter and is an interval that is intended to cover the
value of the parameter. A prediction interval, on the other hand, is a statement about the value to be
taken by a random variable, the new observation $Y_{h(\text{new})}$.


Prediction of Mean of $m$ New Observations for Given $X_h$

Occasionally, one would like to predict the mean of $m$ new observations on Y for a given
level of the predictor variable. Suppose the Toluca Company has been asked to bid on a
contract that calls for $m = 3$ production runs of $X_h = 100$ units during the next few months.
Management would like to predict the mean work hours per lot for these three runs and
then convert this into a prediction of the total work hours required to fill the contract.

We denote the mean of the new Y observations to be predicted as $\bar{Y}_{h(\text{new})}$. It can be shown
that the appropriate $1 - \alpha$ prediction limits are, assuming that the new Y observations are
independent:

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - 2)\, s\{\text{predmean}\} \tag{2.39}$$

where:

$$s^2\{\text{predmean}\} = \frac{MSE}{m} + s^2\{\hat{Y}_h\} \tag{2.39a}$$

or equivalently:

$$s^2\{\text{predmean}\} = MSE\left[\frac{1}{m} + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] \tag{2.39b}$$

Note from (2.39a) that the variance $s^2\{\text{predmean}\}$ has two components:

1. The variance of the mean of $m$ observations from the probability distribution of Y at
$X = X_h$.
2. The variance of the sampling distribution of $\hat{Y}_h$.


Example 




In the Toluca Company example, let us find the 90 percent prediction interval for the mean
number of work hours $\bar{Y}_{h(new)}$ in three new production runs, each for $X_h = 100$ units. From
previous work, we have:

    $\hat{Y}_h = 419.4$        $s^2\{\hat{Y}_h\} = 203.72$
    $MSE = 2,384$        $t(.95; 23) = 1.714$

Hence, we obtain:

    $s^2\{\text{predmean}\} = \frac{2,384}{3} + 203.72 = 998.4$
    $s\{\text{predmean}\} = 31.60$

The prediction interval for the mean work hours per lot then is:

    $419.4 - 1.714(31.60) \le \bar{Y}_{h(new)} \le 419.4 + 1.714(31.60)$
    $365.2 \le \bar{Y}_{h(new)} \le 473.6$


Note that these prediction limits are narrower than those for predicting the work hours 
for a single lot of 100 units because they involve a prediction of the mean work hours for 
three lots. 

We obtain the prediction interval for the total number of work hours for the three lots by
multiplying the prediction limits for $\bar{Y}_{h(new)}$ by 3:


1,095.6 = 3(365.2) < Total work hours < 3(473.6) = 1,420.8 


Thus, it can be predicted with 90 percent confidence that between 1,096 and 1,421 work 
hours will be needed to fill the contract for three lots of 100 units each. 
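
The arithmetic of this example can be retraced with a short script. This is only a sketch: the function name `predmean_limits` is hypothetical, and the t percentile 1.714 is copied from the text rather than computed from a distribution table.

```python
import math


def predmean_limits(y_hat, s2_y_hat, mse, m, t_mult):
    """Prediction limits (2.39) for the mean of m new observations at X_h.

    y_hat    -- point estimate of the mean response at X_h
    s2_y_hat -- estimated variance of y_hat
    mse      -- error mean square
    m        -- number of new observations being averaged
    t_mult   -- t(1 - alpha/2; n - 2), supplied by the caller
    """
    s2_predmean = mse / m + s2_y_hat          # formula (2.39a)
    s_predmean = math.sqrt(s2_predmean)
    half_width = t_mult * s_predmean
    return y_hat - half_width, y_hat + half_width, s_predmean


# Toluca Company values quoted in the example above.
lower, upper, s_pm = predmean_limits(419.4, 203.72, 2384.0, 3, 1.714)

# Limits for the total work hours of the three lots.
total_lower, total_upper = 3 * lower, 3 * upper

print(round(s_pm, 2), round(lower, 1), round(upper, 1))
print(round(total_lower, 1), round(total_upper, 1))
```

Because the script multiplies the unrounded limits by 3, the totals it prints can differ in the last digit from the figures in the text, which multiply the rounded limits 365.2 and 473.6.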




Comment 


The 90 percent prediction interval for $\bar{Y}_{h(new)}$, obtained for the Toluca Company example above, is
narrower than that obtained for $Y_{h(new)}$ on page 59, as expected. Furthermore, both of the prediction
intervals are wider than the 90 percent confidence interval for $E\{Y_h\}$ obtained in Example 2 on page 55,
also as expected.


2.6 Confidence Band for Regression Line


At times we would like to obtain a confidence band for the entire regression line $E\{Y\} =
\beta_0 + \beta_1 X$. This band enables us to see the region in which the entire regression line lies. It
is particularly useful for determining the appropriateness of a fitted regression function, as
we explain in Chapter 3.

The Working-Hotelling $1 - \alpha$ confidence band for the regression line for regression
model (2.1) has the following two boundary values at any level $X_h$:

    $\hat{Y}_h \pm W s\{\hat{Y}_h\}$    (2.40)

where:

    $W^2 = 2F(1 - \alpha; 2, n - 2)$    (2.40a)

and $\hat{Y}_h$ and $s\{\hat{Y}_h\}$ are defined in (2.28) and (2.30), respectively. Note that the formula
for the boundary values is of exactly the same form as formula (2.33) for the confidence
limits for the mean response at $X_h$, except that the t multiple has been replaced by the W
multiple. Consequently, the boundary points of the confidence band for the regression line
are wider apart the further $X_h$ is from the mean $\bar{X}$ of the X observations. The W multiple
will be larger than the t multiple in (2.33) because the confidence band must encompass
the entire regression line, whereas the confidence limits for $E\{Y_h\}$ at $X_h$ apply only at the
single level $X_h$.


FIGURE 2.6 Confidence Band for Regression Line: Toluca Company Example.


Example


We wish to determine how precisely we have been able to estimate the regression function
for the Toluca Company example by obtaining the 90 percent confidence band for the
regression line. We illustrate the calculations of the boundary values of the confidence band
when $X_h = 100$. We found earlier for this case:

    $\hat{Y}_h = 419.4$        $s\{\hat{Y}_h\} = 14.27$


We now require: 


    $W^2 = 2F(1 - \alpha; 2, n - 2) = 2F(.90; 2, 23) = 2(2.549) = 5.098$
    $W = 2.258$


Hence, the boundary values of the confidence band for the regression line at $X_h = 100$ are
$419.4 \pm 2.258(14.27)$, and the confidence band there is:

    $387.2 \le \beta_0 + \beta_1 X_h \le 451.6$    for $X_h = 100$


In similar fashion, we can calculate the boundary values for other values of $X_h$ by
obtaining $\hat{Y}_h$ and $s\{\hat{Y}_h\}$ for each $X_h$ level from (2.28) and (2.30) and then finding the
boundary values by means of (2.40). Figure 2.6 contains a plot of the confidence band for
the regression line. Note that at $X_h = 100$, the boundary values are 387.2 and 451.6, as we
calculated earlier.

We see from Figure 2.6 that the regression line for the Toluca Company example has 
been estimated fairly precisely. The slope of the regression line is clearly positive, and the 
levels of the regression line at different levels of X are estimated fairly precisely except for 
small and large lot sizes. 
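
The boundary values at a single level $X_h$ can be retraced with a few lines of code. The sketch below assumes the F percentile is read from a table (2.549 here, as quoted above); the function name `working_hotelling_limits` is hypothetical.

```python
import math


def working_hotelling_limits(y_hat, s_y_hat, f_percentile):
    """Boundary values (2.40) of the Working-Hotelling band at one X_h.

    f_percentile is F(1 - alpha; 2, n - 2), supplied by the caller.
    """
    w = math.sqrt(2.0 * f_percentile)         # formula (2.40a)
    return y_hat - w * s_y_hat, y_hat + w * s_y_hat, w


# Toluca Company values at X_h = 100.
band_lower, band_upper, w_mult = working_hotelling_limits(419.4, 14.27, 2.549)

# For comparison: the t multiple used for a single mean response.
t_mult = 1.714
print(round(w_mult, 3), round(band_lower, 1), round(band_upper, 1))
print(w_mult > t_mult)
```

Repeating the call over a grid of $X_h$ values, each with its own $\hat{Y}_h$ and $s\{\hat{Y}_h\}$, traces out the band plotted in Figure 2.6.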


[Figure 2.6 plot: Hours Y (vertical axis, 50 to 500) against Lot Size X (horizontal axis, 20 to 110).]




Comments 


1. The boundary values of the confidence band for the regression line in (2.40) define a hyperbola,
as may be seen by replacing $\hat{Y}_h$ and $s\{\hat{Y}_h\}$ by their definitions in (2.28) and (2.30), respectively:

    $b_0 + b_1 X \pm W \sqrt{MSE} \left[ \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]^{1/2}$    (2.41)


2. The boundary values of the confidence band for the regression line at any value $X_h$ often are
not substantially wider than the confidence limits for the mean response at that single $X_h$ level. In
the Toluca Company example, the t multiple for estimating the mean response at $X_h = 100$ with a
90 percent confidence interval was $t(.95; 23) = 1.714$. This compares with the W multiple for the
90 percent confidence band for the entire regression line of $W = 2.258$. With the somewhat wider
limits for the entire regression line, one is able to draw conclusions about any and all mean responses
for the entire regression line and not just about the mean response at a given X level. Some uses of
this broader base for inference will be explained in the next two chapters.

3. The confidence band (2.40) applies to the entire regression line over all real-numbered values
of X from $-\infty$ to $\infty$. The confidence coefficient indicates the proportion of time that the estimating
procedure will yield a band that covers the entire line, in a long series of samples in which the X 
observations are kept at the same level as in the actual study. 

In applications, the confidence band is ignored for that part of the regression line which is not 
of interest in the problem at hand. In the Toluca Company example, for instance, negative lot sizes 
would be ignored. The confidence coefficient for a limited segment of the band of interest is somewhat
higher than $1 - \alpha$, so $1 - \alpha$ serves then as a lower bound to the confidence coefficient.

4. Some alternative procedures for developing confidence bands for the regression line have been 
developed. The simplicity of the Working-Hotelling confidence band (2.40) arises from the fact that 
it is a direct extension of the confidence limits for a single mean response in (2.33).


2.7 Analysis of Variance Approach to Regression Analysis 


We now have developed the basic regression model and demonstrated its major uses. At 
this point, we consider the regression analysis from the perspective of analysis of variance. 
This new perspective will not enable us to do anything new, but the analysis of variance 
approach will come into its own when we take up multiple regression models and other 
types of linear statistical models. 


Partitioning of Total Sum of Squares 

Basic Notions. The analysis of variance approach is based on the partitioning of sums 
of squares and degrees of freedom associated with the response variable Y. To explain the 
motivation of this approach, consider again the Toluca Company example. Figure 2.7a shows 
the observations $Y_i$ for the first two production runs presented in Table 1.1. Disregarding
the lot sizes, we see that there is variation in the number of work hours $Y_i$, as in all statistical
data. This variation is conventionally measured in terms of the deviations of the $Y_i$ around
their mean $\bar{Y}$:

    $Y_i - \bar{Y}$    (2.42)




FIGURE 2.7 Illustration of Partitioning of Total Deviations $Y_i - \bar{Y}$: Toluca Company Example (not drawn to
scale; only observations $Y_1$ and $Y_2$ are shown).

[Figure 2.7 panels, each plotting Hours against Lot Size: (a) Total Deviations $Y_i - \bar{Y}$; (b) Deviations $Y_i - \hat{Y}_i$; (c) Deviations $\hat{Y}_i - \bar{Y}$.]


These deviations are shown by the vertical lines in Figure 2.7a. The measure of total 
variation, denoted by SSTO, is the sum of the squared deviations (2.42): 


    $SSTO = \sum (Y_i - \bar{Y})^2$    (2.43)


Here SSTO stands for total sum of squares. If all $Y_i$ observations are the same, SSTO = 0.
The greater the variation among the $Y_i$ observations, the larger is SSTO. Thus, SSTO for
our example is a measure of the uncertainty pertaining to the work hours required for a lot,
when the lot size is not taken into account.

When we utilize the predictor variable X, the variation reflecting the uncertainty
concerning the variable Y is that of the $Y_i$ observations around the fitted regression line:

    $Y_i - \hat{Y}_i$    (2.44)

These deviations are shown by the vertical lines in Figure 2.7b. The measure of variation
in the $Y_i$ observations that is present when the predictor variable X is taken into account is
the sum of the squared deviations (2.44), which is the familiar SSE of (1.21):

    $SSE = \sum (Y_i - \hat{Y}_i)^2$    (2.45)


Again, SSE denotes error sum of squares. If all $Y_i$ observations fall on the fitted regression
line, SSE = 0. The greater the variation of the $Y_i$ observations around the fitted regression
line, the larger is SSE.

For the Toluca Company example, we know from earlier work (Table 2.1) that: 


    SSTO = 307,203        SSE = 54,825


What accounts for the substantial difference between these two sums of squares? The 
difference, as we show shortly, is another sum of squares: 


    $SSR = \sum (\hat{Y}_i - \bar{Y})^2$    (2.46)




where SSR stands for regression sum of squares. Note that SSR is a sum of squared deviations, 
the deviations being: 


    $\hat{Y}_i - \bar{Y}$    (2.47)


These deviations are shown by the vertical lines in Figure 2.7c. Each deviation is simply the
difference between the fitted value on the regression line and the mean of the fitted values
$\bar{Y}$. (Recall from (1.18) that the mean of the fitted values $\hat{Y}_i$ is $\bar{Y}$.) If the regression line is
horizontal so that $\hat{Y}_i - \bar{Y} \equiv 0$, then SSR = 0. Otherwise, SSR is positive.

SSR may be considered a measure of that part of the variability of the $Y_i$ which is
associated with the regression line. The larger SSR is in relation to SSTO, the greater is the
effect of the regression relation in accounting for the total variation in the $Y_i$ observations.

For the Toluca Company example, we have: 

    SSR = SSTO - SSE = 307,203 - 54,825 = 252,378
which indicates that most of the total variability in work hours is accounted for by the 
relation between lot size and work hours. 


Formal Development of Partitioning. The total deviation $Y_i - \bar{Y}$, used in the measure of
the total variation of the observations $Y_i$ without taking the predictor variable into account,
can be decomposed into two components:

    $\underbrace{Y_i - \bar{Y}}_{\text{Total deviation}} = \underbrace{\hat{Y}_i - \bar{Y}}_{\text{Deviation of fitted regression value around mean}} + \underbrace{Y_i - \hat{Y}_i}_{\text{Deviation around fitted regression line}}$    (2.48)


The two components are: 


1. The deviation of the fitted value $\hat{Y}_i$ around the mean $\bar{Y}$.
2. The deviation of the observation $Y_i$ around the fitted regression line.


Figure 2.7 shows this decomposition for observation $Y_1$ by the broken lines.
It is a remarkable property that the sums of these squared deviations have the same
relationship:

    $\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$    (2.49)

or, using the notation in (2.43), (2.45), and (2.46):

    $SSTO = SSR + SSE$    (2.50)


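To prove this basic result in the analysis of variance, we proceed as follows:

    $\sum (Y_i - \bar{Y})^2 = \sum [(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)]^2$
    $= \sum [(\hat{Y}_i - \bar{Y})^2 + (Y_i - \hat{Y}_i)^2 + 2(\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)]$
    $= \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 + 2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$

The last term on the right equals zero, as we can see by expanding it:

    $2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) = 2\sum \hat{Y}_i (Y_i - \hat{Y}_i) - 2\bar{Y} \sum (Y_i - \hat{Y}_i)$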


The first summation on the right equals zero by (1.20), and the second equals zero by (1.17). 
Hence, (2.49) follows. 
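
The partition (2.50) can also be checked numerically. The sketch below fits a least squares line to a small data set and compares SSTO with SSR + SSE. The data values and the function names `fit_line` and `anova_sums` are made up for illustration and are not the Toluca Company data.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def anova_sums(x, y):
    """Return (SSTO, SSR, SSE) as defined in (2.43), (2.46), and (2.45)."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ssto, ssr, sse


# Hypothetical lot sizes and work hours, for illustration only.
x = [20, 30, 40, 50, 60, 70, 80, 90]
y = [113, 121, 160, 221, 224, 361, 352, 389]

ssto, ssr, sse = anova_sums(x, y)
print(ssto, ssr + sse)   # the two values agree up to rounding error
```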


Comment 

The formulas for SSTO, SSR, and SSE given in (2.43), (2.45), and (2.46) are best for computational 
accuracy. Alternative formulas that are algebraically equivalent are available. One that is useful for 
deriving analytical results is: 


    $SSR = b_1^2 \sum (X_i - \bar{X})^2$    (2.51)


Breakdown of Degrees of Freedom 
Corresponding to the partitioning of the total sum of squares SSTO, there is a partitioning
of the associated degrees of freedom (abbreviated df). We have n - 1 degrees of freedom
associated with SSTO. One degree of freedom is lost because the deviations $Y_i - \bar{Y}$ are
subject to one constraint: they must sum to zero. Equivalently, one degree of freedom is
lost because the sample mean $\bar{Y}$ is used to estimate the population mean.

SSE, as noted earlier, has n - 2 degrees of freedom associated with it. Two degrees of
freedom are lost because the two parameters $\beta_0$ and $\beta_1$ are estimated in obtaining the fitted
values $\hat{Y}_i$.

SSR has one degree of freedom associated with it. Although there are n deviations $\hat{Y}_i - \bar{Y}$,
all fitted values $\hat{Y}_i$ are calculated from the same estimated regression line. Two degrees of
freedom are associated with a regression line, corresponding to the intercept and the slope
of the line. One of the two degrees of freedom is lost because the deviations $\hat{Y}_i - \bar{Y}$ are
subject to a constraint: they must sum to zero.

Note that the degrees of freedom are additive:

    n - 1 = 1 + (n - 2)

For the Toluca Company example, where n = 25, these degrees of freedom are:

    24 = 1 + 23


Mean Squares 


A sum of squares divided by its associated degrees of freedom is called a mean square
(abbreviated MS). For instance, an ordinary sample variance is a mean square since a sum
of squares, $\sum (Y_i - \bar{Y})^2$, is divided by its associated degrees of freedom, n - 1. We are
interested here in the regression mean square, denoted by MSR:

    $MSR = \frac{SSR}{1} = SSR$    (2.52)

and in the error mean square, MSE, defined earlier in (1.22):

    $MSE = \frac{SSE}{n - 2}$    (2.53)




For the Toluca Company example, we have SSR = 252,378 and SSE = 54,825. Hence:

    $MSR = \frac{252,378}{1} = 252,378$

Also, we obtained earlier:

    $MSE = \frac{54,825}{23} = 2,384$

Comment

The two mean squares MSR and MSE do not add to

    $\frac{SSTO}{n - 1} = \frac{307,203}{24} = 12,800$

Thus, mean squares are not additive.
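
A brief numerical check of these mean squares, using only the sums of squares quoted above; the variable names are illustrative.

```python
# Sums of squares and sample size for the Toluca Company example.
n = 25
ssr, sse, ssto = 252378.0, 54825.0, 307203.0

msr = ssr / 1               # regression mean square (2.52)
mse = sse / (n - 2)         # error mean square (2.53)
total_ms = ssto / (n - 1)   # SSTO divided by its degrees of freedom

# Sums of squares are additive; mean squares are not.
print(ssr + sse == ssto)
print(round(mse), round(total_ms), msr + mse == total_ms)
```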


Analysis of Variance Table 


TABLE 2.2 
ANOVA Table 
for Simple 
Linear 
Regression. 


Basic Table. The breakdowns of the total sum of squares and associated degrees of 
freedom are displayed in the form of an analysis of variance table (ANOVA table) in 
Table 2.2. Mean squares of interest also are shown. In addition, the ANOVA table contains 
a column of expected mean squares that will be utilized shortly. The ANOVA table for the 
Toluca Company example is shown in Figure 2.2. The columns for degrees of freedom and 
sums of squares are reversed in the MINITAB output. 


Modified Table. Sometimes an ANOVA table showing one additional element of decom- 
position is utilized. This modified table is based on the fact that the total sum of squares 
can be decomposed into two parts, as follows: 


    $SSTO = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$

In the modified ANOVA table, the total uncorrected sum of squares, denoted by SSTOU,
is defined as:

    $SSTOU = \sum Y_i^2$    (2.54)

and the correction for the mean sum of squares, denoted by SS(correction for mean), is
defined as:

    $SS(\text{correction for mean}) = n\bar{Y}^2$    (2.55)
Table 2.3 shows the general format of this modified ANOVA table. While both types of 
ANOVA tables are widely used, we shall usually utilize the basic type of table. 
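
The decomposition behind the modified table can be verified on any set of responses. The sketch below uses made-up values (not the Toluca Company data), and the function name `uncorrected_decomposition` is hypothetical.

```python
def uncorrected_decomposition(y):
    """Return (SSTOU, SS(correction for mean), SSTO) for responses y."""
    n = len(y)
    y_bar = sum(y) / n
    sstou = sum(yi ** 2 for yi in y)            # (2.54)
    ss_correction = n * y_bar ** 2              # (2.55)
    ssto = sum((yi - y_bar) ** 2 for yi in y)   # (2.43)
    return sstou, ss_correction, ssto


# Hypothetical work hours, for illustration only.
y_obs = [113, 121, 160, 221, 224, 361, 352, 389]
sstou, ss_correction, ssto_direct = uncorrected_decomposition(y_obs)

# SSTO equals SSTOU minus the correction for the mean, up to rounding error.
print(sstou - ss_correction, ssto_direct)
```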




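Source of
Variation     SS                                       df       MS                    E{MS}
Regression    $SSR = \sum (\hat{Y}_i - \bar{Y})^2$     1        $MSR = SSR/1$         $\sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
Error         $SSE = \sum (Y_i - \hat{Y}_i)^2$         n - 2    $MSE = SSE/(n - 2)$   $\sigma^2$
Total         $SSTO = \sum (Y_i - \bar{Y})^2$          n - 1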




TABLE 2.3 
Modified 
ANOVA Table 
for Simple 
Linear 
Regression. 


Source of
Variation              SS                                        df       MS
Regression             $SSR = \sum (\hat{Y}_i - \bar{Y})^2$      1        $MSR = SSR/1$
Error                  $SSE = \sum (Y_i - \hat{Y}_i)^2$          n - 2    $MSE = SSE/(n - 2)$
Total                  $SSTO = \sum (Y_i - \bar{Y})^2$           n - 1
Correction for mean    $SS(\text{correction for mean}) = n\bar{Y}^2$    1
Total, uncorrected     $SSTOU = \sum Y_i^2$                      n


Expected Mean Squares 


In order to make inferences based on the analysis of variance approach, we need to know 
the expected value of each of the mean squares. The expected value of a mean square is the 
mean of its sampling distribution and tells us what is being estimated by the mean square. 
Statistical theory provides the following results: 


    $E\{MSE\} = \sigma^2$    (2.56)

    $E\{MSR\} = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$    (2.57)

The expected mean squares in (2.56) and (2.57) are shown in the analysis of variance table
in Table 2.2. Note that result (2.56) is in accord with our earlier statement that MSE is an
unbiased estimator of $\sigma^2$.


Two important implications of the expected mean squares in (2.56) and (2.57) are the
following:


1. The mean of the sampling distribution of MSE is $\sigma^2$ whether or not X and Y are linearly
related, i.e., whether or not $\beta_1 = 0$.

2. The mean of the sampling distribution of MSR is also $\sigma^2$ when $\beta_1 = 0$. Hence, when
$\beta_1 = 0$, the sampling distributions of MSR and MSE are located identically and MSR and
MSE will tend to be of the same order of magnitude.

On the other hand, when $\beta_1 \ne 0$, the mean of the sampling distribution of MSR is
greater than $\sigma^2$ since the term $\beta_1^2 \sum (X_i - \bar{X})^2$ in (2.57) then must be positive. Thus,
when $\beta_1 \ne 0$, the mean of the sampling distribution of MSR is located to the right of that
of MSE and, hence, MSR will tend to be larger than MSE.


This suggests that a comparison of MSR and MSE is useful for testing whether or not
$\beta_1 = 0$. If MSR and MSE are of the same order of magnitude, this would suggest that $\beta_1 = 0$.
On the other hand, if MSR is substantially greater than MSE, this would suggest that $\beta_1 \ne 0$.
This indeed is the basic idea underlying the analysis of variance test to be discussed next.
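
The expected mean squares (2.56) and (2.57) can be illustrated by simulation. The sketch below generates many samples from a normal error regression model with fixed X levels and averages MSR and MSE over the samples. All parameter values, the X levels, and the function name `simulate_mean_squares` are assumptions chosen for illustration.

```python
import random


def simulate_mean_squares(beta0, beta1, sigma, x, reps, seed=0):
    """Average MSR and MSE over `reps` simulated samples at fixed X levels."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    msr_total = 0.0
    mse_total = 0.0
    for _ in range(reps):
        # Draw responses from Y = beta0 + beta1 * X + normal error.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b0 = y_bar - b1 * x_bar
        sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
        ssr = b1 ** 2 * sxx                     # formula (2.51)
        msr_total += ssr / 1
        mse_total += sse / (n - 2)
    return msr_total / reps, mse_total / reps, sxx


x_levels = list(range(1, 11))
avg_msr, avg_mse, sxx = simulate_mean_squares(2.0, 0.5, 1.0, x_levels,
                                              reps=4000, seed=1)
expected_msr = 1.0 ** 2 + 0.5 ** 2 * sxx        # right side of (2.57)
print(avg_mse, avg_msr, expected_msr)
```

With a nonzero slope, the average MSR settles well above the average MSE, which is the behavior the F test exploits. Setting `beta1` to 0 in the call should bring the two averages close together.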


Comment 


The derivation of (2.56) follows from theorem (2.11), which states that $SSE/\sigma^2 \sim \chi^2(n - 2)$
for regression model (2.1). Hence, it follows from property (A.42) of the chi-square distribution
that:

    $E\left\{ \frac{SSE}{\sigma^2} \right\} = n - 2$

or that:

    $E\left\{ \frac{SSE}{n - 2} \right\} = E\{MSE\} = \sigma^2$

To find the expected value of MSR, we begin with (2.51):

    $SSR = b_1^2 \sum (X_i - \bar{X})^2$

Now by (A.15a), we have:

    $\sigma^2\{b_1\} = E\{b_1^2\} - (E\{b_1\})^2$    (2.58)

We know from (2.3a) that $E\{b_1\} = \beta_1$ and from (2.3b) that:

    $\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$

Hence, substituting into (2.58), we obtain:

    $E\{b_1^2\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} + \beta_1^2$

It now follows that:

    $E\{SSR\} = E\{b_1^2\} \sum (X_i - \bar{X})^2 = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$

Finally, $E\{MSR\}$ is:

    $E\{MSR\} = E\left\{ \frac{SSR}{1} \right\} = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
F Test of $\beta_1 = 0$ versus $\beta_1 \ne 0$


The analysis of variance approach provides us with a battery of highly useful tests for 
regression models (and other linear statistical models). For the simple linear regression 
case considered here, the analysis of variance provides us with a test for: 

    $H_0$: $\beta_1 = 0$
    $H_a$: $\beta_1 \ne 0$    (2.59)

Test Statistic. The test statistic for the analysis of variance approach is denoted by F*.
As just mentioned, it compares MSR and MSE in the following fashion:

    $F^* = \frac{MSR}{MSE}$    (2.60)

The earlier motivation, based on the expected mean squares in Table 2.2, suggests that large
values of F* support $H_a$ and values of F* near 1 support $H_0$. In other words, the appropriate
test is an upper-tail one.


Sampling Distribution of F*. In order to be able to construct a statistical decision rule
and examine its properties, we need to know the sampling distribution of F*. We begin by
considering the sampling distribution of F* when $H_0$ ($\beta_1 = 0$) holds. Cochran's theorem
will be most helpful in this connection. For our purposes, this theorem can be stated as
follows:

    If all n observations $Y_i$ come from the same normal distribution with
    mean $\mu$ and variance $\sigma^2$, and SSTO is decomposed into k sums of
    squares $SS_r$, each with degrees of freedom $df_r$, then the $SS_r/\sigma^2$ terms
    are independent $\chi^2$ variables with $df_r$ degrees of freedom if:    (2.61)

        $\sum_{r=1}^{k} df_r = n - 1$
Note from Table 2.2 that we have decomposed SSTO into the two sums of squares SSR
and SSE and that their degrees of freedom are additive. Hence:


    If $\beta_1 = 0$ so that all $Y_i$ have the same mean $\mu = \beta_0$ and the same
    variance $\sigma^2$, $SSE/\sigma^2$ and $SSR/\sigma^2$ are independent $\chi^2$ variables.


Now consider test statistic F*, which we can write as follows:

    $F^* = \frac{SSR/\sigma^2}{1} \div \frac{SSE/\sigma^2}{n - 2} = \frac{MSR}{MSE}$

But by Cochran's theorem, we have when $H_0$ holds:

    $F^* \sim \frac{\chi^2(1)}{1} \div \frac{\chi^2(n - 2)}{n - 2}$    when $H_0$ holds

where the $\chi^2$ variables are independent. Thus, when $H_0$ holds, F* is the ratio of two
independent $\chi^2$ variables, each divided by its degrees of freedom. But this is the definition
of an F random variable in (A.47).

We have thus established that if $H_0$ holds, F* follows the F distribution, specifically the
F(1, n - 2) distribution.

When $H_a$ holds, it can be shown that F* follows the noncentral F distribution, a complex
distribution that we need not consider further at this time.


Comment 
Even if $\beta_1 \ne 0$, SSR and SSE are independent and $SSE/\sigma^2 \sim \chi^2$. However, the condition that both
$SSR/\sigma^2$ and $SSE/\sigma^2$ are $\chi^2$ random variables requires $\beta_1 = 0$.


Construction of Decision Rule. Since the test is upper-tail and F* is distributed as
F(1, n - 2) when $H_0$ holds, the decision rule is as follows when the risk of a Type I error
is to be controlled at $\alpha$:

    If $F^* \le F(1 - \alpha; 1, n - 2)$, conclude $H_0$
    If $F^* > F(1 - \alpha; 1, n - 2)$, conclude $H_a$    (2.62)

where $F(1 - \alpha; 1, n - 2)$ is the $(1 - \alpha)100$ percentile of the appropriate F distribution.


Example




For the Toluca Company example, we shall repeat the earlier test on $\beta_1$, this time using the
F test. The alternative conclusions are:

    $H_0$: $\beta_1 = 0$
    $H_a$: $\beta_1 \ne 0$

As before, let $\alpha = .05$. Since n = 25, we require F(.95; 1, 23) = 4.28. The decision rule is:

    If $F^* \le 4.28$, conclude $H_0$
    If $F^* > 4.28$, conclude $H_a$

We have from earlier that MSR = 252,378 and MSE = 2,384. Hence, F* is:

    $F^* = \frac{252,378}{2,384} = 105.9$

Since F* = 105.9 > 4.28, we conclude $H_a$, that $\beta_1 \ne 0$, or that there is a linear
association between work hours and lot size. This is the same result as when the t test was
employed, as it must be according to our discussion below.
The MINITAB output in Figure 2.2 on page 46 shows the F* statistic in the column
labeled F. Next to it is shown the P-value, $P\{F(1, 23) > 105.9\}$, namely, 0+, indicating
that the data are not consistent with $\beta_1 = 0$.
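
The decision rule (2.62) is easy to express in code. In this sketch the critical value F(.95; 1, 23) = 4.28 is taken from the text rather than computed, and the function name `f_test_slope` is hypothetical.

```python
def f_test_slope(msr, mse, f_critical):
    """Apply decision rule (2.62): return F* and the conclusion reached."""
    f_star = msr / mse                          # test statistic (2.60)
    conclusion = "Ha" if f_star > f_critical else "H0"
    return f_star, conclusion


# Toluca Company mean squares and the tabled critical value.
f_star, conclusion = f_test_slope(252378.0, 2384.0, 4.28)
print(round(f_star, 1), conclusion)
```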


Equivalence of F Test and t Test. For a given $\alpha$ level, the F test of $\beta_1 = 0$ versus $\beta_1 \ne 0$
is equivalent algebraically to the two-tailed t test. To see this, recall from (2.51) that:

    $SSR = b_1^2 \sum (X_i - \bar{X})^2$

Thus, we can write:

    $F^* = \frac{SSR \div 1}{SSE \div (n - 2)} = \frac{b_1^2 \sum (X_i - \bar{X})^2}{MSE}$

But since $s^2\{b_1\} = MSE / \sum (X_i - \bar{X})^2$, we obtain:

    $F^* = \frac{b_1^2}{s^2\{b_1\}} = \left( \frac{b_1}{s\{b_1\}} \right)^2 = (t^*)^2$    (2.63)

The last step follows because the t* statistic for testing whether or not $\beta_1 = 0$ is by (2.17):

    $t^* = \frac{b_1}{s\{b_1\}}$

In the Toluca Company example, we just calculated that F* = 105.9. From earlier work,
we have t* = 10.29 (see Figure 2.2). We thus see that $(10.29)^2 = 105.9$.

Corresponding to the relation between t* and F*, we have the following relation between
the required percentiles of the t and F distributions for the tests: $[t(1 - \alpha/2; n - 2)]^2 =
F(1 - \alpha; 1, n - 2)$. In our tests on $\beta_1$, these percentiles were $[t(.975; 23)]^2 = (2.069)^2 =
4.28 = F(.95; 1, 23)$. Remember that the t test is two-tailed whereas the F test is one-tailed.

Thus, at any given $\alpha$ level, we can use either the t test or the F test for testing $\beta_1 = 0$
versus $\beta_1 \ne 0$. Whenever one test leads to $H_0$, so will the other, and correspondingly for $H_a$.
The t test, however, is more flexible since it can be used for one-sided alternatives involving
$\beta_1$ ($\le$ or $\ge$) 0 versus $\beta_1$ ($>$ or $<$) 0, while the F test cannot.
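
The identity (2.63) holds for any data set, which a short computation can confirm. The data below are made up for illustration (not the Toluca Company data), and the function name `slope_tests` is hypothetical.

```python
import math


def slope_tests(x, y):
    """Return (t*, F*) for testing beta_1 = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    s_b1 = math.sqrt(mse / sxx)      # estimated standard deviation of b1
    t_star = b1 / s_b1               # t statistic (2.17)
    ssr = b1 ** 2 * sxx              # formula (2.51)
    f_star = (ssr / 1) / mse         # F statistic (2.60)
    return t_star, f_star


# Hypothetical lot sizes and work hours, for illustration only.
x_data = [20, 30, 40, 50, 60, 70, 80, 90]
y_data = [113, 121, 160, 221, 224, 361, 352, 389]

t_star, f_star_data = slope_tests(x_data, y_data)
print(t_star ** 2, f_star_data)      # equal up to rounding error
```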




2.8 General Linear Test Approach 


The analysis of variance test of $\beta_1 = 0$ versus $\beta_1 \ne 0$ is an example of the general test for
a linear statistical model. We now explain this general test approach in terms of the simple 
linear regression model. We do so at this time because of the generality of the approach 
and the wide use we shall make of it, and because of the simplicity of understanding the 
approach in terms of simple linear regression. 

The general linear test approach involves three basic steps, which we now describe in 
turn. 


Full Model


We begin with the model considered to be appropriate for the data, which in this context is
called the full or unrestricted model. For the simple linear regression case, the full model is
the normal error regression model (2.1):

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$    Full model    (2.64)

We fit this full model, either by the method of least squares or by the method of maximum
likelihood, and obtain the error sum of squares. The error sum of squares is the sum of the
squared deviations of each observation $Y_i$ around its estimated expected value. In this
context, we shall denote this sum of squares by SSE(F) to indicate that it is the error sum
of squares for the full model. Here, we have:

    $SSE(F) = \sum [Y_i - (b_0 + b_1 X_i)]^2 = \sum (Y_i - \hat{Y}_i)^2 = SSE$    (2.65)

Thus, for the full model (2.64), the error sum of squares is simply SSE, which measures the
variability of the $Y_i$ observations around the fitted regression line.


Reduced Model 


Next, we consider $H_0$. In this instance, we have:

    $H_0$: $\beta_1 = 0$
    $H_a$: $\beta_1 \ne 0$    (2.66)

The model when $H_0$ holds is called the reduced or restricted model. When $\beta_1 = 0$,
model (2.64) reduces to:

    $Y_i = \beta_0 + \varepsilon_i$    Reduced model    (2.67)

We fit this reduced model, by either the method of least squares or the method of
maximum likelihood, and obtain the error sum of squares for this reduced model, denoted
by SSE(R). When we fit the particular reduced model (2.67), it can be shown that the least
squares and maximum likelihood estimator of $\beta_0$ is $\bar{Y}$. Hence, the estimated expected value
for each observation is $b_0 = \bar{Y}$, and the error sum of squares for this reduced model is:

    $SSE(R) = \sum (Y_i - b_0)^2 = \sum (Y_i - \bar{Y})^2 = SSTO$    (2.68)


Test Statistic 


Summary 




The logic now is to compare the two error sums of squares SSE(F) and SSE(R). It can be
shown that SSE(F) never is greater than SSE(R):

    $SSE(F) \le SSE(R)$    (2.69)

The reason is that the more parameters are in the model, the better one can fit the data
and the smaller are the deviations around the fitted regression function. When SSE(F) is
not much less than SSE(R), using the full model does not account for much more of the
variability of the $Y_i$ than does the reduced model, in which case the data suggest that the
reduced model is adequate (i.e., that $H_0$ holds). To put this another way, when SSE(F) is
close to SSE(R), the variation of the observations around the fitted regression function for
the full model is almost as great as the variation around the fitted regression function for
the reduced model. In this case, the added parameters in the full model really do not help to
reduce the variation in the $Y_i$ about the fitted regression function. Thus, a small difference
SSE(R) - SSE(F) suggests that $H_0$ holds. On the other hand, a large difference suggests that
$H_a$ holds because the additional parameters in the model do help to reduce substantially the
variation of the observations $Y_i$ around the fitted regression function.
The actual test statistic is a function of SSE(R) - SSE(F), namely:

    $F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$    (2.70)

which follows the F distribution when $H_0$ holds. The degrees of freedom $df_R$ and $df_F$ are
those associated with the reduced and full model error sums of squares, respectively. Large
values of F* lead to $H_a$ because a large difference SSE(R) - SSE(F) suggests that $H_a$ holds.
The decision rule therefore is:

    If $F^* \le F(1 - \alpha; df_R - df_F, df_F)$, conclude $H_0$
    If $F^* > F(1 - \alpha; df_R - df_F, df_F)$, conclude $H_a$    (2.71)


For testing whether or not $\beta_1 = 0$, we therefore have:

    $SSE(R) = SSTO$        $SSE(F) = SSE$
    $df_R = n - 1$        $df_F = n - 2$

so that we obtain when substituting into (2.70):

    $F^* = \frac{SSTO - SSE}{(n - 1) - (n - 2)} \div \frac{SSE}{n - 2} = \frac{SSR}{1} \div \frac{SSE}{n - 2} = \frac{MSR}{MSE}$

which is identical to the analysis of variance test statistic (2.60).


The general linear test approach can be used for highly complex tests of linear statistical 
models, as well as for simple tests. The basic steps in summary form are: 


1. Fit the full model and obtain the error sum of squares SSE(F).
2. Fit the reduced model under $H_0$ and obtain the error sum of squares SSE(R).
3. Use test statistic (2.70) and decision rule (2.71). 
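
The three steps can be sketched as a small function that takes the two error sums of squares and their degrees of freedom. The function name `general_linear_test` is hypothetical; the inputs shown are the Toluca Company values, for which SSE(R) = SSTO and SSE(F) = SSE.

```python
def general_linear_test(sse_r, df_r, sse_f, df_f):
    """Test statistic (2.70) for comparing a reduced and a full model."""
    if df_r <= df_f:
        raise ValueError("the reduced model must have more error df than the full model")
    numerator = (sse_r - sse_f) / (df_r - df_f)
    denominator = sse_f / df_f
    return numerator / denominator


# Testing beta_1 = 0 in the Toluca Company example (n = 25).
f_star_glt = general_linear_test(sse_r=307203.0, df_r=24,
                                 sse_f=54825.0, df_f=23)
print(round(f_star_glt, 1))
```

The same function applies unchanged to more elaborate full and reduced models, since only the error sums of squares and their degrees of freedom enter (2.70).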




2.9 Descriptive Measures of Linear Association between X and Y 


We have discussed the major uses of regression analysis—estimation of parameters and 
means and prediction of new observations—without mentioning the “degree of linear 
association” between X and Y, or similar terms. The reason is that the usefulness of estimates
or predictions depends upon the width of the interval and the user’s needs for precision, 
which vary from one application to another. Hence, no single descriptive measure of the 
“degree of linear association” can capture the essential information as to whether a given 
regression relation is useful in any particular application. 

Nevertheless, there are times when the degree of linear association is of interest in its 
own right. We shall now briefly discuss two descriptive measures that are frequently used
in practice to describe the degree of linear association between X and Y.


Coefficient of Determination 

We saw earlier that SSTO measures the variation in the observations $Y_i$, or the uncertainty in
predicting Y, when no account of the predictor variable X is taken. Thus, SSTO is a measure
of the uncertainty in predicting Y when X is not considered. Similarly, SSE measures the
variation in the $Y_i$ when a regression model utilizing the predictor variable X is employed.
A natural measure of the effect of X in reducing the variation in Y, i.e., in reducing the
uncertainty in predicting Y, is to express the reduction in variation (SSTO - SSE = SSR)
as a proportion of the total variation:

    $R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$    (2.72)

The measure $R^2$ is called the coefficient of determination. Since $0 \le SSE \le SSTO$, it
follows that:

    $0 \le R^2 \le 1$    (2.72a)


We may interpret $R^2$ as the proportionate reduction of total variation associated with
the use of the predictor variable X. Thus, the larger $R^2$ is, the more the total variation of
Y is reduced by introducing the predictor variable X. The limiting values of $R^2$ occur as
follows:


1. When all observations fall on the fitted regression line, then SSE = 0 and $R^2 = 1$.
This case is shown in Figure 2.8a. Here, the predictor variable X accounts for all variation
in the observations $Y_i$.

2. When the fitted regression line is horizontal so that $b_1 = 0$ and $\hat{Y}_i \equiv \bar{Y}$, then SSE =
SSTO and $R^2 = 0$. This case is shown in Figure 2.8b. Here, there is no linear association
between X and Y in the sample data, and the predictor variable X is of no help in reducing
the variation in the observations $Y_i$ with linear regression.


In practice, R? is not likely to be 0 or 1 but somewhere between these limits. The closer 
it is to 1, the greater is said to be the degree of linear association between X and Y. 


FIGURE 2.8 Scatter Plots when $R^2 = 1$ and $R^2 = 0$.

[Figure 2.8 panels: (a) $R^2 = 1$; (b) $R^2 = 0$. Fitted line label: $\hat{Y} = b_0 + b_1 X$.]


Example
For the Toluca Company example, we obtained SSTO = 307,203 and SSR = 252,378.
Hence:

    $R^2 = \frac{252,378}{307,203} = .822$

Thus, the variation in work hours is reduced by 82.2 percent when lot size is considered.

The MINITAB output in Figure 2.2 shows the coefficient of determination R? labeled 
as R-sq in percent form. The output also shows the coefficient R-sq(adj), which will be 
explained in Chapter 6. 
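
Both forms of (2.72) can be evaluated directly from the sums of squares. The function name `coefficient_of_determination` in this short sketch is hypothetical.

```python
def coefficient_of_determination(ssr, sse):
    """R^2 = SSR / SSTO, with SSTO = SSR + SSE by (2.50)."""
    ssto = ssr + sse
    return ssr / ssto


# Toluca Company sums of squares.
r2 = coefficient_of_determination(252378.0, 54825.0)
r2_alt = 1 - 54825.0 / 307203.0      # second form of (2.72)
print(round(r2, 3), round(r2_alt, 3))
```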


Limitations of $R^2$


We noted that no single measure will be adequate for describing the usefulness of a regres- 
sion model for different applications. Still, the coefficient of determination is widely used. 
Unfortunately, it is subject to serious misunderstandings. We consider now three common 
misunderstandings: 


Misunderstanding 1. A high coefficient of determination indicates that useful
predictions can be made. This is not necessarily correct. In the Toluca Company
example, we saw that the coefficient of determination was high ($R^2 = .82$). Yet the
90 percent prediction interval for the next lot, consisting of 100 units, was wide (332
to 507 hours) and not precise enough to permit management to schedule workers
effectively.

Misunderstanding 2. A high coefficient of determination indicates that the estimated
regression line is a good fit. Again, this is not necessarily correct. Figure 2.9a shows
a scatter plot where the coefficient of determination is high ($R^2 = .69$). Yet a linear
regression function would not be a good fit since the regression relation is curvilinear.

Misunderstanding 3. A coefficient of determination near zero indicates that X and Y
are not related. This also is not necessarily correct. Figure 2.9b shows a scatter plot
where the coefficient of determination between X and Y is $R^2 = .02$. Yet X and Y are
strongly related; however, the relationship between the two variables is curvilinear.


FIGURE 2.9 Illustrations of Two Misunderstandings about Coefficient of Determination. (a) Scatter plot with $R^2 = .69$: linear regression is not a good fit. (b) Scatter plot with $R^2 = .02$: strong relation between X and Y. The horizontal axis is X, with ticks at 0, 5, 10, and 15.


Misunderstanding 1 arises because $R^2$ measures only a relative reduction from SSTO
and provides no information about absolute precision for estimating a mean response or
predicting a new observation. Misunderstandings 2 and 3 arise because $R^2$ measures the
degree of linear association between X and Y, whereas the actual regression relation may
be curvilinear.
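
Misunderstandings 2 and 3 can be reproduced with a small Python sketch. The data below are hypothetical (they are not the data behind Figure 2.9), and the helper `r_squared` is an illustrative name. It uses the fact that, for simple linear regression, $R^2$ equals the squared ratio of the cross-product sum to the product of the two sums of squared deviations.

```python
def r_squared(x, y):
    """R^2 of the least squares line of y on x: Sxy^2 / (Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical case (a): Y = X^2 observed only for X >= 0.
# The relation is curvilinear, yet R^2 for a straight line is high.
x_a = list(range(0, 11))
y_a = [xi ** 2 for xi in x_a]
r2_a = r_squared(x_a, y_a)

# Hypothetical case (b): the same exact relation Y = X^2, but with X
# placed symmetrically around zero. Y is a function of X, yet R^2 is 0.
x_b = list(range(-5, 6))
y_b = [xi ** 2 for xi in x_b]
r2_b = r_squared(x_b, y_b)

print(f"case (a): R^2 = {r2_a:.3f}")
print(f"case (b): R^2 = {r2_b:.3f}")
```

In both cases Y is determined exactly by X, so the difference in $R^2$ reflects only how well a straight line describes the relation over the observed X values.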


Coefficient of Correlation 


A measure of linear association between Y and X when both Y and X are random is the
coefficient of correlation. This measure is the signed square root of $R^2$:

$$r = \pm\sqrt{R^2} \tag{2.73}$$

A plus or minus sign is attached to this measure according to whether the slope of the fitted
regression line is positive or negative. Thus, the range of r is: $-1 \le r \le 1$.


Example

For the Toluca Company example, we obtained $R^2 = .822$. Treating X as a random variable,
the correlation coefficient here is:

$$r = +\sqrt{.822} = .907$$

The plus sign is affixed since $b_1$ is positive. We take up the topic of correlation analysis in
more detail in Section 2.11.
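
A minimal Python sketch of (2.73) follows. The function name `correlation_from_r2` is hypothetical, and the slope value passed in the example is only a placeholder with the correct sign, since the text states only that the Toluca slope is positive.

```python
import math


def correlation_from_r2(r2: float, b1: float) -> float:
    """Return r = +/- sqrt(R^2), with the sign of the fitted slope b1 (2.73)."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    return math.copysign(math.sqrt(r2), b1)


# Toluca Company: R^2 = .822 and a positive slope (placeholder value 1.0).
r_toluca = correlation_from_r2(0.822, b1=1.0)
print(f"r = {r_toluca:.3f}")

# The same R^2 with a negative slope gives the mirrored value.
r_negative = correlation_from_r2(0.822, b1=-1.0)
```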


Comments 


1. The value taken by $R^2$ in a given sample tends to be affected by the spacing of the X observations.
This is implied in (2.72). SSE is not affected systematically by the spacing of the $X_i$ since, for regression
model (2.1), $\sigma^2\{Y_i\} = \sigma^2$ at all X levels. However, the wider the spacing of the $X_i$ in the sample
when $b_1 \ne 0$, the greater will tend to be the spread of the observed $Y_i$ around $\bar{Y}$ and hence the greater
SSTO will be. Consequently, the wider the $X_i$ are spaced, the higher $R^2$ will tend to be.

2. The regression sum of squares SSR is often called the “explained variation” in Y, and the residual
sum of squares SSE is called the “unexplained variation.” The coefficient $R^2$ then is interpreted in terms
of the proportion of the total variation in Y (SSTO) which has been “explained” by X. Unfortunately,
this terminology frequently is taken literally and, hence, misunderstood. Remember that in a regression
model there is no implication that Y necessarily depends on X in a causal or explanatory sense.

3. Regression models do not contain a parameter to be estimated by $R^2$ or r. These are simply
descriptive measures of the degree of linear association between X and Y in the sample observations
that may, or may not, be useful in any instance.


2.10 Considerations in Applying Regression Analysis 


We have now discussed the major uses of regression analysis—to make inferences about 
the regression parameters, to estimate the mean response for a given X, and to predict 
a new observation Y for a given X. It remains to make a few cautionary remarks about 
implementing applications of regression analysis. 


1. Frequently, regression analysis is used to make inferences for the future. For instance, 
for planning staffing requirements, a school board may wish to predict future enrollments by 
using a regression model containing several demographic variables as predictor variables. 
In applications of this type, it is important to remember that the validity of the regression
application depends upon whether basic causal conditions in the period ahead will be similar 
to those in existence during the period upon which the regression analysis is based. This 
caution applies whether mean responses are to be estimated, new observations predicted, 
or regression parameters estimated. 

2. In predicting new observations on Y, the predictor variable X itself often has to be 
predicted. For instance, we mentioned earlier the prediction of company sales for next year 
from the demographic projection of the number of persons 16 years of age or older next 
year. A prediction of company sales under these circumstances is a conditional prediction, 
dependent upon the correctness of the population projection. It is easy to forget the condi- 
tional nature of this type of prediction. 

3. Another caution deals with inferences pertaining to levels of the predictor variable 
that fall outside the range of observations. Unfortunately, this situation frequently occurs 
in practice. A company that predicts its sales from a regression relation of company sales 
to disposable personal income will often find the level of disposable personal income of 
interest (e.g., for the year ahead) to fall beyond the range of past data. If the X level does 
not fall far beyond this range, one may have reasonable confidence in the application of the 
regression analysis. On the other hand, if the X level falls far beyond the range of past data, 
extreme caution should be exercised since one cannot be sure that the regression function 
that fits the past data is appropriate over the wider range of the predictor variable. 

4. A statistical test that leads to the conclusion that $\beta_1 \ne 0$ does not establish a
cause-and-effect relation between the predictor and response variables. As we noted in Chapter 1,
with nonexperimental data both the X and Y variables may be simultaneously influenced by
other variables not in the regression model. On the other hand, the existence of a regression
relation in controlled experiments is often good evidence of a cause-and-effect relation.

5. We should note again that frequently we wish to estimate several mean responses 
or predict several new observations for different levels of the predictor variable, and that 
special problems arise in this case. The confidence coefficients for the limits (2.33) for 
estimating a mean response and for the prediction limits (2.36) for a new observation apply
only for a single level of X for a given sample. In Chapter 4, we discuss how to make
multiple inferences from a given sample. 

6. Finally, when observations on the predictor variable X are subject to measurement 
errors, the resulting parameter estimates are generally no longer unbiased. In Chapter 4, we 
discuss several ways to handle this situation. 


2.11 Normal Correlation Models 


Distinction between Regression and Correlation Model 


The normal error regression model (2.1), which has been used throughout this chapter
and which will continue to be used, assumes that the X values are known constants. As a 
consequence of this, the confidence coefficients and risks of errors refer to repeated sampling 
when the X values are kept the same from sample to sample. 

Frequently, it may not be appropriate to consider the X values as known constants. For 
instance, consider regressing daily bathing suit sales by a department store on mean daily 
temperature. Surely, the department store cannot control daily temperatures, so it would not 
be meaningful to think of repeated sampling where the temperature levels are the same from 
sample to sample. As a second example, an analyst may use a correlation model for the two 
variables “height of person” and “weight of person" in a study of a sample of persons, each 
variable being taken as random. The analyst might wish to study the relation between the 
two variables or might be interested in making inferences about weight of a person on the 
basis of the person's height, in making inferences about height on the basis of weight, or in 
both. 

Other examples where a correlation model, rather than a regression model, may be 
appropriate are: 


1. To study the relation between service station sales of gasoline and sales of auxiliary
products.

2. To study the relation between company net income determined by generally accepted
accounting principles and net income according to tax regulations.

3. To study the relation between blood pressure and age in human subjects.


The correlation model most widely employed is the normal correlation model. We discuss
it here for the case of two variables.


Bivariate Normal Distribution 


The normal correlation model for the case of two variables is based on the bivariate normal
distribution. Let us denote the two variables as $Y_1$ and $Y_2$. (We do not use the notation X and
Y here because both variables play a symmetrical role in correlation analysis.) We say that
$Y_1$ and $Y_2$ are jointly normally distributed if the density function of their joint distribution
is that of the bivariate normal distribution.


FIGURE 2.10 Example of Bivariate Normal Distribution. (Three-dimensional surface plot of the density $f(Y_1, Y_2)$ over the $(Y_1, Y_2)$ plane.)

Density Function. The density function of the bivariate normal distribution is as follows:

$$f(Y_1, Y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho_{12}^2}} \exp\left\{ -\frac{1}{2(1-\rho_{12}^2)} \left[ \left(\frac{Y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\frac{Y_1-\mu_1}{\sigma_1}\right)\left(\frac{Y_2-\mu_2}{\sigma_2}\right) + \left(\frac{Y_2-\mu_2}{\sigma_2}\right)^2 \right] \right\} \tag{2.74}$$

Note that this density function involves five parameters: $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $\rho_{12}$. We shall explain
the meaning of these parameters shortly. First, let us consider a graphic representation of
the bivariate normal distribution.

Figure 2.10 contains a SYSTAT three-dimensional plot of a bivariate normal probability
distribution. The probability distribution is a surface in three-dimensional space. For every
pair of $(Y_1, Y_2)$ values, the density $f(Y_1, Y_2)$ represents the height of the surface at that
point. The surface is continuous, and probability corresponds to volume under the surface.
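
The bivariate normal density (2.74) can be evaluated directly. The following Python sketch is one way to do so; the function name `bivariate_normal_density` and the parameter values in the example are hypothetical and serve only to illustrate the formula.

```python
import math


def bivariate_normal_density(y1, y2, mu1, mu2, sigma1, sigma2, rho12):
    """Height of the bivariate normal surface (2.74) at the point (y1, y2)."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 < rho12 < 1.0:
        raise ValueError("rho12 must lie strictly between -1 and 1")
    z1 = (y1 - mu1) / sigma1  # standardized Y1
    z2 = (y2 - mu2) / sigma2  # standardized Y2
    one_minus_rho2 = 1.0 - rho12 ** 2
    quad = z1 ** 2 - 2.0 * rho12 * z1 * z2 + z2 ** 2
    const = 1.0 / (2.0 * math.pi * sigma1 * sigma2 * math.sqrt(one_minus_rho2))
    return const * math.exp(-quad / (2.0 * one_minus_rho2))


# Hypothetical parameter values, chosen only for illustration.
mu1, mu2, sigma1, sigma2, rho12 = 0.0, 1.0, 1.0, 2.0, 0.5

# The surface is highest at the point of means (mu1, mu2).
peak = bivariate_normal_density(mu1, mu2, mu1, mu2, sigma1, sigma2, rho12)
off_peak = bivariate_normal_density(1.0, 0.0, mu1, mu2, sigma1, sigma2, rho12)
print(peak, off_peak)
```

At the point of means the quadratic form vanishes, so the height there reduces to the leading constant in (2.74).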


Marginal Distributions. If $Y_1$ and $Y_2$ are jointly normally distributed, it can be shown
that their marginal distributions have the following characteristics:

The marginal distribution of $Y_1$ is normal with mean $\mu_1$
and standard deviation $\sigma_1$:

$$f_1(Y_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[ -\frac{1}{2}\left(\frac{Y_1-\mu_1}{\sigma_1}\right)^2 \right] \tag{2.75a}$$

The marginal distribution of $Y_2$ is normal with mean $\mu_2$
and standard deviation $\sigma_2$:

$$f_2(Y_2) = \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left[ -\frac{1}{2}\left(\frac{Y_2-\mu_2}{\sigma_2}\right)^2 \right] \tag{2.75b}$$

Thus, when $Y_1$ and $Y_2$ are jointly normally distributed, each of the two variables by itself
is normally distributed. The converse, however, is not generally true; if $Y_1$ and $Y_2$ are each
normally distributed, they need not be jointly normally distributed in accord with (2.74).




Meaning of Parameters. The five parameters of the bivariate normal density function
(2.74) have the following meaning:


1. $\mu_1$ and $\sigma_1$ are the mean and standard deviation of the marginal distribution of $Y_1$.

2. $\mu_2$ and $\sigma_2$ are the mean and standard deviation of the marginal distribution of $Y_2$.

3. $\rho_{12}$ is the coefficient of correlation between the random variables $Y_1$ and $Y_2$. This
coefficient is denoted by $\rho\{Y_1, Y_2\}$ in Appendix A, using the correlation operator notation,
and defined in (A.25a):

$$\rho_{12} = \rho\{Y_1, Y_2\} = \frac{\sigma_{12}}{\sigma_1\sigma_2} \tag{2.76}$$

Here, $\sigma_1$ and $\sigma_2$, as just mentioned, denote the standard deviations of $Y_1$ and $Y_2$, and $\sigma_{12}$
denotes the covariance $\sigma\{Y_1, Y_2\}$ between $Y_1$ and $Y_2$ as defined in (A.21):

$$\sigma_{12} = \sigma\{Y_1, Y_2\} = E\{(Y_1 - \mu_1)(Y_2 - \mu_2)\} \tag{2.77}$$

Note that $\sigma_{12} \equiv \sigma_{21}$ and $\rho_{12} \equiv \rho_{21}$.


If $Y_1$ and $Y_2$ are independent, $\sigma_{12} = 0$ according to (A.28) so that $\rho_{12} = 0$. If $Y_1$ and
$Y_2$ are positively related—that is, $Y_1$ tends to be large when $Y_2$ is large, or small when
$Y_2$ is small—$\sigma_{12}$ is positive and so is $\rho_{12}$. On the other hand, if $Y_1$ and $Y_2$ are negatively
related—that is, $Y_1$ tends to be large when $Y_2$ is small, or vice versa—$\sigma_{12}$ is negative and so
is $\rho_{12}$. The coefficient of correlation $\rho_{12}$ can take on any value between −1 and 1 inclusive.
It assumes 1 if the linear relation between $Y_1$ and $Y_2$ is perfectly positive (direct) and −1 if
it is perfectly negative (inverse).


Conditional Inferences 


As noted, one principal use of a bivariate correlation model is to make conditional inferences
regarding one variable, given the other variable. Suppose $Y_1$ represents a service station's
gasoline sales and $Y_2$ its sales of auxiliary products. We may then wish to predict a service
station's sales of auxiliary products $Y_2$, given that its gasoline sales are $Y_1$ = $5,500.

Such conditional inferences require the use of conditional probability distributions, which
we discuss next.


Conditional Probability Distribution of $Y_1$. The density function of the conditional
probability distribution of $Y_1$ for any given value of $Y_2$ is denoted by $f(Y_1|Y_2)$ and defined
as follows:

$$f(Y_1|Y_2) = \frac{f(Y_1, Y_2)}{f_2(Y_2)} \tag{2.78}$$

where $f(Y_1, Y_2)$ is the joint density function of $Y_1$ and $Y_2$, and $f_2(Y_2)$ is the marginal density
function of $Y_2$. When $Y_1$ and $Y_2$ are jointly normally distributed according to (2.74) so that
the marginal density function $f_2(Y_2)$ is given by (2.75b), it can be shown that:


The conditional probability distribution of $Y_1$ for any given
value of $Y_2$ is normal with mean $\alpha_{1|2} + \beta_{12} Y_2$ and standard
deviation $\sigma_{1|2}$, and its density function is:

$$f(Y_1|Y_2) = \frac{1}{\sqrt{2\pi}\,\sigma_{1|2}} \exp\left[ -\frac{1}{2}\left(\frac{Y_1 - \alpha_{1|2} - \beta_{12} Y_2}{\sigma_{1|2}}\right)^2 \right] \tag{2.79}$$




The parameters $\alpha_{1|2}$, $\beta_{12}$, and $\sigma_{1|2}$ of the conditional probability distributions of $Y_1$ are
functions of the parameters of the joint probability distribution (2.74), as follows:

$$\alpha_{1|2} = \mu_1 - \mu_2 \rho_{12} \frac{\sigma_1}{\sigma_2} \tag{2.80a}$$

$$\beta_{12} = \rho_{12} \frac{\sigma_1}{\sigma_2} \tag{2.80b}$$

$$\sigma_{1|2}^2 = \sigma_1^2 (1 - \rho_{12}^2) \tag{2.80c}$$

The parameter $\alpha_{1|2}$ is the intercept of the line of regression of $Y_1$ on $Y_2$, and the parameter
$\beta_{12}$ is the slope of this line. Thus we find that the conditional distribution of $Y_1$, given $Y_2$, is
equivalent to the normal error regression model (1.24).


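Conditional Probability Distributions of $Y_2$. The random variables $Y_1$ and $Y_2$ play
symmetrical roles in the bivariate normal probability distribution (2.74). Hence, it follows:


The conditional probability distribution of $Y_2$ for any given
value of $Y_1$ is normal with mean $\alpha_{2|1} + \beta_{21} Y_1$ and standard
deviation $\sigma_{2|1}$, and its density function is:

$$f(Y_2|Y_1) = \frac{1}{\sqrt{2\pi}\,\sigma_{2|1}} \exp\left[ -\frac{1}{2}\left(\frac{Y_2 - \alpha_{2|1} - \beta_{21} Y_1}{\sigma_{2|1}}\right)^2 \right] \tag{2.81}$$

The parameters $\alpha_{2|1}$, $\beta_{21}$, and $\sigma_{2|1}$ of the conditional probability distributions of $Y_2$ are
functions of the parameters of the joint probability distribution (2.74), as follows:

$$\alpha_{2|1} = \mu_2 - \mu_1 \rho_{12} \frac{\sigma_2}{\sigma_1} \tag{2.82a}$$

$$\beta_{21} = \rho_{12} \frac{\sigma_2}{\sigma_1} \tag{2.82b}$$

$$\sigma_{2|1}^2 = \sigma_2^2 (1 - \rho_{12}^2) \tag{2.82c}$$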
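
Relations (2.80) and (2.82) have the same form, with the roles of the two variables interchanged, so a single routine can serve for both. The Python sketch below illustrates this; the names `ConditionalParams` and `conditional_params` and the numerical parameter values are hypothetical.

```python
import math
from typing import NamedTuple


class ConditionalParams(NamedTuple):
    intercept: float  # alpha in (2.80a) or (2.82a)
    slope: float      # beta in (2.80b) or (2.82b)
    sd: float         # conditional standard deviation, (2.80c) or (2.82c)


def conditional_params(mu_resp, mu_given, sigma_resp, sigma_given, rho12):
    """Parameters of the conditional distribution of the response variable,
    given the other variable, under the bivariate normal model (2.74)."""
    slope = rho12 * sigma_resp / sigma_given
    intercept = mu_resp - mu_given * slope
    sd = sigma_resp * math.sqrt(1.0 - rho12 ** 2)
    return ConditionalParams(intercept, slope, sd)


# Hypothetical joint parameters, chosen only for illustration.
mu1, mu2, sigma1, sigma2, rho12 = 10.0, 20.0, 2.0, 4.0, 0.5

y1_given_y2 = conditional_params(mu1, mu2, sigma1, sigma2, rho12)  # (2.80)
y2_given_y1 = conditional_params(mu2, mu1, sigma2, sigma1, rho12)  # (2.82)
print(y1_given_y2)
print(y2_given_y1)
```

With these values the two slopes are .25 and 1.0. Their product equals the square of the correlation coefficient, which follows from multiplying (2.80b) by (2.82b).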


Important Characteristics of Conditional Distributions. Three important characteristics
of the conditional probability distributions of $Y_1$ are normality, linear regression, and
constant variance. We take up each of these in turn.


1. The conditional probability distribution of $Y_1$ for any given value of $Y_2$ is normal.
Imagine that we slice a bivariate normal distribution vertically at a given value of $Y_2$, say,
at $Y_{h2}$. That is, we slice it parallel to the $Y_1$ axis. This slicing is shown in Figure 2.11. The
exposed cross section has the shape of a normal distribution, and after being scaled so that
its area is 1, it portrays the conditional probability distribution of $Y_1$, given that $Y_2 = Y_{h2}$.

This property of normality holds no matter what the value $Y_{h2}$ is. Thus, whenever we
slice the bivariate normal distribution parallel to the $Y_1$ axis, we obtain (after proper scaling)
a normal conditional probability distribution.

2. The means of the conditional probability distributions of $Y_1$ fall on a straight line, and
hence are a linear function of $Y_2$:

$$E\{Y_1|Y_2\} = \alpha_{1|2} + \beta_{12} Y_2 \tag{2.83}$$


FIGURE 2.11 Cross Section of Bivariate Normal Distribution at $Y_{h2}$. (Surface plot of $f(Y_1, Y_2)$ sliced parallel to the $Y_1$ axis.)


Here, $\alpha_{1|2}$ is the intercept parameter and $\beta_{12}$ the slope parameter. Thus, the relation between
the conditional means and $Y_2$ is given by a linear regression function.

3. All conditional probability distributions of $Y_1$ have the same standard deviation $\sigma_{1|2}$.
Thus, no matter where we slice the bivariate normal distribution parallel to the $Y_1$ axis,
the resulting conditional probability distribution (after scaling to have an area of 1) has the
same standard deviation. Hence, constant variances characterize the conditional probability
distributions of $Y_1$.


Equivalence to Normal Error Regression Model. Suppose that we select a random
sample of observations $(Y_1, Y_2)$ from a bivariate normal population and wish to make
conditional inferences about $Y_1$, given $Y_2$. The preceding discussion makes it clear that the
normal error regression model (1.24) is entirely applicable because:


1. The $Y_1$ observations are independent.
2. The $Y_1$ observations when $Y_2$ is considered given or fixed are normally distributed with
mean $E\{Y_1|Y_2\} = \alpha_{1|2} + \beta_{12} Y_2$ and constant variance $\sigma_{1|2}^2$.


Use of Regression Analysis. In view of the equivalence of each of the conditional bivariate
normal correlation models (2.81) and (2.79) with the normal error regression model (1.24),
all conditional inferences with these correlation models can be made by means of the
usual regression methods. For instance, if a researcher has data that can be appropriately
described as having been generated from a bivariate normal distribution and wishes to make
inferences about $Y_2$, given a particular value of $Y_1$, the ordinary regression techniques will
be applicable. Thus, the regression function of $Y_2$ on $Y_1$ can be estimated by means of (1.12),
the slope of the regression line can be estimated by means of the interval estimate (2.15),
a new observation $Y_2$, given the value of $Y_1$, can be predicted by means of (2.36), and so
on. Computer regression packages can be used in the usual manner. To avoid notational
problems, it may be helpful to relabel the variables according to regression usage: $Y = Y_2$,
$X = Y_1$. Of course, if conditional inferences on $Y_1$ for given values of $Y_2$ are desired, the
notation correspondences would be: $Y = Y_1$, $X = Y_2$.




Can we still use regression model (2.1) if $Y_1$ and $Y_2$ are not bivariate normal? It can be
shown that all results on estimation, testing, and prediction obtained from regression model
(2.1) apply if $Y_1 = Y$ and $Y_2 = X$ are random variables, and if the following conditions
hold:


1. The conditional distributions of the $Y_i$, given $X_i$, are normal and independent, with
conditional means $\beta_0 + \beta_1 X_i$ and conditional variance $\sigma^2$.

2. The $X_i$ are independent random variables whose probability distribution $g(X_i)$ does not
involve the parameters $\beta_0$, $\beta_1$, $\sigma^2$.


These conditions require only that regression model (2.1) is appropriate for each conditional
distribution of $Y_i$, and that the probability distribution of the $X_i$ does not involve the
regression parameters. If these conditions are met, all earlier results on estimation, testing,
and prediction still hold even though the $X_i$ are now random variables. The major modification
occurs in the interpretation of confidence coefficients and specified risks of error.
When X is random, these refer to repeated sampling of pairs of $(X_i, Y_i)$ values, where the
$X_i$ values as well as the $Y_i$ values change from sample to sample. Thus, in our bathing suit
sales illustration, a confidence coefficient would refer to the proportion of correct interval
estimates if repeated samples of n days' sales and temperatures were obtained and the
confidence interval calculated for each sample. Another modification occurs in the test's
power, which is different when X is a random variable.


Comments 


1. The notation for the parameters of the conditional correlation models departs somewhat from
our previous notation for regression models. The symbol $\alpha$ is now used to denote the regression
intercept. The subscript 1|2 to $\alpha$ indicates that $Y_1$ is regressed on $Y_2$. Similarly, the subscript 2|1 to $\alpha$
indicates that $Y_2$ is regressed on $Y_1$. The symbol $\beta_{12}$ indicates that it is the slope in the regression of $Y_1$
on $Y_2$, while $\beta_{21}$ is the slope in the regression of $Y_2$ on $Y_1$. Finally, $\sigma_{2|1}$ is the standard deviation of the
conditional probability distributions of $Y_2$ for any given $Y_1$, while $\sigma_{1|2}$ is the standard deviation of the
conditional probability distributions of $Y_1$ for any given $Y_2$.

2. Two distinct regressions are involved in a bivariate normal model, that of $Y_1$ on $Y_2$ when $Y_2$ is
fixed and that of $Y_2$ on $Y_1$ when $Y_1$ is fixed. In general, the two regression lines are not the same. For
instance, the two slopes $\beta_{12}$ and $\beta_{21}$ are the same only if $\sigma_1 = \sigma_2$, as can be seen from (2.80b) and
(2.82b).

3. When interval estimates for the conditional correlation models are obtained, the confidence
coefficient refers to repeated samples where pairs of observations $(Y_1, Y_2)$ are obtained from the
bivariate normal distribution.


Inferences on Correlation Coefficients 


A principal use of the bivariate normal correlation model is to study the relationship between
two variables. In a bivariate normal model, the parameter $\rho_{12}$ provides information about
the degree of the linear relationship between the two variables $Y_1$ and $Y_2$.


Point Estimator of $\rho_{12}$. The maximum likelihood estimator of $\rho_{12}$, denoted by $r_{12}$, is
given by:

$$r_{12} = \frac{\sum (Y_{i1} - \bar{Y}_1)(Y_{i2} - \bar{Y}_2)}{\left[ \sum (Y_{i1} - \bar{Y}_1)^2 \sum (Y_{i2} - \bar{Y}_2)^2 \right]^{1/2}} \tag{2.84}$$

This estimator is often called the Pearson product-moment correlation coefficient. It is a
biased estimator of $\rho_{12}$ (unless $\rho_{12} = 0$ or 1), but the bias is small when n is large.
It can be shown that the range of $r_{12}$ is:

$$-1 \le r_{12} \le 1 \tag{2.85}$$

Generally, values of $r_{12}$ near 1 indicate a strong positive (direct) linear association between
$Y_1$ and $Y_2$ whereas values of $r_{12}$ near −1 indicate a strong negative (indirect) linear
association. Values of $r_{12}$ near 0 indicate little or no linear association between $Y_1$ and $Y_2$.


Test whether $\rho_{12} = 0$. When the population is bivariate normal, it is frequently desired
to test whether the coefficient of correlation is zero:

$$\begin{aligned} H_0&: \rho_{12} = 0 \\ H_a&: \rho_{12} \ne 0 \end{aligned} \tag{2.86}$$

The reason for interest in this test is that in the case where $Y_1$ and $Y_2$ are jointly normally
distributed, $\rho_{12} = 0$ implies that $Y_1$ and $Y_2$ are independent.


We can use regression procedures for the test since (2.80b) implies that the following
alternatives are equivalent to those in (2.86):

$$\begin{aligned} H_0&: \beta_{12} = 0 \\ H_a&: \beta_{12} \ne 0 \end{aligned} \tag{2.86a}$$

and (2.82b) implies that the following alternatives are also equivalent to the ones in (2.86):

$$\begin{aligned} H_0&: \beta_{21} = 0 \\ H_a&: \beta_{21} \ne 0 \end{aligned} \tag{2.86b}$$

It can be shown that the test statistics for testing either (2.86a) or (2.86b) are the same
and can be expressed directly in terms of $r_{12}$:

$$t^* = \frac{r_{12}\sqrt{n-2}}{\sqrt{1 - r_{12}^2}} \tag{2.87}$$

If $H_0$ holds, $t^*$ follows the $t(n-2)$ distribution. The appropriate decision rule to control
the Type I error at $\alpha$ is:

$$\begin{aligned} &\text{If } |t^*| \le t(1-\alpha/2;\, n-2), \text{ conclude } H_0 \\ &\text{If } |t^*| > t(1-\alpha/2;\, n-2), \text{ conclude } H_a \end{aligned} \tag{2.88}$$

Test statistic (2.87) is identical to the regression $t^*$ test statistic (2.17).


Example

A national oil company was interested in the relationship between its service station gasoline
sales and its sales of auxiliary products. A company analyst obtained a random sample of
23 of its service stations and obtained average monthly sales data on gasoline sales ($Y_1$)
and comparable sales of its auxiliary products and services ($Y_2$). These data (not shown)
resulted in an estimated correlation coefficient $r_{12} = .52$. Suppose the analyst wished to test
whether or not the association was positive, controlling the level of significance at $\alpha = .05$.
The alternatives would then be:

$$\begin{aligned} H_0&: \rho_{12} \le 0 \\ H_a&: \rho_{12} > 0 \end{aligned}$$

and the decision rule based on test statistic (2.87) would be:

$$\begin{aligned} &\text{If } t^* \le t(1-\alpha;\, n-2), \text{ conclude } H_0 \\ &\text{If } t^* > t(1-\alpha;\, n-2), \text{ conclude } H_a \end{aligned}$$

For $\alpha = .05$, we require $t(.95; 21) = 1.721$. Since:

$$t^* = \frac{.52\sqrt{21}}{\sqrt{1 - (.52)^2}} = 2.79$$

is greater than 1.721, we would conclude $H_a$, that $\rho_{12} > 0$. The P-value for this test is .006.
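
The test statistic (2.87) for this example can be reproduced with a short Python sketch. The function name `correlation_t_statistic` is hypothetical. The critical value 1.721 is taken from the text rather than computed, so the sketch needs only the standard library.

```python
import math


def correlation_t_statistic(r12: float, n: int) -> float:
    """Test statistic (2.87): t* = r12 * sqrt(n - 2) / sqrt(1 - r12^2)."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    if not -1.0 < r12 < 1.0:
        raise ValueError("r12 must lie strictly between -1 and 1")
    return r12 * math.sqrt(n - 2) / math.sqrt(1.0 - r12 ** 2)


# Service station example: r12 = .52 from n = 23 stations.
t_star = correlation_t_statistic(0.52, 23)

# t(.95; 21) as quoted in the text for the one-sided test at alpha = .05.
T_CRITICAL = 1.721
conclude_ha = t_star > T_CRITICAL

print(f"t* = {t_star:.2f}, conclude Ha: {conclude_ha}")
```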


Interval Estimation of $\rho_{12}$ Using the $z'$ Transformation. Because the sampling distribution
of $r_{12}$ is complicated when $\rho_{12} \ne 0$, interval estimation of $\rho_{12}$ is usually carried
out by means of an approximate procedure based on a transformation. This transformation,
known as the Fisher z transformation, is as follows:

$$z' = \frac{1}{2} \log_e\left(\frac{1 + r_{12}}{1 - r_{12}}\right) \tag{2.89}$$

When n is large (25 or more is a useful rule of thumb), the distribution of $z'$ is approximately
normal with approximate mean and variance:

$$E\{z'\} = \zeta = \frac{1}{2} \log_e\left(\frac{1 + \rho_{12}}{1 - \rho_{12}}\right) \tag{2.90}$$

$$\sigma^2\{z'\} = \frac{1}{n-3} \tag{2.91}$$

Note that the transformation from $r_{12}$ to $z'$ in (2.89) is the same as the relation in (2.90)
between $\rho_{12}$ and $E\{z'\} = \zeta$. Also note that the approximate variance of $z'$ is a known
constant, depending only on the sample size n.

Table B.8 gives paired values for the left and right sides of (2.89) and (2.90), thus eliminating
the need for calculations. For instance, if $r_{12}$ or $\rho_{12}$ equals .25, Table B.8 indicates
that $z'$ or $\zeta$ equals .2554, and vice versa. The values on the two sides of the transformation
always have the same sign. Thus, if $r_{12}$ or $\rho_{12}$ is negative, a minus sign is attached to the
value in Table B.8. For instance, if $r_{12} = -.25$, $z' = -.2554$.


Interval Estimate. When the sample size is large ($n \ge 25$), the standardized statistic:

$$\frac{z' - \zeta}{\sigma\{z'\}} \tag{2.92}$$

is approximately a standard normal variable. Therefore, approximate $1 - \alpha$ confidence limits
for $\zeta$ are:

$$z' \pm z(1 - \alpha/2)\,\sigma\{z'\} \tag{2.93}$$

where $z(1 - \alpha/2)$ is the $(1 - \alpha/2)100$ percentile of the standard normal distribution. The
$1 - \alpha$ confidence limits for $\rho_{12}$ are then obtained by transforming the limits on $\zeta$ by means
of (2.90).




Example 


An economist investigated food purchasing patterns by households in a midwestern city. 
Two hundred households with family incomes between $40,000 and $60,000 were selected 
to ascertain, among other things, the proportions of the food budget expended for beef and 
poultry, respectively. The economist expected these to be negatively related, and wished to 
estimate the coefficient of correlation with a 95 percent confidence interval. Some supporting 
evidence suggested that the joint distribution of the two variables does not depart markedly 
from a bivariate normal one. 

The point estimate of $\rho_{12}$ was $r_{12} = -.61$ (data and calculations not shown). To obtain
an approximate 95 percent confidence interval estimate, we require:

$$z' = -.7089 \text{ when } r_{12} = -.61 \text{ (from Table B.8)}$$

$$\sigma\{z'\} = \frac{1}{\sqrt{200 - 3}} = .07125$$

$$z(.975) = 1.960$$

Hence, the confidence limits for $\zeta$, by (2.93), are $-.7089 \pm 1.960(.07125)$, and the
approximate 95 percent confidence interval is:

$$-.849 \le \zeta \le -.569$$

Using Table B.8 to transform back to $\rho_{12}$, we obtain:

$$-.69 \le \rho_{12} \le -.51$$


This confidence interval was sufficiently precise to be useful to the economist, confirming 
the negative relation and indicating that the degree of linear association is moderately high. 
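
The interval calculation in this example can be sketched in Python without Table B.8, since the transformation (2.89) is the inverse hyperbolic tangent and its inverse is the hyperbolic tangent. The function name `fisher_z_interval` is hypothetical; the percentile 1.960 is the value quoted in the text.

```python
import math


def fisher_z_interval(r12: float, n: int, z_percentile: float):
    """Approximate confidence limits for rho12 via (2.89) to (2.93).

    z_percentile is z(1 - alpha/2), supplied by the caller.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z_prime = math.atanh(r12)        # (2.89): 0.5 * log((1 + r) / (1 - r))
    sd_z = 1.0 / math.sqrt(n - 3)    # square root of (2.91)
    lower_zeta = z_prime - z_percentile * sd_z  # limits (2.93) for zeta
    upper_zeta = z_prime + z_percentile * sd_z
    # Transform the limits on zeta back to rho12 by inverting (2.90).
    return math.tanh(lower_zeta), math.tanh(upper_zeta)


# Food purchasing example: r12 = -.61, n = 200, z(.975) = 1.960.
z_prime = math.atanh(-0.61)
lower, upper = fisher_z_interval(-0.61, 200, 1.960)
print(f"z' = {z_prime:.4f}")
print(f"{lower:.2f} <= rho12 <= {upper:.2f}")
```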


Comments 

1. As usual, a confidence interval for $\rho_{12}$ can be employed to test whether or not $\rho_{12}$ has a specified
value—say, .5—by noting whether or not the specified value falls within the confidence limits.

2. It can be shown that the square of the coefficient of correlation, namely $\rho_{12}^2$, measures the
relative reduction in the variability of $Y_2$ associated with the use of variable $Y_1$. To see this, we noted
earlier in (2.80c) and (2.82c) that:

$$\sigma_{1|2}^2 = \sigma_1^2 (1 - \rho_{12}^2) \tag{2.94a}$$

$$\sigma_{2|1}^2 = \sigma_2^2 (1 - \rho_{12}^2) \tag{2.94b}$$

We can rewrite these expressions as follows:

$$\rho_{12}^2 = \frac{\sigma_1^2 - \sigma_{1|2}^2}{\sigma_1^2} \tag{2.95a}$$

$$\rho_{12}^2 = \frac{\sigma_2^2 - \sigma_{2|1}^2}{\sigma_2^2} \tag{2.95b}$$

The meaning of $\rho_{12}^2$ is now clear. Consider first (2.95a). $\rho_{12}^2$ measures how much smaller relatively is
the variability in the conditional distributions of $Y_1$, for any given level of $Y_2$, than is the variability
in the marginal distribution of $Y_1$. Thus, $\rho_{12}^2$ measures the relative reduction in the variability of $Y_1$
associated with the use of variable $Y_2$. Correspondingly, (2.95b) shows that $\rho_{12}^2$ also measures the
relative reduction in the variability of $Y_2$ associated with the use of variable $Y_1$.

It can be shown that:

$$0 \le \rho_{12}^2 \le 1 \tag{2.96}$$

The limiting value $\rho_{12}^2 = 0$ occurs when $Y_1$ and $Y_2$ are independent, so that the variances of each
variable in the conditional probability distributions are then no smaller than the variance in the
marginal distribution. The limiting value $\rho_{12}^2 = 1$ occurs when there is no variability in the conditional
probability distributions for each variable, so perfect predictions of either variable can be made from
the other.

3. The interpretation of $\rho_{12}^2$ as measuring the relative reduction in the conditional variances as
compared with the marginal variance is valid for the case of a bivariate normal population, but not
for many other bivariate populations. Of course, the interpretation implies nothing in a causal sense.

4. Confidence limits for $\rho_{12}^2$ can be obtained by squaring the respective confidence limits for $\rho_{12}$,
provided the latter limits do not differ in sign.


Spearman Rank Correlation Coefficient


At times the joint distribution of two random variables $Y_1$ and $Y_2$ differs considerably from
the bivariate normal distribution (2.74). In those cases, transformations of the variables $Y_1$
and $Y_2$ may be sought to make the joint distribution of the transformed variables approximately
bivariate normal and thus permit the use of the inference procedures about $\rho_{12}$
described earlier.

When no appropriate transformations can be found, a nonparametric rank correlation
procedure may be useful for making inferences about the association between $Y_1$ and $Y_2$. The
Spearman rank correlation coefficient is widely used for this purpose. First, the observations
on $Y_1$ (i.e., $Y_{11}, \ldots, Y_{n1}$) are expressed in ranks from 1 to n. We denote the rank of $Y_{i1}$ by
$R_{i1}$. Similarly, the observations on $Y_2$ (i.e., $Y_{12}, \ldots, Y_{n2}$) are ranked, with the rank of $Y_{i2}$
denoted by $R_{i2}$. The Spearman rank correlation coefficient, to be denoted by $r_S$, is then
defined as the ordinary Pearson product-moment correlation coefficient in (2.84) based on
the rank data:

$$r_S = \frac{\sum (R_{i1} - \bar{R}_1)(R_{i2} - \bar{R}_2)}{\left[ \sum (R_{i1} - \bar{R}_1)^2 \sum (R_{i2} - \bar{R}_2)^2 \right]^{1/2}} \tag{2.97}$$

Here $\bar{R}_1$ is the mean of the ranks $R_{i1}$ and $\bar{R}_2$ is the mean of the ranks $R_{i2}$. Of course, since
the ranks $R_{i1}$ and $R_{i2}$ are the integers $1, \ldots, n$, it follows that $\bar{R}_1 = \bar{R}_2 = (n + 1)/2$.


Like an ordinary correlation coefficient, the Spearman rank correlation coefficient takes
on values between −1 and 1 inclusive:

$$-1 \le r_S \le 1 \tag{2.98}$$

The coefficient $r_S$ equals 1 when the ranks for $Y_1$ are identical to those for $Y_2$, that is, when
the case with rank 1 for $Y_1$ also has rank 1 for $Y_2$, and so on. In that case, there is perfect
association between the ranks for the two variables. The coefficient $r_S$ equals −1 when the
case with rank 1 for $Y_1$ has rank n for $Y_2$, the case with rank 2 for $Y_1$ has rank n − 1 for
$Y_2$, and so on. In that event, there is perfect inverse association between the ranks for the
two variables. When there is little, if any, association between the ranks of $Y_1$ and $Y_2$, the
Spearman rank correlation coefficient tends to have a value near zero.




Example 


TABLE 2.4 Data on Population and Expenditures and Their Ranks: Sales Marketing Example.


The Spearman rank correlation coefficient can be used to test the alternatives: 


$H_0$: There is no association between $Y_1$ and $Y_2$
$H_a$: There is an association between $Y_1$ and $Y_2$          (2.99)

A two-sided test is conducted here since $H_a$ includes either positive or negative association.
When the alternative $H_a$ is:

$H_a$: There is positive (negative) association between $Y_1$ and $Y_2$          (2.100)


an upper-tail (lower-tail) one-sided test is conducted. 

The probability distribution of $r_S$ under $H_0$ is not difficult to obtain. It is based on the
condition that, for any ranking of $Y_1$, all rankings of $Y_2$ are equally likely when there is no
association between $Y_1$ and $Y_2$. Tables have been prepared and are presented in specialized
texts such as Reference 2.1. Computer packages generally do not present the probability
distribution of $r_S$ under $H_0$ but give only the two-sided P-value. When the sample size n
exceeds 10, the test can be carried out approximately by using test statistic (2.87):

$$
t^* = \frac{r_S \sqrt{n - 2}}{\sqrt{1 - r_S^2}} \tag{2.101}
$$

based on the t distribution with n - 2 degrees of freedom.
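Both routes to a test can be sketched in a few lines of Python. The first function enumerates every ordering of the second set of ranks, which is the "all rankings equally likely" condition stated above, and is practical only for small n. The second evaluates test statistic (2.101). This is an illustrative sketch rather than the procedure used by any particular package. The function names are hypothetical, and the exact calculation assumes no ties. It also relies on the standard no-ties identity $r_S = 1 - 6\sum d_i^2 / [n(n^2 - 1)]$, which is not given in the text, so that comparisons can be made in integer arithmetic.

```python
import math
from itertools import permutations


def exact_two_sided_p(rank1, rank2):
    """Exact two-sided P-value for r_S under H0, assuming untied ranks.

    Every ordering of rank2 is treated as equally likely, so all n!
    orderings are enumerated (feasible only for small n).
    """
    n = len(rank1)
    scale = n * (n * n - 1)

    def abs_numerator(r2):
        # |scale - 6 * sum(d^2)| is proportional to |r_S|; integers avoid rounding.
        d2 = sum((a - b) ** 2 for a, b in zip(rank1, r2))
        return abs(scale - 6 * d2)

    observed = abs_numerator(rank2)
    at_least_as_extreme = 0
    total = 0
    for perm in permutations(rank2):
        total += 1
        if abs_numerator(perm) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


def t_statistic(r_s, n):
    """Approximate test statistic (2.101); undefined when |r_s| equals 1."""
    return r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)
```

For n = 4 with identical rankings, only the identical and the fully reversed orderings are as extreme as the observed one, so the exact two-sided P-value is 2/24. This shows why very small samples cannot reach conventional significance levels in a two-sided test.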


A market researcher wished to examine whether an association exists between population
size ($Y_1$) and per capita expenditures for a new food product ($Y_2$). The data for a random
sample of 12 test markets are given in Table 2.4, columns 1 and 2. Because the distributions of
the variables do not appear to be approximately normal, a nonparametric test of association
is desired. The ranks for the variables are given in Table 2.4, columns 3 and 4. A computer
package found that the coefficient of simple correlation between the ranked data in columns
3 and 4 is $r_S = .895$. The alternatives of interest are the two-sided ones in (2.99).

              (1)               (2)
  Test        Population        Per Capita Expenditure      (3)       (4)
  Market i    (in thousands)    (dollars)
              $Y_{i1}$          $Y_{i2}$                    $R_{i1}$  $R_{i2}$
   1             29             127                          1         2
   2            435             214                          8        11
   3             86             133                          3         4
   4          1,090             208                         11        10
   5            219             153                          7         6
   6            503             184                          9         8
   7             47             130                          2         3
   8          3,524             217                         12        12
   9            185             141                          6         5
  10             98             154                          5         7
  11            952             194                         10         9
  12             89             103                          4         1

Since n exceeds 10 here, we use test statistic (2.101):

$$
t^* = \frac{.895\sqrt{12 - 2}}{\sqrt{1 - (.895)^2}} = 6.34
$$

For $\alpha = .01$, we require $t(.995; 10) = 3.169$. Since $|t^*| = 6.34 > 3.169$, we conclude $H_a$,
that there is an association between population size and per capita expenditures for the food
product. The two-sided P-value of the test is .00008.
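As an arithmetic check on the reported coefficient, the ranks in Table 2.4 can be run through a well-known shortcut that holds when there are no ties. It is expressed in terms of the rank differences $d_i = R_{i1} - R_{i2}$. The shortcut is not stated in the text and is used here only as a cross-check on definition (2.97):

```latex
\begin{aligned}
\sum d_i^2 &= 1 + 9 + 1 + 1 + 1 + 1 + 1 + 0 + 1 + 4 + 1 + 9 = 30 \\
r_S &= 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
     = 1 - \frac{6(30)}{12(143)}
     = 1 - \frac{180}{1716} \approx .895
\end{aligned}
```

The result agrees with the value $r_S = .895$ obtained from the computer package.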


Comments 


1. In case of ties among some data values, each of the tied values is given the average of the ranks 
involved. 

2. It is interesting to note that had the data in Table 2.4 been analyzed under the bivariate
normal distribution assumption (2.74) with test statistic (2.87), the strength of the association
would have appeared somewhat weaker. In particular, the Pearson product-moment correlation
coefficient is $r_{12} = .674$, with $t^* = .674\sqrt{10}/\sqrt{1 - (.674)^2} = 2.885$. For $\alpha = .01$, since
$|t^*| = 2.885 < 3.169$, we would have concluded $H_0$, that there is no association between
population size and per capita expenditures for the food product. The two-sided P-value of
the test is .016.

3. Another nonparametric rank procedure similar to Spearman's $r_S$ is Kendall's $\tau$. This statistic
also measures how far the rankings of $Y_1$ and $Y_2$ differ from each other, but in a somewhat different
way than the Spearman rank correlation coefficient. A discussion of Kendall's $\tau$ may be found in
Reference 2.2.
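Comments 1 and 2 can be illustrated together. The Python sketch below, which uses illustrative helper names and is not part of the text, assigns average ranks to tied values. It then compares the Pearson coefficient computed on the raw Table 2.4 data with the same coefficient computed on the ranks. The critical value 3.169 is the $t(.995; 10)$ value quoted in the example. Both coefficients are carried at full precision rather than rounded to three decimals.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = ((i + 1) + (j + 1)) / 2  # mean of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Ordinary product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic of the form (2.87) and (2.101)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Table 2.4, columns 1 and 2.
population = [29, 435, 86, 1090, 219, 503, 47, 3524, 185, 98, 952, 89]
expenditure = [127, 214, 133, 208, 153, 184, 130, 217, 141, 154, 194, 103]
n = len(population)

r12 = pearson(population, expenditure)
r_s = pearson(average_ranks(population), average_ranks(expenditure))

for label, r in (("Pearson r12", r12), ("Spearman rS", r_s)):
    t = t_statistic(r, n)
    decision = "conclude Ha" if abs(t) > 3.169 else "conclude H0"
    print(f"{label}: r = {r:.3f}, t* = {t:.2f}, {decision}")
```

Because the coefficients are not rounded before $t^*$ is computed, the printed statistics can differ in the last digit from the 6.34 and 2.885 shown in the text. The two decisions are the same as those reached there.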


Cited References

2.1. Gibbons, J. D. Nonparametric Methods for Quantitative Analysis. 2nd ed. Columbus, Ohio:
American Sciences Press, 1985.
2.2. Kendall, M. G., and J. D. Gibbons. Rank Correlation Methods. 5th ed. London: Oxford University
Press, 1990.
Problems

2.1. A student working on a summer internship in the economic research department of a large
corporation studied the relation between sales of a product (Y, in million dollars) and population
(X, in million persons) in the firm's 50 marketing districts. The normal error regression model
(2.1) was employed. The student first wished to test whether or not a linear association between
Y and X existed. The student accessed a simple linear regression program and obtained the
following information on the regression coefficients:

  Parameter    Estimated Value    95 Percent Confidence Limits
  Intercept    7.43119            -1.18518     16.0476
  Slope         .755048             .452886     1.05721


a. The student concluded from these results that there is a linear association between Y and 
X. Is the conclusion warranted? What is the implied level of significance? 


b. Someone questioned the negative lower confidence limit for the intercept, pointing out that 
dollar sales cannot be negative even if the population in a district is zero. Discuss. 
2.2. In a test of the alternatives $H_0$: $\beta_1 \le 0$ versus $H_a$: $\beta_1 > 0$, an analyst concluded $H_0$. Does this
conclusion imply that there is no linear association between X and Y? Explain.


2.3. A member of a student team playing an interactive marketing game received the following
computer output when studying the relation between advertising expenditures (X) and sales
(Y) for one of the team's products:

    Estimated regression equation: $\hat{Y} = 350.7 - .18X$
    Two-sided P-value for estimated slope: .91

The student stated: "The message I get here is that the more we spend on advertising this
product, the fewer units we sell." Comment.

2.4. Refer to Grade point average Problem 1.19.

a. Obtain a 99 percent confidence interval for $\beta_1$. Interpret your confidence interval. Does it
include zero? Why might the director of admissions be interested in whether the confidence
interval includes zero?

b. Test, using the test statistic $t^*$, whether or not a linear association exists between student's
ACT score (X) and GPA at the end of the freshman year (Y). Use a level of significance of
.01. State the alternatives, decision rule, and conclusion.

c. What is the P-value of your test in part (b)? How does it support the conclusion reached in
part (b)?

*2.5. Refer to Copier maintenance Problem 1.20.

a. Estimate the change in the mean service time when the number of copiers serviced increases
by one. Use a 90 percent confidence interval. Interpret your confidence interval.

b. Conduct a t test to determine whether or not there is a linear association between X and Y
here; control the $\alpha$ risk at .10. State the alternatives, decision rule, and conclusion. What is
the P-value of your test?

c. Are your results in parts (a) and (b) consistent? Explain.

d. The manufacturer has suggested that the mean required time should not increase by more
than 14 minutes for each additional copier that is serviced on a service call. Conduct a test to
decide whether this standard is being satisfied by Tri-City. Control the risk of a Type I error
at .05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

e. Does $b_0$ give any relevant information here about the "start-up" time on calls, i.e., about
the time required before service work is begun on the copiers at a customer location?

*2.6. Refer to Airfreight breakage Problem 1.21.

a. Estimate $\beta_1$ with a 95 percent confidence interval. Interpret your interval estimate.

b. Conduct a t test to decide whether or not there is a linear association between number of times
a carton is transferred (X) and number of broken ampules (Y). Use a level of significance
of .05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

c. $\beta_0$ represents here the mean number of ampules broken when no transfers of the shipment
are made, i.e., when X = 0. Obtain a 95 percent confidence interval for $\beta_0$ and interpret it.

d. A consultant has suggested, on the basis of previous experience, that the mean number of
broken ampules should not exceed 9.0 when no transfers are made. Conduct an appropriate
test, using $\alpha = .025$. State the alternatives, decision rule, and conclusion. What is the
P-value of the test?

e. Obtain the power of your test in part (b) if actually $\beta_1 = 2.0$. Assume $\sigma\{b_1\} = .50$. Also
obtain the power of your test in part (d) if actually $\beta_0 = 11$. Assume $\sigma\{b_0\} = .75$.

2.7. Refer to Plastic hardness Problem 1.22.

a. Estimate the change in the mean hardness when the elapsed time increases by one hour. Use
a 99 percent confidence interval. Interpret your interval estimate.

b. The plastic manufacturer has stated that the mean hardness should increase by 2 Brinell
units per hour. Conduct a two-sided test to decide whether this standard is being satisfied;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

c. Obtain the power of your test in part (b) if the standard actually is being exceeded by
3 Brinell units per hour. Assume $\sigma\{b_1\} = .1$.


2.8. Refer to Figure 2.2 for the Toluca Company example. A consultant has advised that an increase
of one unit in lot size should require an increase of 3.0 in the expected number of work hours
for the given production item.

a. Conduct a test to decide whether or not the increase in the expected number of work hours
in the Toluca Company equals this standard. Use $\alpha = .05$. State the alternatives, decision
rule, and conclusion.

b. Obtain the power of your test in part (a) if the consultant's standard actually is being exceeded
by .5 hour. Assume $\sigma\{b_1\} = .35$.

c. Why is $F^* = 105.88$, given in the printout, not relevant for the test in part (a)?

2.9. Refer to Figure 2.2. A student, noting that $s\{b_1\}$ is furnished in the printout, asks why $s\{\hat{Y}_h\}$ is
not also given. Discuss.

2.10. For each of the following questions, explain whether a confidence interval for a mean response
or a prediction interval for a new observation is appropriate.

a. What will be the humidity level in this greenhouse tomorrow when we set the temperature
level at 31°C?

b. How much do families whose disposable income is $23,500 spend, on the average, for meals
away from home?

c. How many kilowatt-hours of electricity will be consumed next month by commercial and
industrial users in the Twin Cities service area, given that the index of business activity for
the area remains at its present level?

2.11. A person asks if there is a difference between the "mean response at $X = X_h$" and the "mean
of m new observations at $X = X_h$." Reply.

2.12. Can $\sigma^2\{\text{pred}\}$ in (2.37) be brought increasingly close to 0 as n becomes large? Is this also the
case for $\sigma^2\{\hat{Y}_h\}$ in (2.29b)? What is the implication of this difference?

2.13. Refer to Grade point average Problem 1.19.

a. Obtain a 95 percent interval estimate of the mean freshman GPA for students whose ACT
test score is 28. Interpret your confidence interval.

b. Mary Jones obtained a score of 28 on the entrance test. Predict her freshman GPA using a
95 percent prediction interval. Interpret your prediction interval.

c. Is the prediction interval in part (b) wider than the confidence interval in part (a)? Should it
be?

d. Determine the boundary values of the 95 percent confidence band for the regression line
when $X_h = 28$. Is your confidence band wider at this point than the confidence interval in
part (a)? Should it be?

*2.14. Refer to Copier maintenance Problem 1.20.

a. Obtain a 90 percent confidence interval for the mean service time on calls in which six
copiers are serviced. Interpret your confidence interval.

b. Obtain a 90 percent prediction interval for the service time on the next call in which six
copiers are serviced. Is your prediction interval wider than the corresponding confidence
interval in part (a)? Should it be?


c. Management wishes to estimate the expected service time per copier on calls in which six
copiers are serviced. Obtain an appropriate 90 percent confidence interval by converting the
interval obtained in part (a). Interpret the converted confidence interval.

d. Determine the boundary values of the 90 percent confidence band for the regression line
when $X_h = 6$. Is your confidence band wider at this point than the confidence interval in
part (a)? Should it be?

*2.15. Refer to Airfreight breakage Problem 1.21.

a. Because of changes in airline routes, shipments may have to be transferred more frequently
than in the past. Estimate the mean breakage for the following numbers of transfers: $X_h = 2$,
4. Use separate 99 percent confidence intervals. Interpret your results.

b. The next shipment will entail two transfers. Obtain a 99 percent prediction interval for the
number of broken ampules for this shipment. Interpret your prediction interval.

c. In the next several days, three independent shipments will be made, each entailing two
transfers. Obtain a 99 percent prediction interval for the mean number of ampules broken in
the three shipments. Convert this interval into a 99 percent prediction interval for the total
number of ampules broken in the three shipments.

d. Determine the boundary values of the 99 percent confidence band for the regression line
when $X_h = 2$ and when $X_h = 4$. Is your confidence band wider at these two points than the
corresponding confidence intervals in part (a)? Should it be?

2.16. Refer to Plastic hardness Problem 1.22.

a. Obtain a 98 percent confidence interval for the mean hardness of molded items with an
elapsed time of 30 hours. Interpret your confidence interval.

b. Obtain a 98 percent prediction interval for the hardness of a newly molded test item with
an elapsed time of 30 hours.

c. Obtain a 98 percent prediction interval for the mean hardness of 10 newly molded test items,
each with an elapsed time of 30 hours.

d. Is the prediction interval in part (c) narrower than the one in part (b)? Should it be?

e. Determine the boundary values of the 98 percent confidence band for the regression line
when $X_h = 30$. Is your confidence band wider at this point than the confidence interval in
part (a)? Should it be?

2.17. An analyst fitted normal error regression model (2.1) and conducted an F test of $\beta_1 = 0$ versus
$\beta_1 \ne 0$. The P-value of the test was .033, and the analyst concluded $H_a$: $\beta_1 \ne 0$. Was the $\alpha$
level used by the analyst greater than or smaller than .033? If the $\alpha$ level had been .01, what
would have been the appropriate conclusion?

2.18. For conducting statistical tests concerning the parameter $\beta_1$, why is the t test more versatile
than the F test?

2.19. When testing whether or not $\beta_1 = 0$, why is the F test a one-sided test even though $H_a$ includes
both $\beta_1 < 0$ and $\beta_1 > 0$? [Hint: Refer to (2.57).]

2.20. A student asks whether $R^2$ is a point estimator of any parameter in the normal error regression
model (2.1). Respond.

2.21. A value of $R^2$ near 1 is sometimes interpreted to imply that the relation between Y and X is
sufficiently close so that suitably precise predictions of Y can be made from knowledge of X.
Is this implication a necessary consequence of the definition of $R^2$?

2.22. Using the normal error regression model (2.1) in an engineering safety experiment, a researcher
found for the first 10 cases that $R^2$ was zero. Is it possible that for the complete set of 30 cases
$R^2$ will not be zero? Could $R^2$ not be zero for the first 10 cases, yet equal zero for all 30 cases?
Explain.


2.23. Refer to Grade point average Problem 1.19.

a. Set up the ANOVA table.

b. What is estimated by MSR in your ANOVA table? by MSE? Under what condition do MSR
and MSE estimate the same quantity?

c. Conduct an F test of whether or not $\beta_1 = 0$. Control the $\alpha$ risk at .01. State the alternatives,
decision rule, and conclusion.

d. What is the absolute magnitude of the reduction in the variation of Y when X is introduced
into the regression model? What is the relative reduction? What is the name of the latter
measure?

e. Obtain r and attach the appropriate sign.

f. Which measure, $R^2$ or r, has the more clear-cut operational interpretation? Explain.

*2.24. Refer to Copier maintenance Problem 1.20.

a. Set up the basic ANOVA table in the format of Table 2.2. Which elements of your table are
additive? Also set up the ANOVA table in the format of Table 2.3. How do the two tables differ?

b. Conduct an F test to determine whether or not there is a linear association between time
spent and number of copiers serviced; use $\alpha = .10$. State the alternatives, decision rule, and
conclusion.

c. By how much, relatively, is the total variation in number of minutes spent on a call reduced
when the number of copiers serviced is introduced into the analysis? Is this a relatively small
or large reduction? What is the name of this measure?

d. Calculate r and attach the appropriate sign.

e. Which measure, r or $R^2$, has the more clear-cut operational interpretation?

*2.25. Refer to Airfreight breakage Problem 1.21.

a. Set up the ANOVA table. Which elements are additive?

b. Conduct an F test to decide whether or not there is a linear association between the number
of times a carton is transferred and the number of broken ampules; control the $\alpha$ risk at .05.
State the alternatives, decision rule, and conclusion.

c. Obtain the $t^*$ statistic for the test in part (b) and demonstrate numerically its equivalence to
the $F^*$ statistic obtained in part (b).

d. Calculate $R^2$ and r. What proportion of the variation in Y is accounted for by introducing
X into the regression model?

2.26. Refer to Plastic hardness Problem 1.22.

a. Set up the ANOVA table.

b. Test by means of an F test whether or not there is a linear association between the hardness
of the plastic and the elapsed time. Use $\alpha = .01$. State the alternatives, decision rule, and
conclusion.

c. Plot the deviations $Y_i - \hat{Y}_i$ against $X_i$ on a graph. Plot the deviations $\hat{Y}_i - \bar{Y}$ against $X_i$
on another graph, using the same scales as for the first graph. From your two graphs, does
SSE or SSR appear to be the larger component of SSTO? What does this imply about the
magnitude of $R^2$?

d. Calculate $R^2$ and r.

*2.27. Refer to Muscle mass Problem 1.27.

a. Conduct a test to decide whether or not there is a negative linear association between amount
of muscle mass and age. Control the risk of Type I error at .05. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?


b. The two-sided P-value for the test whether $\beta_0 = 0$ is 0+. Can it now be concluded
that $b_0$ provides relevant information on the amount of muscle mass at birth for a female
child?

c. Estimate with a 95 percent confidence interval the difference in expected muscle mass for
women whose ages differ by one year. Why is it not necessary to know the specific ages to
make this estimate?

*2.28. Refer to Muscle mass Problem 1.27.

a. Obtain a 95 percent confidence interval for the mean muscle mass for women of age 60.
Interpret your confidence interval.

b. Obtain a 95 percent prediction interval for the muscle mass of a woman whose age is 60. Is
the prediction interval relatively precise?

c. Determine the boundary values of the 95 percent confidence band for the regression line
when $X_h = 60$. Is your confidence band wider at this point than the confidence interval in
part (a)? Should it be?

*2.29. Refer to Muscle mass Problem 1.27.

a. Plot the deviations $Y_i - \hat{Y}_i$ against $X_i$ on one graph. Plot the deviations $\hat{Y}_i - \bar{Y}$ against $X_i$
on another graph, using the same scales as in the first graph. From your two graphs, does
SSE or SSR appear to be the larger component of SSTO? What does this imply about the
magnitude of $R^2$?

b. Set up the ANOVA table.

c. Test whether or not $\beta_1 = 0$ using an F test with $\alpha = .05$. State the alternatives, decision
rule, and conclusion.

d. What proportion of the total variation in muscle mass remains "unexplained" when age is
introduced into the analysis? Is this proportion relatively small or large?

e. Obtain $R^2$ and r.

2.30. Refer to Crime rate Problem 1.28.

a. Test whether or not there is a linear association between crime rate and percentage of high
school graduates, using a t test with $\alpha = .01$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

b. Estimate $\beta_1$ with a 99 percent confidence interval. Interpret your interval estimate.

2.31. Refer to Crime rate Problem 1.28.

a. Set up the ANOVA table.

b. Carry out the test in Problem 2.30a by means of the F test. Show the numerical equivalence
of the two test statistics and decision rules. Is the P-value for the F test the same as that for
the t test?

c. By how much is the total variation in crime rate reduced when percentage of high school
graduates is introduced into the analysis? Is this a relatively large or small reduction?

d. Obtain r.

2.32. Refer to Crime rate Problems 1.28 and 2.30. Suppose that the test in Problem 2.30a is to be
carried out by means of a general linear test.

a. State the full and reduced models.

b. Obtain (1) SSE(F), (2) SSE(R), (3) $df_F$, (4) $df_R$, (5) test statistic $F^*$ for the general linear
test, (6) decision rule.

c. Are the test statistic $F^*$ and the decision rule for the general linear test numerically equivalent
to those in Problem 2.30a?


2.33. In developing empirically a cost function from observed data on a complex chemical experiment,
an analyst employed normal error regression model (2.1). $\beta_0$ was interpreted here as the cost
of setting up the experiment. The analyst hypothesized that this cost should be $7.5 thousand
and wished to test the hypothesis by means of a general linear test.

a. Indicate the alternative conclusions for the test.

b. Specify the full and reduced models.

c. Without additional information, can you tell what the quantity $df_R - df_F$ in test statistic (2.70)
will equal in the analyst's test? Explain.

2.34. Refer to Grade point average Problem 1.19.

a. Would it be more reasonable to consider the $X_i$ as known constants or as random variables
here? Explain.

b. If the $X_i$ were considered to be random variables, would this have any effect on prediction
intervals for new applicants? Explain.

2.35. Refer to Copier maintenance Problems 1.20 and 2.5. How would the meaning of the confidence
coefficient in Problem 2.5a change if the predictor variable were considered a random variable
and the conditions on page 83 were applicable?

2.36. A management trainee in a production department wished to study the relation between weight
of rough casting and machining time to produce the finished block. The trainee selected castings
so that the weights would be spaced equally apart in the sample and then observed the
corresponding machining times. Would you recommend that a regression or a correlation model be
used? Explain.

2.37. A social scientist stated: "The conditions for the bivariate normal distribution are so rarely met
in my experience that I feel much safer using a regression model." Comment.

2.38. A student was investigating from a large sample whether variables $Y_1$ and $Y_2$ follow a bivariate
normal distribution. The student obtained the residuals when regressing $Y_1$ on $Y_2$, and also
obtained the residuals when regressing $Y_2$ on $Y_1$, and then prepared a normal probability plot
for each set of residuals. Do these two normal probability plots provide sufficient information
for determining whether the two variables follow a bivariate normal distribution? Explain.

2.39. For the bivariate normal distribution with parameters $\mu_1 = 50$, $\mu_2 = 100$, $\sigma_1 = 3$, $\sigma_2 = 4$, and
$\rho_{12} = .80$.

a. State the characteristics of the marginal distribution of $Y_1$.

b. State the characteristics of the conditional distribution of $Y_2$ when $Y_1 = 55$.

c. State the characteristics of the conditional distribution of $Y_1$ when $Y_2 = 95$.

2.40. Explain whether any of the following would be affected if the bivariate normal model (2.74)
were employed instead of the normal error regression model (2.1) with fixed levels of the
predictor variable: (1) point estimates of the regression coefficients, (2) confidence limits for
the regression coefficients, (3) interpretation of the confidence coefficient.

2.41. Refer to Plastic hardness Problem 1.22. A student was analyzing these data and received the
following standard query from the interactive regression and correlation computer package:
CALCULATE CONFIDENCE INTERVAL FOR POPULATION CORRELATION COEFFICIENT
RHO? ANSWER Y OR N. Would a "yes" response lead to meaningful information
here? Explain.

*2.42. Property assessments. The data that follow show assessed value for property tax purposes
($Y_1$, in thousand dollars) and sales price ($Y_2$, in thousand dollars) for a sample of 15 parcels
of land for industrial development sold recently in "arm's length" transactions in a tax district.
Assume that bivariate normal model (2.74) is appropriate here.


  i:           1      2      3    ...    13     14     15
  $Y_{i1}$:  13.9   16.0   10.3   ...   14.9   12.9   15.8
  $Y_{i2}$:  28.6   34.7   21.0   ...   35.1   30.0   36.2

a. Plot the data in a scatter diagram. Does the bivariate normal model appear to be appropriate
here? Discuss.

b. Calculate $r_{12}$. What parameter is estimated by $r_{12}$? What is the interpretation of this
parameter?

c. Test whether or not $Y_1$ and $Y_2$ are statistically independent in the population, using test
statistic (2.87) and level of significance .01. State the alternatives, decision rule, and conclusion.

d. To test $\rho_{12} = .6$ versus $\rho_{12} \ne .6$, would it be appropriate to use test statistic (2.87)?

2.43. Contract profitability. A cost analyst for a drilling and blasting contractor examined 84
contracts handled in the last two years and found that the coefficient of correlation between value
of contract ($Y_1$) and profit contribution generated by the contract ($Y_2$) is $r_{12} = .61$. Assume
that bivariate normal model (2.74) applies.

a. Test whether or not $Y_1$ and $Y_2$ are statistically independent in the population; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion.

b. Estimate $\rho_{12}$ with a 95 percent confidence interval.

c. Convert the confidence interval in part (b) to a 95 percent confidence interval for $\rho_{12}^2$. Interpret
this interval estimate.

*2.44. Bid preparation. A building construction consultant studied the relationship between cost of
bid preparation ($Y_1$) and amount of bid ($Y_2$) for the consulting firm's clients. In a sample of
103 bids prepared by clients, $r_{12} = .87$. Assume that bivariate normal model (2.74) applies.

a. Test whether or not $\rho_{12} = 0$; control the risk of Type I error at .10. State the alternatives,
decision rule, and conclusion. What would be the implication if $\rho_{12} = 0$?

b. Obtain a 90 percent confidence interval for $\rho_{12}$. Interpret this interval estimate.

c. Convert the confidence interval in part (b) to a 90 percent confidence interval for $\rho_{12}^2$.

2.45. Water flow. An engineer, desiring to estimate the coefficient of correlation $\rho_{12}$ between rate
of water flow at point A in a stream ($Y_1$) and concurrent rate of flow at point B ($Y_2$), obtained
$r_{12} = .83$ in a sample of 147 cases. Assume that bivariate normal model (2.74) is appropriate.

a. Obtain a 99 percent confidence interval for $\rho_{12}$.

b. Convert the confidence interval in part (a) to a 99 percent confidence interval for $\rho_{12}^2$.

2.46. Refer to Property assessments Problem 2.42. There is some question as to whether or not
bivariate model (2.74) is appropriate.

a. Obtain the Spearman rank correlation coefficient $r_S$.

b. Test by means of the Spearman rank correlation coefficient whether an association exists
between property assessments and sales prices using test statistic (2.101) with $\alpha = .01$.
State the alternatives, decision rule, and conclusion.

c. How do your estimates and conclusions in parts (a) and (b) compare to those obtained in
Problem 2.42?

*2.47. Refer to Muscle mass Problem 1.27. Assume that the normal bivariate model (2.74) is
appropriate.

a. Compute the Pearson product-moment correlation coefficient $r_{12}$.

b. Test whether muscle mass and age are statistically independent in the population; use
$\alpha = .05$. State the alternatives, decision rule, and conclusion.


c. The bivariate normal model (2.74) assumption is possibly inappropriate here. Compute the
Spearman rank correlation coefficient, $r_S$.

d. Repeat part (b), this time basing the test of independence on the Spearman rank correlation
computed in part (c) and test statistic (2.101). Use $\alpha = .05$. State the alternatives, decision
rule, and conclusion.

e. How do your estimates and conclusions in parts (a) and (b) compare to those obtained in
parts (c) and (d)?

2.48. Refer to Crime rate Problems 1.28, 2.30, and 2.31. Assume that the normal bivariate model
(2.74) is appropriate.

a. Compute the Pearson product-moment correlation coefficient $r_{12}$.

b. Test whether crime rate and percentage of high school graduates are statistically independent
in the population; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

c. How do your estimates and conclusions in parts (a) and (b) compare to those obtained in
2.31b and 2.30a, respectively?

2.49. Refer to Crime rate Problems 1.28 and 2.48. The bivariate normal model (2.74) assumption
is possibly inappropriate here.

a. Compute the Spearman rank correlation coefficient $r_S$.

b. Test by means of the Spearman rank correlation coefficient whether an association exists
between crime rate and percentage of high school graduates using test statistic (2.101) and
a level of significance .01. State the alternatives, decision rule, and conclusion.

c. How do your estimates and conclusions in parts (a) and (b) compare to those obtained in
Problems 2.48a and 2.48b, respectively?

Exercises

2.50. Derive the property in (2.6) for the $k_i$.

2.51. Show that $b_0$ as defined in (2.21) is an unbiased estimator of $\beta_0$.

2.52. Derive the expression in (2.22b) for the variance of $b_0$, making use of (2.31). Also explain how
variance (2.22b) is a special case of variance (2.29b).

2.53. (Calculus needed.)

a. Obtain the likelihood function for the sample observations $Y_1, \ldots, Y_n$ given $X_1, \ldots, X_n$, if
the conditions on page 83 apply.

b. Obtain the maximum likelihood estimators of $\beta_0$, $\beta_1$, and $\sigma^2$. Are the estimators of $\beta_0$ and
$\beta_1$ the same as those in (1.27) when the $X_i$ are fixed?

2.54. Suppose that normal error regression model (2.1) is applicable except that the error variance
is not constant; rather the variance is larger, the larger is X. Does $\beta_1 = 0$ still imply that there
is no linear association between X and Y? That there is no association between X and Y?
Explain.

2.55. Derive the expression for SSR in (2.51).

2.56. In a small-scale regression study, five observations on Y were obtained corresponding to X = 1,
4, 10, 11, and 14. Assume that $\sigma = .6$, $\beta_0 = 5$, and $\beta_1 = 3$.

a. What are the expected values of MSR and MSE here?

b. For determining whether or not a regression relation exists, would it have been better or
worse to have made the five observations at X = 6, 7, 8, 9, and 10? Why? Would the
same answer apply if the principal purpose were to estimate the mean response for X = 8?
Discuss.


2.57. The normal error regression model (2.1) is assumed to be applicable.

a. When testing $H_0$: $\beta_1 = 5$ versus $H_a$: $\beta_1 \neq 5$ by means of a general linear test, what is the
reduced model? What are the degrees of freedom $df_R$?

b. When testing $H_0$: $\beta_0 = 2$, $\beta_1 = 5$ versus $H_a$: not both $\beta_0 = 2$ and $\beta_1 = 5$ by means of a
general linear test, what is the reduced model? What are the degrees of freedom $df_R$?

2.58. The random variables $Y_1$ and $Y_2$ follow the bivariate normal distribution in (2.74). Show that if
$\rho_{12} = 0$, $Y_1$ and $Y_2$ are independent random variables.

2.59. (Calculus needed.)

a. Obtain the maximum likelihood estimators of the parameters of the bivariate normal
distribution in (2.74).

b. Using the results in part (a), obtain the maximum likelihood estimators of the parameters of
the conditional probability distribution of $Y_1$ for any value of $Y_2$ in (2.80).

c. Show that the maximum likelihood estimators of $\alpha_{1|2}$ and $\beta_{12}$ obtained in part (b) are the
same as the least squares estimators (1.10) for the regression coefficients in the simple linear
regression model.

2.60. Show that test statistics (2.17) and (2.87) are equivalent.

2.61. Show that the ratio SSR/SSTO is the same whether $Y_1$ is regressed on $Y_2$ or $Y_2$ is regressed on
$Y_1$. [Hint: Use (1.10a) and (2.51).]

Projects

2.62. Refer to the CDI data set in Appendix C.2 and Project 1.43. Using $R^2$ as the criterion, which
predictor variable accounts for the largest reduction in the variability in the number of active
physicians?

2.63. Refer to the CDI data set in Appendix C.2 and Project 1.44. Obtain a separate interval estimate
of $\beta_1$ for each region. Use a 90 percent confidence coefficient in each case. Do the regression
lines for the different regions appear to have similar slopes?

2.64. Refer to the SENIC data set in Appendix C.1 and Project 1.45. Using $R^2$ as the criterion, which
predictor variable accounts for the largest reduction in the variability of the average length of
stay?

2.65. Refer to the SENIC data set in Appendix C.1 and Project 1.46. Obtain a separate interval
estimate of $\beta_1$ for each region. Use a 95 percent confidence coefficient in each case. Do the
regression lines for the different regions appear to have similar slopes?

2.66. Five observations on Y are to be taken when X = 4, 8, 12, 16, and 20, respectively. The true
regression function is $E\{Y\} = 20 + 4X$, and the $\varepsilon_i$ are independent N(0, 25).

a. Generate five normal random numbers, with mean 0 and variance 25. Consider these random
numbers as the error terms for the five Y observations at X = 4, 8, 12, 16, and 20 and calculate
$Y_1$, $Y_2$, $Y_3$, $Y_4$, and $Y_5$. Obtain the least squares estimates $b_0$ and $b_1$ when fitting a straight
line to the five cases. Also calculate $\hat{Y}_h$ when $X_h = 10$ and obtain a 95 percent confidence
interval for $E\{Y_h\}$ when $X_h = 10$.

b. Repeat part (a) 200 times, generating new random numbers each time.

c. Make a frequency distribution of the 200 estimates $b_1$. Calculate the mean and standard
deviation of the 200 estimates $b_1$. Are the results consistent with theoretical expectations?

d. What proportion of the 200 confidence intervals for $E\{Y_h\}$ when $X_h = 10$ include $E\{Y_h\}$?
Is this result consistent with theoretical expectations?




2.67. Refer to Grade point average Problem 1.19. 

a. Plot the data, with the least squares regression line for ACT scores between 20 and 30 
superimposed. 

b. On the plot in part (a), superimpose a plot of the 95 percent confidence band for the true 
regression line for ACT scores between 20 and 30. Does the confidence band suggest that 
the true regression relation has been precisely estimated? Discuss. 

2.68. Refer to Copier maintenance Problem 1.20. 

a. Plot the data, with the least squares regression line for numbers of copiers serviced between 
1 and 8 superimposed. 

b. On the plot in part (a), superimpose a plot of the 90 percent confidence band for the true 
regression line for numbers of copiers serviced between 1 and 8. Does the confidence band 
suggest that the true regression relation has been precisely estimated? Discuss. 


Chapter 3


Diagnostics and Remedial Measures


When a regression model, such as the simple linear regression model (2.1), is considered 
for an application, we can usually not be certain in advance that the model is appropriate 
for that application. Any one, or several, of the features of the model, such as linearity 
of the regression function or normality of the error terms, may not be appropriate for the 
particular data at hand. Hence, it is important to examine the aptness of the model for the 
data before inferences based on that model are undertaken. In this chapter, we discuss some 
simple graphic methods for studying the appropriateness of a model, as well as some formal 
statistical tests for doing so. We also consider some remedial techniques that can be helpful 
when the data are not in accordance with the conditions of regression model (2.1). We 
conclude the chapter with a case example that brings together the concepts and methods 
presented in this and the earlier chapters. 

While the discussion in this chapter is in terms of the appropriateness of the simple 
linear regression model (2.1), the basic principles apply to all statistical models discussed 
in this book. In later chapters, additional methods useful for examining the appropriateness 
of statistical models and other remedial measures will be presented, as well as methods for 
validating the statistical model. 


3.1 Diagnostics for Predictor Variable 


We begin by considering some graphic diagnostics for the predictor variable. We need 
diagnostic information about the predictor variable to see if there are any outlying X values 
that could influence the appropriateness of the fitted regression function. We discuss the 
role of influential cases in detail in Chapter 10. Diagnostic information about the range and 
concentration of the X levels in the study is also useful for ascertaining the range of validity 
for the regression analysis. 

Figure 3.1a contains a simple dot plot for the lot sizes in the Toluca Company example
in Figure 1.10. A dot plot is helpful when the number of observations in the data set is not
large. The dot plot in Figure 3.1a shows that the minimum and maximum lot sizes are 20
and 120, respectively, that the lot size levels are spread throughout this interval, and that
there are no lot sizes that are far outlying. The dot plot also shows that in a number of cases
several runs were made for the same lot size.


FIGURE 3.1 MINITAB and SYGRAPH Diagnostic Plots for Predictor Variable: Toluca Company Example.
(a) Dot Plot (horizontal axis: Lot Size, 20 to 120)
(b) Sequence Plot (lot size against Run, 0 to 30)
(c) Stem-and-Leaf Plot
  2    0
  3    000
  4    00
  5 H  000
  6    0
  7 M  000
  8    000
  9 H  0000
 10    00
 11    00
 12    0
(d) Box Plot (horizontal axis: Lot Size, 20 to 120)

A second useful diagnostic for the predictor variable is a sequence plot. Figure 3.1b 
contains a time sequence plot of the lot sizes for the Toluca Company example. Lot size is 
here plotted against production run (i.e., against time sequence). The points in the plot are 
connected to show more effectively the time sequence. Sequence plots should be utilized 
whenever data are obtained in a sequence, such as over time or for adjacent geographic 
areas. The sequence plot in Figure 3.1b contains no special pattern. If, say, the plot had 
shown that smaller lot sizes had been utilized early on and larger lot sizes later on, this 
information could be very helpful for subsequent diagnostic studies of the aptness of the 
fitted regression model.

Figures 3.1c and 3.1d contain two other diagnostic plots that present information similar
to the dot plot in Figure 3.1a. The stem-and-leaf plot in Figure 3.1c provides information
similar to a frequency histogram. By displaying the last digits, this plot also indicates here
that all lot sizes in the Toluca Company example were multiples of 10. The letter M in the
SYGRAPH output denotes the stem where the median is located, and the letter H denotes
the stems where the first and third quartiles (hinges) are located.

The box plot in Figure 3.1d shows the minimum and maximum lot sizes, the first and 
third quartiles, and the median lot size. We see that the middle half of the lot sizes ranges
from 50 to 90, and that they are fairly symmetrically distributed because the median is
located in the middle of the central box. A box plot is particularly helpful when there are 
many observations in the data set. 
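
The summaries that these plots convey can also be computed directly. The following Python sketch is illustrative only: it assumes that the 25 lot sizes have the frequencies indicated by the stem-and-leaf plot in Figure 3.1c, with three runs taken at lot size 50, and the names `frequencies`, `lot_sizes`, and `dot_plot` are ours, not part of the text or of any statistical package.

```python
from collections import Counter
from statistics import quantiles

# Lot-size frequencies as read from the stem-and-leaf display
# (assumption: 25 production runs in total).
frequencies = {20: 1, 30: 3, 40: 2, 50: 3, 60: 1, 70: 3,
               80: 3, 90: 4, 100: 2, 110: 2, 120: 1}
lot_sizes = sorted(size for size, count in frequencies.items()
                   for _ in range(count))

# Five-number summary: the quantities displayed by a box plot.
q1, median, q3 = quantiles(lot_sizes, n=4, method="inclusive")
summary = (min(lot_sizes), q1, median, q3, max(lot_sizes))


def dot_plot(values):
    """Return a text dot plot with one 'o' per case at each level."""
    counts = Counter(values)
    return "\n".join(f"{level:>4} {'o' * counts[level]}"
                     for level in sorted(counts))


print(summary)
print(dot_plot(lot_sizes))
```

Under these assumptions the summary agrees with the description of the box plot: the extremes are 20 and 120, and the middle half of the lot sizes runs from 50 to 90 with the median at 70.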


3.2 Residuals 


Direct diagnostic plots for the response variable Y are ordinarily not too useful in regression
analysis because the values of the observations on the response variable are a function of 
the level of the predictor variable. Instead, diagnostics for the response variable are usually 
carried out indirectly through an examination of the residuals. 

The residual $e_i$, as defined in (1.16), is the difference between the observed value $Y_i$ and
the fitted value $\hat{Y}_i$:

$$e_i = Y_i - \hat{Y}_i \tag{3.1}$$

The residual may be regarded as the observed error, in distinction to the unknown true error
$\varepsilon_i$ in the regression model:

$$\varepsilon_i = Y_i - E\{Y_i\} \tag{3.2}$$

For regression model (2.1), the error terms $\varepsilon_i$ are assumed to be independent normal
random variables, with mean 0 and constant variance $\sigma^2$. If the model is appropriate for the
data at hand, the observed residuals $e_i$ should then reflect the properties assumed for the $\varepsilon_i$.
This is the basic idea underlying residual analysis, a highly useful means of examining the
aptness of a statistical model.


Properties of Residuals 


Mean. The mean of the n residuals $e_i$ for the simple linear regression model (2.1) is,
by (1.17):

$$\bar{e} = \frac{\sum e_i}{n} = 0 \tag{3.3}$$

where $\bar{e}$ denotes the mean of the residuals. Thus, since $\bar{e}$ is always 0, it provides no
information as to whether the true errors $\varepsilon_i$ have expected value $E\{\varepsilon_i\} = 0$.


Variance. The variance of the n residuals $e_i$ is defined as follows for regression
model (2.1):

$$s^2 = \frac{\sum (e_i - \bar{e})^2}{n - 2} = \frac{\sum e_i^2}{n - 2} = \frac{SSE}{n - 2} = MSE \tag{3.4}$$

If the model is appropriate, MSE is, as noted earlier, an unbiased estimator of the variance
of the error terms $\sigma^2$.


Nonindependence. The residuals $e_i$ are not independent random variables because they
involve the fitted values $\hat{Y}_i$ which are based on the same fitted regression function. As
a result, the residuals for regression model (2.1) are subject to two constraints. These
are constraint (1.17), that the sum of the $e_i$ must be 0, and constraint (1.19), that the
products $X_i e_i$ must sum to 0.

When the sample size is large in comparison to the number of parameters in the regression
model, the dependency effect among the residuals $e_i$ is relatively unimportant and can be
ignored for most purposes.


Semistudentized Residuals 
At times, it is helpful to standardize the residuals for residual analysis. Since the standard
deviation of the error terms $\varepsilon_i$ is $\sigma$, which is estimated by $\sqrt{MSE}$, it is natural to consider
the following form of standardization:

$$e_i^* = \frac{e_i - \bar{e}}{\sqrt{MSE}} = \frac{e_i}{\sqrt{MSE}} \tag{3.5}$$

If $\sqrt{MSE}$ were an estimate of the standard deviation of the residual $e_i$, we would call $e_i^*$
a studentized residual. However, the standard deviation of $e_i$ is complex and varies for
the different residuals $e_i$, and $\sqrt{MSE}$ is only an approximation of the standard deviation
of $e_i$. Hence, we call the statistic $e_i^*$ in (3.5) a semistudentized residual. We shall take
up studentized residuals in Chapter 10. Both semistudentized residuals and studentized
residuals can be very helpful in identifying outlying observations.
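
The properties of residuals described in this section can be checked numerically. The following Python sketch is a minimal illustration, not a computation from the text: the data are hypothetical, and the variable names are our own. It fits the least squares line, confirms that the residuals and the products $X_i e_i$ each sum to 0 apart from rounding error, and then forms MSE as in (3.4) and the semistudentized residuals as in (3.5).

```python
from math import sqrt

# Hypothetical data, for illustration only (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

# Least squares estimates b0 and b1.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals (3.1): observed value minus fitted value.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# The two constraints on the residuals; both sums are 0 up to rounding.
sum_e = sum(residuals)
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))

# MSE (3.4) and semistudentized residuals (3.5).
mse = sum(ei ** 2 for ei in residuals) / (n - 2)
semistudentized = [ei / sqrt(mse) for ei in residuals]

print(f"sum of e: {sum_e:.2e}  sum of X*e: {sum_xe:.2e}  MSE: {mse:.4f}")
print([round(v, 2) for v in semistudentized])
```

Because the mean residual is 0, dividing each residual by $\sqrt{MSE}$ is all that (3.5) requires.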


Departures from Model to Be Studied by Residuals 


We shall consider the use of residuals for examining six important types of departures from 
the simple linear regression model (2.1) with normal errors: 


1. The regression function is not linear.

2. The error terms do not have constant variance.

3. The error terms are not independent.

4. The model fits all but one or a few outlier observations.

5. The error terms are not normally distributed.

6. One or several important predictor variables have been omitted from the model.


3.3 Diagnostics for Residuals 




We take up now some informal diagnostic plots of residuals to provide information on 
whether any of the six types of departures from the simple linear regression model (2.1) 
just mentioned are present. The following plots of residuals (or semistudentized residuals) 
will be utilized here for this purpose:


1. Plot of residuals against predictor variable.

2. Plot of absolute or squared residuals against predictor variable.

3. Plot of residuals against fitted values.

4. Plot of residuals against time or other sequence.

5. Plots of residuals against omitted predictor variables.

6. Box plot of residuals.

7. Normal probability plot of residuals.


FIGURE 3.2 MINITAB and SYGRAPH Diagnostic Residual Plots: Toluca Company Example.
(a) Residual Plot against X (residual against Lot Size)
(b) Sequence Plot (residual against Run)
(c) Box Plot (horizontal axis: Residual)
(d) Normal Probability Plot (residual against Expected value)


Figure 3.2 contains, for the Toluca Company example, MINITAB and SYGRAPH plots 
of the residuals in Table 1.2 against the predictor variable and against time, a box plot, and 
a normal probability plot. All of these plots, as we shall see, support the appropriateness of 
regression model (2.1) for the data. 


We turn now to consider how residual analysis can be helpful in studying each of the six 
departures from regression model (2.1). 


Nonlinearity of Regression Function 


Whether a linear regression function is appropriate for the data being analyzed can be
studied from a residual plot against the predictor variable or, equivalently, from a residual
plot against the fitted values. Nonlinearity of the regression function can also be studied
from a scatter plot, but this plot is not always as effective as a residual plot. Figure 3.3a
contains a scatter plot of the data and the fitted regression line for a study of the relation
between maps distributed and bus ridership in eight test cities. Here, X is the number of
bus transit maps distributed free to residents of the city at the beginning of the test period
and Y is the increase during the test period in average daily bus ridership during nonpeak
hours. The original data and fitted values are given in Table 3.1, columns 1, 2, and 3. The
plot suggests strongly that a linear regression function is not appropriate.


FIGURE 3.3 Scatter Plot and Residual Plot Illustrating Nonlinear Regression Function: Transit Example.
(a) Scatter Plot (b) Residual Plot
Horizontal axis in both panels: Maps Distributed (thousands), 100 to 220.


TABLE 3.1 Number of Maps Distributed and Increase in Ridership: Transit Example.

            (1)            (2)           (3)            (4)
        Increase in       Maps
         Ridership     Distributed     Fitted
City    (thousands)    (thousands)      Value         Residual
  i        $Y_i$          $X_i$      $\hat{Y}_i$   $Y_i - \hat{Y}_i = e_i$
  1         .60            80           1.66           -1.06
  2        6.70           220           7.75           -1.05
  3        5.30           140           4.27            1.03
  4        4.00           120           3.40             .60
  5        6.55           180           6.01             .54
  6        2.15           100           2.53            -.38
  7        6.60           200           6.88            -.28
  8        5.75           160           5.14             .61

$\hat{Y} = -1.82 + .0435X$

Figure 3.3b presents a plot of the residuals, shown in Table 3.1, column 4, against the 
predictor variable X. The lack of fit of the linear regression function is even more strongly 
suggested by the residual plot against X in Figure 3.3b than by the scatter plot. Note that 
the residuals depart from 0 in a systematic fashion; they are negative for smaller X values,
positive for medium-size X values, and negative again for large X values. 
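
The fitted line and the sign pattern of the residuals can be reproduced from the data in Table 3.1. The following Python sketch is a minimal illustration using the eight observations listed there; the variable names are our own.

```python
# Table 3.1: maps distributed (thousands) and increase in ridership (thousands).
maps = [80, 220, 140, 120, 180, 100, 200, 160]
ridership = [0.60, 6.70, 5.30, 4.00, 6.55, 2.15, 6.60, 5.75]
n = len(maps)

# Least squares estimates for the straight-line fit.
x_bar = sum(maps) / n
y_bar = sum(ridership) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maps, ridership))
sxx = sum((x - x_bar) ** 2 for x in maps)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals, then their signs listed in order of increasing X.
residuals = [y - (b0 + b1 * x) for x, y in zip(maps, ridership)]
signs = "".join("+" if e > 0 else "-" for _, e in sorted(zip(maps, residuals)))

print(f"b0 = {b0:.2f}, b1 = {b1:.4f}")
print("signs of residuals by increasing X:", signs)
```

Reading the signs from the smallest to the largest X value gives two negative residuals, then four positive ones, then two negative ones, which is the systematic pattern visible in the residual plot.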

In this case, both Figures 3.3a and 3.3b point out the lack of linearity of the regression
function. In general, however, the residual plot is to be preferred, because it has some
important advantages over the scatter plot. First, the residual plot can easily be used for
examining other facets of the aptness of the model. Second, there are occasions when the
scaling of the scatter plot places the $Y_i$ observations close to the fitted values $\hat{Y}_i$, for instance,
when there is a steep slope. It then becomes more difficult to study the appropriateness of
a linear regression function from the scatter plot. A residual plot, on the other hand, can
clearly show any systematic pattern in the deviations around the fitted regression line under
these conditions.


FIGURE 3.4 Prototype Residual Plots.
Panels (a), (b), and (c) plot residuals against X; panel (d) plots residuals against time.

Figure 3.4a shows a prototype situation of the residual plot against X when a linear 
regression model is appropriate. The residuals then fall within a horizontal band centered 
around 0, displaying no systematic tendencies to be positive and negative. This is the case 
in Figure 3.2a for the Toluca Company example. 

Figure 3.4b shows a prototype situation of a departure from the linear regression model 
that indicates the need for a curvilinear regression function. Here the residuals tend to vary 
in a systematic fashion between being positive and negative. This is the case in Figure 3.3b 
for the transit example. A different type of departure from linearity would, of course, lead 
to a picture different from the prototype pattern in Figure 3.4b. 


Comment 


A plot of residuals against the fitted values $\hat{Y}$ provides equivalent information as a plot of residuals
against X for the simple linear regression model, and thus is not needed in addition to the residual plot
against X. The two plots provide the same information because the fitted values $\hat{Y}_i$ are a linear function
of the values $X_i$ for the predictor variable. Thus, only the X scale values, not the basic pattern of the
plotted points, are affected by whether the residual plot is against the $X_i$ or the $\hat{Y}_i$. For curvilinear
regression and multiple regression, on the other hand, separate plots of the residuals against the fitted
values and against the predictor variable(s) are usually helpful.




Nonconstancy of Error Variance 


FIGURE 3.5 Residual Plots Illustrating Nonconstant Error Variance.


Plots of the residuals against the predictor variable or against the fitted values are not only 
helpful to study whether a linear regression function is appropriate but also to examine 
whether the variance of the error terms is constant. Figure 3.5a shows a residual plot against 
age for a study of the relation between diastolic blood pressure of healthy, adult women (Y) 
and their age (X). The plot suggests that the older the woman is, the more spread out the 
residuals are. Since the relation between blood pressure and age is positive, this suggests 
that the error variance is larger for older women than for younger ones. 

The prototype plot in Figure 3.4a exemplifies residual plots when the error term variance 
is constant. The residual plot in Figure 3.2a for the Toluca Company example is of this type, 
suggesting that the error terms have constant variance here. 

Figure 3.4c shows a prototype picture of residual plots when the error variance increases 
with X. In many business, social science, and biological science applications, departures 
from constancy of the error variance tend to be of the “megaphone” type shown in Fig- 
ure 3.4c, as in the blood pressure example in Figure 3.5a. One can also encounter error 
variances decreasing with increasing levels of the predictor variable and occasionally vary- 
ing in some more complex fashion. 

Plots of the absolute values of the residuals or of the squared residuals against the
predictor variable X or against the fitted values $\hat{Y}$ are also useful for diagnosing nonconstancy
of the error variance since the signs of the residuals are not meaningful for examining the
constancy of the error variance. These plots are especially useful when there are not many
cases in the data set because plotting of either the absolute or squared residuals places all of
the information on changing magnitudes of the residuals above the horizontal zero line so
that one can more readily see whether the magnitude of the residuals (irrespective of sign)
is changing with the level of X or $\hat{Y}$.

Figure 3.5b contains a plot of the absolute residuals against age for the blood pressure
example. This plot shows more clearly that the residuals tend to be larger in absolute 
magnitude for older-aged women. 
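
The idea behind the absolute residual plot can be illustrated with simulated data. The following Python sketch does not use the blood pressure data, which are not listed here; it generates hypothetical observations whose error standard deviation is proportional to X, fits a straight line, and compares the average absolute residual for the cases with smaller X values to that for the cases with larger X values. All names, the seed, and the numerical settings are assumptions made for the illustration.

```python
import random

rng = random.Random(1)  # fixed seed so that the sketch is reproducible

# Simulated data (not from the text): error standard deviation grows with X.
x = [rng.uniform(10, 100) for _ in range(400)]
y = [20 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
n = len(x)

# Least squares fit of a straight line.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Absolute residuals, ordered by the level of X.
abs_resid = sorted((xi, abs(yi - (b0 + b1 * xi))) for xi, yi in zip(x, y))

# Average absolute residual in the lower and upper halves of the X range.
half = n // 2
mean_abs_low = sum(a for _, a in abs_resid[:half]) / half
mean_abs_high = sum(a for _, a in abs_resid[half:]) / (n - half)

print(f"mean |e| for smaller X: {mean_abs_low:.2f}")
print(f"mean |e| for larger X:  {mean_abs_high:.2f}")
```

A clearly larger average for the cases with larger X corresponds to the "megaphone" pattern; a plot of the absolute residuals against X shows the same tendency graphically.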


(a) Residual Plot against X (vertical axis: Residual) (b) Absolute Residual Plot against X (vertical axis: Absolute residual)


FIGURE 3.6 Residual Plot with Outlier.
Vertical axis: semistudentized residual, $e/\sqrt{MSE}$.


Presence of Outliers 


Outliers are extreme observations. Residual outliers can be identified from residual plots
against X or $\hat{Y}$, as well as from box plots, stem-and-leaf plots, and dot plots of the
residuals. Plotting of semistudentized residuals is particularly helpful for distinguishing outlying
observations, since it then becomes easy to identify residuals that lie many standard
deviations from zero. A rough rule of thumb when the number of cases is large is to consider
semistudentized residuals with absolute value of four or more to be outliers. We shall take
up more refined procedures for identifying outliers in Chapter 10.

The residual plot in Figure 3.6 presents semistudentized residuals and contains one 
outlier, which is circled. Note that this residual represents an observation almost six standard 
deviations from the fitted value. 
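
The rule of thumb can be applied mechanically once the semistudentized residuals are available. The following Python sketch is an illustration under stated assumptions: the 30 observations are hypothetical (a straight line with small deterministic deviations), a gross error of 20 units is planted in one case, and the cutoff of four is the rough rule mentioned above. The variable names are our own.

```python
from math import sin, sqrt

# Hypothetical data: a straight line with small deterministic deviations,
# plus one gross error planted in the case with 0-based index 14.
x = [float(i) for i in range(1, 31)]
y = [2.0 + 3.0 * xi + 0.5 * sin(xi) for xi in x]
y[14] += 20.0
n = len(x)

# Least squares fit of a straight line.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Semistudentized residuals (3.5).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (n - 2)
semistudentized = [e / sqrt(mse) for e in residuals]

# Rough rule of thumb: absolute value of four or more.
flagged = [i for i, e in enumerate(semistudentized) if abs(e) >= 4]

print("flagged cases (0-based index):", flagged)
```

Note that the outlier itself inflates MSE, so with only a handful of cases no semistudentized residual could reach four; this is one reason the rule is stated for a large number of cases.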

Outliers can create great difficulty. When we encounter one, our first suspicion is that 
the observation resulted from a mistake or other extraneous effect, and hence should be 
discarded. A major reason for discarding it is that under the least squares method, a fitted 
line may be pulled disproportionately toward an outlying observation because the sum of 
the squared deviations is minimized. This could cause a misleading fit if indeed the outlying 
observation resulted from a mistake or other extraneous cause. On the other hand, outliers 
may convey significant information, as when an outlier occurs because of an interaction 
with another predictor variable omitted from the model. A safe rule frequently suggested is 
to discard an outlier only if there is direct evidence that it represents an error in recording, 
a miscalculation, a malfunctioning of equipment, or a similar type of circumstance. 


Comment 


When a linear regression model is fitted to a data set with a small number of cases and an outlier is 
present, the fitted regression can be so distorted by the outlier that the residual plot may improperly 
suggest a lack of fit of the linear regression model, in addition to flagging the outlier. Figure 3.7 
illustrates this situation. The scatter plot in Figure 3.7a presents a situation where all observations 
except the outlier fall around a straight-line statistical relationship. When a linear regression function 
is fitted to these data, the outlier causes such a shift in the fitted regression line as to lead to a systematic 
pattern of deviations from the fitted line for the other observations, suggesting a lack of fit of the linear 
regression function. This is shown by the residual plot in Figure 3.7b.


Nonindependence of Error Terms 


Whenever data are obtained in a time sequence or some other type of sequence, such as 
for adjacent geographic areas, it is a good idea to prepare a sequence plot of the residuals. 


FIGURE 3.7 Distorting Effect on Residuals Caused by an Outlier When Remaining Data Follow Linear Regression.
(a) Scatter Plot (b) Residual Plot


FIGURE 3.8 Residual Time Sequence Plots Illustrating Nonindependence of Error Terms.
(a) Welding Example Trend Effect (residual against Time Order of Weld, 1 to 13)
(b) Cyclical Nonindependence (residual against Time Order)


The purpose of plotting the residuals against time or in some other type of sequence is to 
see if there is any correlation between error terms that are near each other in the sequence. 
Figure 3.8a contains a time sequence plot of the residuals in an experiment to study the 
relation between the diameter of a weld (X) and the shear strength of the weld (Y). An
evident correlation between the error terms stands out. Negative residuals are associated
mainly with the early trials, and positive residuals with the later trials. Apparently, some
effect connected with time was present, such as learning by the welder or a gradual change
in the welding equipment, so the shear strength tended to be greater in the later welds
because of this effect.

A prototype residual plot showing a time-related trend effect is presented in Figure 3.4d, 
which portrays a linear time-related trend effect, as in the welding example. It is sometimes 
useful to view the problem of nonindependence of the error terms as one in which an 
important variable (in this case, time) has been omitted from the model. We shall discuss 
this type of problem shortly. 




Another type of nonindependence of the error terms is illustrated in Figure 3.8b. Here 
the adjacent error terms are also related, but the resulting pattern is a cyclical one with no 
trend effect present. 

When the error terms are independent, we expect the residuals in a sequence plot to 
fluctuate in a more or less random pattern around the base line 0, such as the scattering 
shown in Figure 3.2b for the Toluca Company example. Lack of randomness can take the 
form of too much or too little alternation of points around the zero line. In practice, there is 
little concern with the former because it does not arise frequently. Too little alternation, in 
contrast, frequently occurs, as in the welding example in Figure 3.8a. 
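
The amount of alternation around the zero line is judged by eye in a sequence plot, but it can also be counted. The following Python sketch is an informal companion to the plot, not a procedure given in the text; the function name `sign_changes` and both residual sequences are hypothetical.

```python
def sign_changes(residuals):
    """Count how often consecutive residuals (in sequence order) change sign.

    Residuals exactly equal to zero are skipped.
    """
    signs = [e > 0 for e in residuals if e != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical residuals listed in time order.
trend = [-3.0, -2.5, -1.8, -1.1, -0.4, 0.3, 0.9, 1.6, 2.2, 2.9]
alternating = [1.0, -1.2, 0.8, -0.9, 1.1, -1.0, 0.7, -0.8, 1.3, -1.1]

print("trend effect:", sign_changes(trend), "sign change(s)")
print("strict alternation:", sign_changes(alternating), "sign change(s)")
```

A trend such as that in the welding example produces very few sign changes (too little alternation), whereas strict alternation produces nearly as many sign changes as there are cases; a random pattern falls between these extremes.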




Comment 

When the residuals are plotted against X, as in Figure 3.3b for the transit example, the scatter may not 
appear to be random. For this plot, however, the basic problem is probably not lack of independence 
of the error terms but a poorly fitting regression function. This, indeed, is the situation portrayed in 
the scatter plot in Figure 3.3a.


Nonnormality of Error Terms 
As we noted earlier, small departures from normality do not create any serious problems. 
Major departures, on the other hand, should be of concern. The normality of the error terms 
can be studied informally by examining the residuals in a variety of graphic ways. 


Distribution Plots. A box plot of the residuals is helpful for obtaining summary information
about the symmetry of the residuals and about possible outliers. Figure 3.2c contains
a box plot of the residuals in the Toluca Company example. No serious departures from 
symmetry are suggested by this plot. A histogram, dot plot, or stem-and-leaf plot of the 
residuals can also be helpful for detecting gross departures from normality. However, the 
number of cases in the regression study must be reasonably large for any of these plots to 
convey reliable information about the shape of the distribution of the error terms. 


Comparison of Frequencies. Another possibility when the number of cases is reasonably
large is to compare actual frequencies of the residuals against expected frequencies under
normality. For example, one can determine whether, say, about 68 percent of the residuals
$e_i$ fall between $\pm\sqrt{MSE}$ or about 90 percent fall between $\pm 1.645\sqrt{MSE}$. When the sample
size is moderately large, corresponding t values may be used for the comparison.

To illustrate this procedure, we again consider the Toluca Company example of Chapter 1.
Table 3.2, column 1, repeats the residuals from Table 1.2. We see from Figure 2.2 that
$\sqrt{MSE} = 48.82$. Using the t distribution, we expect under normality about 90 percent of
the residuals to fall between $\pm t(.95; 23)\sqrt{MSE} = \pm 1.714(48.82)$, or between $-83.68$
and $83.68$. Actually, 22 residuals, or 88 percent, fall within these limits. Similarly, under
normality, we expect about 60 percent of the residuals to fall between $-41.89$ and $41.89$.
The actual percentage here is 52 percent. Thus, the actual frequencies here are reasonably
consistent with those expected under normality.
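
The comparison of frequencies amounts to computing a limit and counting the residuals inside it. The following Python sketch reproduces the 90 percent limit from the values quoted above, 1.714 and 48.82; the function name `share_within` and the list of ten residuals are hypothetical, since the complete set of 25 Toluca residuals is not reproduced in this section.

```python
def share_within(residuals, half_width):
    """Proportion of residuals lying between -half_width and +half_width."""
    inside = sum(1 for e in residuals if abs(e) <= half_width)
    return inside / len(residuals)


root_mse = 48.82      # square root of MSE, as quoted in the text
t_multiple = 1.714    # t(.95; 23), as quoted in the text
limit_90 = t_multiple * root_mse

# Hypothetical residuals, for illustration only.
example_residuals = [-90.0, -60.0, -40.0, -10.0, 5.0, 10.0, 20.0, 30.0, 50.0, 85.0]

print(f"90 percent limits: -{limit_90:.2f} to {limit_90:.2f}")
print(f"share inside: {share_within(example_residuals, limit_90):.0%}")
```

The observed share is then compared informally with the nominal 90 percent, exactly as the 88 percent figure is compared in the Toluca Company example.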


Normal Probability Plot. Still another possibility is to prepare a normal probability plot 
of the residuals. Here each residual is plotted against its expected value under normality. 
A plot that is nearly linear suggests agreement with normality, whereas a plot that departs 
substantially from linearity suggests that the error distribution is not normal. 

Table 3.2, column 1, contains the residuals for the Toluca Company example. To find
the expected values of the ordered residuals under normality, we utilize the facts that (1)
the expected value of the error terms for regression model (2.1) is zero and (2) the standard
deviation of the error terms is estimated by $\sqrt{MSE}$. Statistical theory has shown that for a
normal random variable with mean 0 and estimated standard deviation $\sqrt{MSE}$, a good
approximation of the expected value of the kth smallest observation in a random sample of n is:

$$\sqrt{MSE}\left[z\left(\frac{k - .375}{n + .25}\right)\right] \tag{3.6}$$

where z(A) as usual denotes the (A)100 percentile of the standard normal distribution.


TABLE 3.2 Residuals and Expected Values under Normality: Toluca Company Example.

            (1)          (2)          (3)
                                    Expected
 Run      Residual       Rank      Value under
  i        $e_i$          k         Normality
  1        51.02         22           51.95
  2       -48.47          5          -44.10
  3       -19.88         10          -14.76
 ...        ...          ...           ...
 23        38.83         19           31.05
 24        -5.98         13            0
 25        10.72         17           19.93

Using this approximation, let us calculate the expected values of the residuals under
normality for the Toluca Company example. Column 2 of Table 3.2 shows the ranks of
the residuals, with the smallest residual being assigned rank 1. We see that the rank of the
residual for run 1, $e_1 = 51.02$, is 22, which indicates that this residual is the 22nd smallest
among the 25 residuals. Hence, for this residual k = 22. We found earlier (Table 2.1) that
MSE = 2,384. Hence:

$$\frac{k - .375}{n + .25} = \frac{22 - .375}{25 + .25} = \frac{21.625}{25.25} = .8564$$

so that the expected value of this residual under normality is:

$$\sqrt{2{,}384}\,[z(.8564)] = \sqrt{2{,}384}\,(1.064) = 51.95$$

Similarly, the expected value of the residual for run 2, $e_2 = -48.47$, is obtained by noting
that the rank of this residual is k = 5; in other words, this residual is the fifth smallest one
among the 25 residuals. Hence, we require (k - .375)/(n + .25) = (5 - .375)/(25 + .25) =
.1832, so that the expected value of this residual under normality is:

$$\sqrt{2{,}384}\,[z(.1832)] = \sqrt{2{,}384}\,(-.9032) = -44.10$$

Table 3.2, column 3, contains the expected values under the assumption of normality
for a portion of the 25 residuals. Figure 3.2d presents a plot of the residuals against their
expected values under normality. Note that the points in Figure 3.2d fall reasonably close to
a straight line, suggesting that the distribution of the error terms does not depart substantially
from a normal distribution.
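
Approximation (3.6) is easily programmed. The following Python sketch is a minimal illustration using only the standard library; the function name `expected_under_normality` is our own, and the rows are those shown in Table 3.2. Because the standard normal percentile is computed here to full precision rather than rounded as in the hand calculation, the results can differ from the tabled values by a few hundredths.

```python
from math import sqrt
from statistics import NormalDist


def expected_under_normality(k, n, mse):
    """Approximation (3.6): sqrt(MSE) * z((k - .375) / (n + .25))."""
    return sqrt(mse) * NormalDist().inv_cdf((k - 0.375) / (n + 0.25))


n, mse = 25, 2384  # Toluca Company example

# (run, residual, rank) for the rows shown in Table 3.2.
rows = [(1, 51.02, 22), (2, -48.47, 5), (3, -19.88, 10),
        (23, 38.83, 19), (24, -5.98, 13), (25, 10.72, 17)]

for run, residual, rank in rows:
    expected = expected_under_normality(rank, n, mse)
    print(f"run {run:>2}: residual {residual:>7.2f}, rank {rank:>2}, "
          f"expected {expected:>7.2f}")
```

Plotting each residual against the value computed in this way gives the normal probability plot.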

Figure 3.9 shows three normal probability plots when the distribution of the error terms
departs substantially from normality. Figure 3.9a shows a normal probability plot when
the error term distribution is highly skewed to the right. Note the concave-upward shape
of the plot. Figure 3.9b shows a normal probability plot when the error term distribution
is highly skewed to the left. Here, the pattern is concave downward. Finally, Figure 3.9c
shows a normal probability plot when the distribution of the error terms is symmetrical but
has heavy tails; in other words, the distribution has higher probabilities in the tails than
a normal distribution. Note the concave-downward curvature in the plot at the left end,
corresponding to the plot for a left-skewed distribution, and the concave-upward plot at the
right end, corresponding to a right-skewed distribution.


FIGURE 3.9 Normal Probability Plots when Error Term Distribution Is Not Normal.
(a) Skewed Right (b) Skewed Left (c) Symmetrical with Heavy Tails
Each panel plots Residual against Expected value.


Comments 


1. Many computer packages will prepare normal probability plots, either automatically or at the
option of the user. Some of these plots utilize semistudentized residuals, others omit the factor $\sqrt{MSE}$
in (3.6), but neither of these variations affects the nature of the plot.

2. For continuous data, ties among the residuals should occur only rarely. If two residuals do have
the same value, a simple procedure is to use the average rank for the tied residuals for calculating the
corresponding expected values.


Difficulties in Assessing Normality. The analysis for model departures with respect to
normality is, in many respects, more difficult than that for other types of departures. In the
first place, random variation can be particularly mischievous when studying the nature of
a probability distribution unless the sample size is quite large. Even worse, other types of
departures can and do affect the distribution of the residuals. For instance, residuals may
appear to be not normally distributed because an inappropriate regression function is used or
because the error variance is not constant. Hence, it is usually a good strategy to investigate
these other types of departures first, before concerning oneself with the normality of the
error terms.


Omission of Important Predictor Variables 


Residuals should also be plotted against variables omitted from the model that might have
important effects on the response. The time variable cited earlier in the welding example is
an illustration. The purpose of this additional analysis is to determine whether there are
any other key variables that could provide important additional descriptive and predictive
power to the model.


FIGURE 3.10 Residual Plots for Possible Omission of Important Predictor Variable: Productivity Example.
(a) Both Machines (b) Company A Machines (c) Company B Machines
[Each panel plots Residual against Age of Worker X.]

As another example, in a study to predict output by piece-rate workers in an assembling 
operation, the relation between output (Y) and age (X) of worker was studied for a sample 
of employees. The plot of the residuals against X, shown in Figure 3.10a, indicates no 
ground for suspecting the appropriateness of the linearity of the regression function or the 
constancy of the error variance. Since machines produced by two companies (A and B) are 
used in the assembling operation and could have an effect on output, residual plots against 
X by type of machine were undertaken and are shown in Figures 3.10b and 3.10c. Note 
that the residuals for Company A machines tend to be positive, while those for Company B 
machines tend to be negative. Thus, type of machine appears to have a definite effect on 
productivity, and output predictions may turn out to be far superior when this variable is 
added to the model. 




While this second example dealt with a qualitative variable (type of machine), the residual
analysis for an additional quantitative variable is analogous. The residuals are plotted
against the additional predictor variable to see whether or not the residuals tend to vary 
systematically with the level of the additional predictor variable. 


Comment 

We do not say that the original model is "wrong" when it can be improved materially by adding one or
more predictor variables. Only a few of the factors operating on any response variable Y in real-world
situations can be included explicitly in a regression model. The chief purpose of residual analysis in
identifying other important predictor variables is therefore to test the adequacy of the model and see
whether it could be improved materially by adding one or more predictor variables.


Some Final Comments


1. We discussed model departures one at a time. In actuality, several types of departures 
may occur together. For instance, a linear regression function may be a poor fit and the 
variance of the error terms may not be constant. In these cases, the prototype patterns of 
Figure 3.4 can still be useful, but they would need to be combined into composite patterns. 

2. Although graphic analysis of residuals is only an informal method of analysis, in 
many cases it suffices for examining the aptness of a model. 

3. The basic approach to residual analysis explained here applies not only to simple 
linear regression but also to more complex regression and other types of statistical models. 

4. Several types of departures from the simple linear regression model have been identified
by diagnostic tests of the residuals. Model misspecification due to either nonlinearity or
the omission of important predictor variables tends to be serious, leading to biased estimates 
of the regression parameters and error variance. These problems are discussed further in 
Section 3.9 and Chapter 10. Nonconstancy of error variance tends to be less serious, leading 
to less efficient estimates and invalid error variance estimates. The problem is discussed in 
depth in Section 11.1. The presence of outliers can be serious for smaller data sets when 
their influence is large. Influential outliers are discussed further in Section 10.4. Finally, the 
nonindependence of error terms results in estimators that are unbiased but whose variances 
are seriously biased. Alternative estimation methods for correlated errors are discussed in 
Chapter 12. 


3.4 Overview of Tests Involving Residuals 


Graphic analysis of residuals is inherently subjective. Nevertheless, subjective analysis of a 
variety of interrelated residual plots will frequently reveal difficulties with the model more 
clearly than particular formal tests. There are occasions, however, when one wishes to put 
specific questions to a test. We now briefly review some of the relevant tests. 

Most statistical tests require independent observations. As we have seen, however, the 
residuals are dependent. Fortunately, the dependencies become quite small for large samples, 
so that one can usually then ignore them. 


Tests for Randomness 


A runs test is frequently used to test for lack of randomness in the residuals arranged in time 
order. Another test, specifically designed for lack of randomness in least squares residuals,
is the Durbin-Watson test. This test is discussed in Chapter 12. 




Tests for Constancy of Variance 


When a residual plot gives the impression that the variance may be increasing or decreasing 
in a systematic manner related to X or E{Y}, a simple test is based on the rank correlation 
between the absolute values of the residuals and the corresponding values of the predictor 
variable. Two other simple tests for constancy of the error variance—the Brown-Forsythe 
test and the Breusch-Pagan test—are discussed in Section 3.6. 


Tests for Outliers 


A simple test for identifying an outlier observation involves fitting a new regression line to 
the other n — 1 observations. The suspect observation, which was not used in fitting the new 
line, can now be regarded as a new observation. One can calculate the probability that in n
observations, a deviation from the fitted line as great as that of the outlier will be obtained
by chance. If this probability is sufficiently small, the outlier can be rejected as not having
come from the same population as the other n - 1 observations. Otherwise, the outlier is
retained. We discuss this approach in detail in Chapter 10. 

Many other tests to aid in evaluating outliers have been developed. These are discussed 
in specialized references, such as Reference 3.1. 


Tests for Normality 


Goodness of fit tests can be used for examining the normality of the error terms. For instance, 
the chi-square test or the Kolmogorov-Smirnov test and its modification, the Lilliefors test, 
can be employed for testing the normality of the error terms by analyzing the residuals. 
A simple test based on the normal probability plot of the residuals will be taken up in 
Section 3.5. 


Comment 
The runs test, rank correlation, and goodness of fit tests are commonly used statistical procedures and 
are discussed in many basic statistics texts.


3.5 Correlation Test for Normality


In addition to visually assessing the approximate linearity of the points plotted in a normal
probability plot, a formal test for normality of the error terms can be conducted by
calculating the coefficient of correlation (2.74) between the residuals $e_i$ and their expected
values under normality. A high value of the correlation coefficient is indicative of normality.
Table B.6, prepared by Looney and Gulledge (Ref. 3.2), contains critical values (percentiles)
for various sample sizes for the distribution of the coefficient of correlation between the
ordered residuals and their expected values under normality when the error terms are normally
distributed. If the observed coefficient of correlation is at least as large as the tabled
value, for a given α level, one can conclude that the error terms are reasonably normally
distributed.


Example


For the Toluca Company example in Table 3.2, the coefficient of correlation between the
ordered residuals and their expected values under normality is .991. Controlling the α risk
at .05, we find from Table B.6 that the critical value for n = 25 is .959. Since the observed
coefficient exceeds this level, we have support for our earlier conclusion that the distribution
of the error terms does not depart substantially from a normal distribution.
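
The correlation test can be sketched in a few lines of code. This is an illustrative outline only, assuming Python's standard library; the function names and the residual values are hypothetical (the residuals are not the Toluca data). The factor $\sqrt{MSE}$ is left out of the expected values because rescaling does not change a correlation coefficient, and tied residuals are not given average ranks here.

```python
from math import sqrt
from statistics import NormalDist


def normal_scores(n):
    """z[(k - .375)/(n + .25)] for k = 1, ..., n (formula (3.6) without sqrt(MSE))."""
    nd = NormalDist()
    return [nd.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


def pearson_r(x, y):
    """Ordinary coefficient of correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def normality_correlation(residuals):
    """Correlation between the ordered residuals and their expected values."""
    ordered = sorted(residuals)
    return pearson_r(ordered, normal_scores(len(ordered)))


# Hypothetical residuals, made up purely for illustration.
residuals = [-4.1, 2.3, 0.4, -1.8, 5.6, -0.7, 1.1, -2.9, 3.2]
r = normality_correlation(residuals)
print(round(r, 3))  # compare with the Table B.6 critical value for this n and alpha
```

The observed coefficient would then be compared with the critical value read from Table B.6 for the relevant sample size and α level; the code does not attempt to reproduce that table.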




Comment 


The correlation test for normality presented here is simpler than the Shapiro-Wilk test (Ref. 3.3),
which can be viewed as being based approximately also on the coefficient of correlation between the
ordered residuals and their expected values under normality.


3.6 Tests for Constancy of Error Variance


We present two formal tests for ascertaining whether the error terms have constant variance: 
the Brown-Forsythe test and the Breusch-Pagan test. 


Brown-Forsythe Test
The Brown-Forsythe test, a modification of the Levene test (Ref. 3.4), does not depend 
on normality of the error terms. Indeed, this test is robust against serious departures from 
normality, in the sense that the nominal significance level remains approximately correct 
when the error terms have equal variances even if the distribution of the error terms is 
far from normal. Yet the test is still relatively efficient when the error terms are normally 
distributed. The Brown-Forsythe test as described is applicable to simple linear regression 
when the variance of the error terms either increases or decreases with X, as illustrated in 
the prototype megaphone plot in Figure 3.4c. The sample size needs to be large enough so 
that the dependencies among the residuals can be ignored. 

The test is based on the variability of the residuals. The larger the error variance, the 
larger the variability of the residuals will tend to be. To conduct the Brown-Forsythe test, we 
divide the data set into two groups, according to the level of X, so that one group consists 
of cases where the X level is comparatively low and the other group consists of cases where 
the X level is comparatively high. If the error variance is either increasing or decreasing 
with X, the residuals in one group will tend to be more variable than those in the other 
group. Equivalently, the absolute deviations of the residuals around their group mean will 
tend to be larger for one group than for the other group. In order to make the test more 
robust, we utilize the absolute deviations of the residuals around the median for the group 
(Ref. 3.5). The Brown-Forsythe test then consists simply of the two-sample t test based on
test statistic (A.67) to determine whether the mean of the absolute deviations for one group
differs significantly from the mean absolute deviation for the second group.

Although the distribution of the absolute deviations of the residuals is usually not normal,
it has been shown that the t* test statistic still follows approximately the t distribution when
the variance of the error terms is constant and the sample sizes of the two groups are not
extremely small.

We shall now use $e_{i1}$ to denote the ith residual for group 1 and $e_{i2}$ to denote the ith
residual for group 2. Also we shall use $n_1$ and $n_2$ to denote the sample sizes of the two
groups, where:

$$n = n_1 + n_2 \tag{3.7}$$

Further, we shall use $\tilde{e}_1$ and $\tilde{e}_2$ to denote the medians of the residuals in the two groups.
The Brown-Forsythe test uses the absolute deviations of the residuals around their group
median, to be denoted by $d_{i1}$ and $d_{i2}$:

$$d_{i1} = |e_{i1} - \tilde{e}_1| \qquad d_{i2} = |e_{i2} - \tilde{e}_2| \tag{3.8}$$


With this notation, the two-sample t test statistic (A.67) becomes:

$$t^*_{BF} = \frac{\bar{d}_1 - \bar{d}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \tag{3.9}$$

where $\bar{d}_1$ and $\bar{d}_2$ are the sample means of the $d_{i1}$ and $d_{i2}$, respectively, and the pooled variance
$s^2$ in (A.63) becomes:

$$s^2 = \frac{\sum (d_{i1} - \bar{d}_1)^2 + \sum (d_{i2} - \bar{d}_2)^2}{n - 2} \tag{3.9a}$$

We denote the test statistic for the Brown-Forsythe test by $t^*_{BF}$.

If the error terms have constant variance and $n_1$ and $n_2$ are not extremely small, $t^*_{BF}$
follows approximately the t distribution with n - 2 degrees of freedom. Large absolute
values of $t^*_{BF}$ indicate that the error terms do not have constant variance.


Example


We wish to use the Brown-Forsythe test for the Toluca Company example to determine
whether or not the error term variance varies with the level of X. Since the X levels are
spread fairly uniformly (see Figure 3.1a), we divide the 25 cases into two groups with
approximately equal X ranges. The first group consists of the 13 runs with lot sizes from
20 to 70. The second group consists of the 12 runs with lot sizes from 80 to 120. Table 3.3
presents a portion of the data for each group. In columns 1 and 2 are repeated the lot sizes
and residuals from Table 1.2. We see from Table 3.3 that the median residual is $\tilde{e}_1 = -19.88$
for group 1 and $\tilde{e}_2 = -2.68$ for group 2. Column 3 contains the absolute deviations of the
residuals around their respective group medians. For instance, we obtain:

$$d_{11} = |e_{11} - \tilde{e}_1| = |-20.77 - (-19.88)| = .89$$
$$d_{12} = |e_{12} - \tilde{e}_2| = |51.02 - (-2.68)| = 53.70$$

The means of the absolute deviations are obtained in the usual fashion:

$$\bar{d}_1 = \frac{582.60}{13} = 44.815 \qquad \bar{d}_2 = \frac{341.40}{12} = 28.450$$

Finally, column 4 contains the squares of the deviations of the $d_{i1}$ and $d_{i2}$ around their
respective group means. For instance, we have:

$$(d_{11} - \bar{d}_1)^2 = (.89 - 44.815)^2 = 1,929.41$$
$$(d_{12} - \bar{d}_2)^2 = (53.70 - 28.450)^2 = 637.56$$


TABLE 3.3 Calculations for Brown-Forsythe Test for Constancy of Error Variance: Toluca Company Example.

Group 1
                     (1)        (2)         (3)          (4)
                     Lot      Residual
   i       Run      Size      $e_{i1}$    $d_{i1}$   $(d_{i1} - \bar{d}_1)^2$
   1        14       20       -20.77         .89       1,929.41
   2         2       30       -48.47       28.59         263.25
  ...      ...      ...         ...         ...            ...
  12        12       70       -60.28       40.40          19.49
  13        25       70        10.72       30.60         202.07
  Total                                   582.60       12,566.6
  $\tilde{e}_1 = -19.88$     $\bar{d}_1 = 44.815$

Group 2
                     (1)        (2)         (3)          (4)
                     Lot      Residual
   i       Run      Size      $e_{i2}$    $d_{i2}$   $(d_{i2} - \bar{d}_2)^2$
   1         1       80        51.02       53.70         637.56
   2         8       80         4.02        6.70         473.06
  ...      ...      ...         ...         ...            ...
  11        20      110       -34.09       31.41           8.76
  12         7      120        55.21       57.89         866.71
  Total                                   341.40        9,610.2
  $\tilde{e}_2 = -2.68$      $\bar{d}_2 = 28.450$

We are now ready to calculate test statistic (3.9):

$$s^2 = \frac{12,566.6 + 9,610.2}{25 - 2} = 964.21$$

$$s = 31.05$$

$$t^*_{BF} = \frac{44.815 - 28.450}{31.05\sqrt{\dfrac{1}{13} + \dfrac{1}{12}}} = 1.32$$

To control the α risk at .05, we require t(.975; 23) = 2.069. The decision rule therefore is:

If $|t^*_{BF}| \le 2.069$, conclude the error variance is constant

If $|t^*_{BF}| > 2.069$, conclude the error variance is not constant

Since $|t^*_{BF}| = 1.32 \le 2.069$, we conclude that the error variance is constant and does not
vary with the level of X. The two-sided P-value of this test is .20.
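
The steps of the Brown-Forsythe calculation can be sketched as follows. This is a minimal illustration assuming Python's standard library; the function name `brown_forsythe_t` and the residual values are hypothetical and are not taken from the text. The caller is assumed to have already split the residuals into a low-X group and a high-X group.

```python
from math import sqrt
from statistics import mean, median


def brown_forsythe_t(resid_low, resid_high):
    """Two-sample t statistic (3.9) on absolute deviations from group medians."""
    med1, med2 = median(resid_low), median(resid_high)
    d1 = [abs(e - med1) for e in resid_low]    # d_i1 in (3.8)
    d2 = [abs(e - med2) for e in resid_high]   # d_i2 in (3.8)
    n1, n2 = len(d1), len(d2)
    d1_bar, d2_bar = mean(d1), mean(d2)
    # Pooled variance (3.9a) with n - 2 = n1 + n2 - 2 degrees of freedom.
    ss = sum((d - d1_bar) ** 2 for d in d1) + sum((d - d2_bar) ** 2 for d in d2)
    s = sqrt(ss / (n1 + n2 - 2))
    return (d1_bar - d2_bar) / (s * sqrt(1 / n1 + 1 / n2))


# Hypothetical residuals for the low-X and high-X groups (illustration only).
low_x = [-1.2, 0.4, 0.9, -0.3, 0.1]
high_x = [-3.5, 2.8, 0.6, -2.1, 4.0]
t_bf = brown_forsythe_t(low_x, high_x)
print(round(t_bf, 3))  # compare |t_bf| with t(1 - alpha/2; n - 2)
```

The critical value still has to come from a t table or a statistics library; the sketch only produces the test statistic.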


Comments 

1. If the data set contains many cases, the two-sample t test for constancy of error variance can 
be conducted after dividing the cases into three or four groups, according to the level of X, and using 
the two extreme groups. 

2. A robust test for constancy of the error variance is desirable because nonnormality and lack of
constant variance often go hand in hand. For example, the distribution of the error terms may become
increasingly skewed and hence more variable with increasing levels of X.


Breusch-Pagan Test 
A second test for the constancy of the error variance is the Breusch-Pagan test (Ref. 3.6).
This test, a large-sample test, assumes that the error terms are independent and normally
distributed and that the variance of the error term $\varepsilon_i$, denoted by $\sigma_i^2$, is related to the level
of X in the following way:

$$\log_e \sigma_i^2 = \gamma_0 + \gamma_1 X_i \tag{3.10}$$

Note that (3.10) implies that $\sigma_i^2$ either increases or decreases with the level of X, depending
on the sign of $\gamma_1$. Constancy of error variance corresponds to $\gamma_1 = 0$. The test of $H_0$: $\gamma_1 = 0$
versus $H_a$: $\gamma_1 \neq 0$ is carried out by means of regressing the squared residuals $e_i^2$ against $X_i$
in the usual manner and obtaining the regression sum of squares, to be denoted by SSR*.
The test statistic $X^2_{BP}$ is as follows:

$$X^2_{BP} = \frac{SSR^*}{2} \div \left(\frac{SSE}{n}\right)^2 \tag{3.11}$$

where SSR* is the regression sum of squares when regressing $e^2$ on X and SSE is the error
sum of squares when regressing Y on X. If $H_0$: $\gamma_1 = 0$ holds and n is reasonably large,
$X^2_{BP}$ follows approximately the chi-square distribution with one degree of freedom. Large
values of $X^2_{BP}$ lead to conclusion $H_a$, that the error variance is not constant.


Example


To conduct the Breusch-Pagan test for the Toluca Company example, we regress the squared
residuals in Table 1.2, column 5, against X and obtain SSR* = 7,896,128. We know from
Figure 2.2 that SSE = 54,825. Hence, test statistic (3.11) is:

$$X^2_{BP} = \frac{7,896,128}{2} \div \left(\frac{54,825}{25}\right)^2 = .821$$

To control the α risk at .05, we require $\chi^2(.95; 1) = 3.84$. Since $X^2_{BP} = .821 \le 3.84$, we
conclude $H_0$, that the error variance is constant. The P-value of this test, the upper-tail
probability $P\{\chi^2(1) > .821\}$, is .36, so that the data are quite consistent with constancy of
the error variance.

Comments 

1. The Breusch-Pagan test can be modified to allow for different relationships between the error 
variance and the level of X than the one in (3.10). 

2. Test statistic (3.11) was developed independently by Cook and Weisberg (Ref. 3.7), and the test is
sometimes referred to as the Cook-Weisberg test.


3.7 F Test for Lack of Fit


We next take up a formal test for determining whether a specific type of regression function 
adequately fits the data. We illustrate this test for ascertaining whether a linear regression 
function is a good fit for the data.


Assumptions 


The lack of fit test assumes that the observations Y for given X are (1) independent and
(2) normally distributed, and that (3) the distributions of Y have the same variance $\sigma^2$.
The lack of fit test requires repeat observations at one or more X levels. In nonexperimental
data, these may occur fortuitously, as when in a productivity study relating workers'
output and age, several workers of the same age happen to be included in the study. In an
experiment, one can assure by design that there are repeat observations. For instance, in an
experiment on the effect of size of salesperson bonus on sales, three salespersons can be
offered a particular size of bonus, for each of six bonus sizes, and their sales then observed.

Repeat trials for the same level of the predictor variable, of the type described, are called
replications. The resulting observations are called replicates.


Example


In an experiment involving 12 similar but scattered suburban branch offices of a commercial
bank, holders of checking accounts at the offices were offered gifts for setting up money
market accounts. Minimum initial deposits in the new money market account were specified
to qualify for the gift. The value of the gift was directly proportional to the specified
minimum deposit. Various levels of minimum deposit and related gift values were used in
the experiment in order to ascertain the relation between the specified minimum deposit
and gift value, on the one hand, and number of accounts opened at the office, on the other.
Altogether, six levels of minimum deposit and proportional gift value were used, with two
of the branch offices assigned at random to each level. One branch office had a fire during
the period and was dropped from the study. Table 3.4a contains the results, where X is the
amount of minimum deposit and Y is the number of new money market accounts that were
opened and qualified for the gift during the test period.

A linear regression function was fitted in the usual fashion; it is:

$$\hat{Y} = 50.72251 + .48670X$$

The analysis of variance table also was obtained and is shown in Table 3.4b. A scatter plot,
together with the fitted regression line, is shown in Figure 3.11. The indications are strong
that a linear regression function is inappropriate. To test this formally, we shall use the
general linear test approach described in Section 2.8.


TABLE 3.4 Data and Analysis of Variance Table: Bank Example.

(a) Data

             Size of                              Size of
             Minimum       Number                 Minimum       Number
             Deposit       of New                 Deposit       of New
  Branch    (dollars)     Accounts     Branch    (dollars)     Accounts
    i         $X_i$         $Y_i$        i         $X_i$         $Y_i$
    1          125           160         7           75            42
    2          100           112         8          175           124
    3          200           124         9          125           150
    4           75            28        10          200           104
    5          150           152        11          100           136
    6          175           156

(b) ANOVA Table

  Source of
  Variation          SS          df        MS
  Regression       5,141.3        1      5,141.3
  Error           14,741.6        9      1,638.0
  Total           19,882.9       10


FIGURE 3.11 Scatter Plot and Fitted Regression Line: Bank Example.
[Scatter plot of Number of New Accounts against Size of Minimum Deposit, with the fitted regression line.]


TABLE 3.5 Data Arranged by Replicate Number and Minimum Deposit: Bank Example.

                              Size of Minimum Deposit (dollars)
               j = 1        j = 2        j = 3        j = 4        j = 5        j = 6
  Replicate  $X_1 = 75$  $X_2 = 100$  $X_3 = 125$  $X_4 = 150$  $X_5 = 175$  $X_6 = 200$
  i = 1          28          112          160          152          156          124
  i = 2          42          136          150                       124          104
  Mean $\bar{Y}_j$   35      124          155          152          140          114


Notation


First, we need to modify our notation to recognize the existence of replications at some levels
of X. Table 3.5 presents the same data as Table 3.4a, but in an arrangement that recognizes
the replicates. We shall denote the different X levels in the study, whether or not replicated
observations are present, as $X_1, \ldots, X_c$. For the bank example, c = 6 since there are six
minimum deposit size levels in the study, for five of which there are two observations and
for one there is a single observation. We shall let $X_1 = 75$ (the smallest minimum deposit
level), $X_2 = 100$, ..., $X_6 = 200$. Further, we shall denote the number of replicates for the
jth level of X as $n_j$; for our example, $n_1 = n_2 = n_3 = n_5 = n_6 = 2$ and $n_4 = 1$. Thus, the
total number of observations n is given by:

$$n = \sum_{j=1}^{c} n_j \tag{3.12}$$

We shall denote the observed value of the response variable for the ith replicate for
the jth level of X by $Y_{ij}$, where $i = 1, \ldots, n_j$, $j = 1, \ldots, c$. For the bank example
(Table 3.5), $Y_{11} = 28$, $Y_{21} = 42$, $Y_{12} = 112$, and so on. Finally, we shall denote the
mean of the Y observations at the level $X = X_j$ by $\bar{Y}_j$. Thus, $\bar{Y}_1 = (28 + 42)/2 = 35$ and
$\bar{Y}_4 = 152/1 = 152$.


Full Model


The general linear test approach begins with the specification of the full model. The full
model used for the lack of fit test makes the same assumptions as the simple linear regression
model (2.1) except for assuming a linear regression relation, the subject of the test. This
full model is:

$$Y_{ij} = \mu_j + \varepsilon_{ij} \qquad \text{Full model} \tag{3.13}$$


where:

$\mu_j$ are parameters, $j = 1, \ldots, c$
$\varepsilon_{ij}$ are independent $N(0, \sigma^2)$

Since the error terms have expectation zero, it follows that:

$$E\{Y_{ij}\} = \mu_j \tag{3.14}$$

Thus, the parameter $\mu_j$ ($j = 1, \ldots, c$) is the mean response when $X = X_j$.

The full model (3.13) is like the regression model (2.1) in stating that each response
Y is made up of two components: the mean response when $X = X_j$ and a random error
term. The difference between the two models is that in the full model (3.13) there are no
restrictions on the means $\mu_j$, whereas in the regression model (2.1) the mean responses are
linearly related to X (i.e., $E\{Y\} = \beta_0 + \beta_1 X$).

To fit the full model to the data, we require the least squares or maximum likelihood
estimators for the parameters $\mu_j$. It can be shown that these estimators of $\mu_j$ are simply the
sample means $\bar{Y}_j$:

$$\hat{\mu}_j = \bar{Y}_j \tag{3.15}$$

Thus, the estimated expected value for observation $Y_{ij}$ is $\bar{Y}_j$, and the error sum of squares
for the full model therefore is:

$$SSE(F) = \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2 = SSPE \tag{3.16}$$
j i 


In the context of the test for lack of fit, the full model error sum of squares (3.16) is called 
the pure error sum of squares and is denoted by SSPE. 

Note that SSPE is made up of the sums of squared deviations at each X level. At level
$X = X_j$, this sum of squared deviations is:

$$\sum_i (Y_{ij} - \bar{Y}_j)^2 \tag{3.17}$$

These sums of squares are then added over all of the X levels ($j = 1, \ldots, c$). For the bank
example, we have:

$$\begin{aligned}
SSPE &= (28 - 35)^2 + (42 - 35)^2 + (112 - 124)^2 + (136 - 124)^2 + (160 - 155)^2 \\
&\quad + (150 - 155)^2 + (152 - 152)^2 + (156 - 140)^2 + (124 - 140)^2 \\
&\quad + (124 - 114)^2 + (104 - 114)^2 \\
&= 1,148
\end{aligned}$$

Note that any X level with no replications makes no contribution to SSPE because $\bar{Y}_j = Y_{1j}$
then. Thus, $(152 - 152)^2 = 0$ for j = 4 in the bank example.

The degrees of freedom associated with SSPE can be obtained by recognizing that the
sum of squared deviations (3.17) at a given level of X is like an ordinary total sum of squares
based on n observations, which has n - 1 degrees of freedom associated with it. Here, there
are $n_j$ observations when $X = X_j$; hence the degrees of freedom are $n_j - 1$. Just as SSPE
is the sum of the sums of squares (3.17), so the number of degrees of freedom associated
with SSPE is the sum of the component degrees of freedom:

$$df_F = \sum_j (n_j - 1) = \sum_j n_j - c = n - c \tag{3.18}$$

For the bank example, we have $df_F = 11 - 6 = 5$. Note that any X level with no replications
makes no contribution to $df_F$ because $n_j - 1 = 1 - 1 = 0$ then, just as such an X level
makes no contribution to SSPE.


Reduced Model 
The general linear test approach next requires consideration of the reduced model under
$H_0$. For testing the appropriateness of a linear regression relation, the alternatives are:

$$\begin{aligned}
H_0&: E\{Y\} = \beta_0 + \beta_1 X \\
H_a&: E\{Y\} \neq \beta_0 + \beta_1 X
\end{aligned} \tag{3.19}$$

Thus, $H_0$ postulates that $\mu_j$ in the full model (3.13) is linearly related to $X_j$:

$$\mu_j = \beta_0 + \beta_1 X_j$$

The reduced model under $H_0$ therefore is:

$$Y_{ij} = \beta_0 + \beta_1 X_j + \varepsilon_{ij} \qquad \text{Reduced model} \tag{3.20}$$

Note that the reduced model is the ordinary simple linear regression model (2.1), with the
subscripts modified to recognize the existence of replications. We know that the estimated
expected value for observation $Y_{ij}$ with regression model (2.1) is the fitted value $\hat{Y}_{ij}$:

$$\hat{Y}_{ij} = b_0 + b_1 X_j \tag{3.21}$$

Hence, the error sum of squares for the reduced model is the usual error sum of squares SSE:

$$SSE(R) = \sum\sum [Y_{ij} - (b_0 + b_1 X_j)]^2 = \sum\sum (Y_{ij} - \hat{Y}_{ij})^2 = SSE \tag{3.22}$$

We also know that the degrees of freedom associated with SSE(R) are:

$$df_R = n - 2$$

For the bank example, we have from Table 3.4b:

$$SSE(R) = SSE = 14,741.6 \qquad df_R = 9$$
Test Statistic

The general linear test statistic (2.70):

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$$

here becomes:

$$F^* = \frac{SSE - SSPE}{(n - 2) - (n - c)} \div \frac{SSPE}{n - c} \tag{3.23}$$

The difference between the two error sums of squares is called the lack of fit sum of squares
here and is denoted by SSLF:

$$SSLF = SSE - SSPE \tag{3.24}$$

We can then express the test statistic as follows:

$$F^* = \frac{SSLF}{c - 2} \div \frac{SSPE}{n - c} = \frac{MSLF}{MSPE} \tag{3.25}$$

where MSLF denotes the lack of fit mean square and MSPE denotes the pure error mean
square.

We know that large values of F* lead to conclusion $H_a$ in the general linear test. Decision
rule (2.71) here becomes:

$$\begin{aligned}
&\text{If } F^* \le F(1 - \alpha; c - 2, n - c), \text{ conclude } H_0 \\
&\text{If } F^* > F(1 - \alpha; c - 2, n - c), \text{ conclude } H_a
\end{aligned} \tag{3.26}$$

For the bank example, the test statistic can be constructed easily from our earlier results:

$$SSPE = 1,148.0 \qquad n - c = 11 - 6 = 5$$
$$SSE = 14,741.6$$
$$SSLF = 14,741.6 - 1,148.0 = 13,593.6 \qquad c - 2 = 6 - 2 = 4$$

$$F^* = \frac{13,593.6}{4} \div \frac{1,148.0}{5} = \frac{3,398.4}{229.6} = 14.80$$

If the level of significance is to be α = .01, we require F(.99; 4, 5) = 11.4. Since
F* = 14.80 > 11.4, we conclude $H_a$, that the regression function is not linear. This, of
course, accords with our visual impression from Figure 3.11. The P-value for the test is
.006.
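
The whole lack of fit calculation can be reproduced from the raw data in Table 3.4a. The following is a rough sketch assuming Python's standard library; the function names `fit_line` and `lack_of_fit` are introduced here for illustration and are not part of the text. It fits the reduced (linear) model for SSE, groups the observations by X level for SSPE, and forms F* as in (3.25).

```python
from collections import defaultdict


def fit_line(x, y):
    """Least squares estimates (b0, b1) for a simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def lack_of_fit(x, y):
    """Return SSE, SSPE, SSLF, their degrees of freedom, and F* from (3.25)."""
    b0, b1 = fit_line(x, y)
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # reduced model
    groups = defaultdict(list)                                  # replicates per X level
    for a, b in zip(x, y):
        groups[a].append(b)
    sspe = sum(
        sum((v - sum(ys) / len(ys)) ** 2 for v in ys) for ys in groups.values()
    )                                                           # full model (3.16)
    n, c = len(x), len(groups)
    sslf = sse - sspe                                           # (3.24)
    f_star = (sslf / (c - 2)) / (sspe / (n - c))
    return {"SSE": sse, "SSPE": sspe, "SSLF": sslf,
            "df_LF": c - 2, "df_PE": n - c, "F": f_star}


# Bank example data from Table 3.4a.
X = [125, 100, 200, 75, 150, 175, 75, 175, 125, 200, 100]
Y = [160, 112, 124, 28, 152, 156, 42, 124, 150, 104, 136]
result = lack_of_fit(X, Y)
print({k: round(v, 2) for k, v in result.items()})
```

The sketch stops at F*; the critical value F(1 - α; c - 2, n - c) would still be read from a table or obtained from a statistics package. Note that the function requires at least one replicated X level and more than two distinct levels, since otherwise a divisor is zero.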


ANOVA Table 


The definition of the lack of fit sum of squares SSLF in (3.24) indicates that we have, in
fact, decomposed the error sum of squares SSE into two components:

$$SSE = SSPE + SSLF \tag{3.27}$$

This decomposition follows from the identity:

$$\underbrace{Y_{ij} - \hat{Y}_{ij}}_{\text{Error deviation}} = \underbrace{Y_{ij} - \bar{Y}_j}_{\text{Pure error deviation}} + \underbrace{\bar{Y}_j - \hat{Y}_{ij}}_{\text{Lack of fit deviation}} \tag{3.28}$$

This identity shows that the error deviations in SSE are made up of a pure error component
and a lack of fit component. Figure 3.12 illustrates this partitioning for the case $Y_{13} = 160$,
$X_3 = 125$ in the bank example.


FIGURE 3.12 Illustration of Decomposition of Error Deviation $Y_{ij} - \hat{Y}_{ij}$: Bank Example.
[Plot of Number of New Accounts (Y) against Size of Minimum Deposit in dollars (X), with the fitted
line $\hat{Y} = 50.72251 + .48670X$. For $Y_{13} = 160$ and $\bar{Y}_3 = 155$, the pure error deviation is
$Y_{13} - \bar{Y}_3 = 5$, the lack of fit deviation is $\bar{Y}_3 - \hat{Y}_{13} = 43$, and the error deviation is
$Y_{13} - \hat{Y}_{13} = 48$.]


When (3.28) is squared and summed over all observations, we obtain (3.27) since the
cross-product sum equals zero:

$$\sum\sum (Y_{ij} - \hat{Y}_{ij})^2 = \sum\sum (Y_{ij} - \bar{Y}_j)^2 + \sum\sum (\bar{Y}_j - \hat{Y}_{ij})^2 \tag{3.29}$$

$$SSE = SSPE + SSLF$$

Note from (3.29) that we can define the lack of fit sum of squares directly as follows:

$$SSLF = \sum\sum (\bar{Y}_j - \hat{Y}_{ij})^2 \tag{3.30}$$

Since all $Y_{ij}$ observations at the level $X_j$ have the same fitted value, which we can denote
by $\hat{Y}_j$, we can express (3.30) equivalently as:

$$SSLF = \sum_j n_j (\bar{Y}_j - \hat{Y}_j)^2 \tag{3.30a}$$

Formula (3.30a) indicates clearly why SSLF measures lack of fit. If the linear regression
function is appropriate, then the means $\bar{Y}_j$ will be near the fitted values $\hat{Y}_j$ calculated from
the estimated linear regression function and SSLF will be small. On the other hand, if the
linear regression function is not appropriate, the means $\bar{Y}_j$ will not be near the fitted values
calculated from the estimated linear regression function, as in Figure 3.11 for the bank
example, and SSLF will be large.

Formula (3.30a) also indicates why c - 2 degrees of freedom are associated with SSLF.
There are c means $\bar{Y}_j$ in the sum of squares, and two degrees of freedom are lost in estimating
the parameters $\beta_0$ and $\beta_1$ of the linear regression function to obtain the fitted values $\hat{Y}_j$.
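
As a numerical check on (3.30a), SSLF for the bank example can be computed directly from the group means in Table 3.5 and the fitted line, without going through SSE - SSPE. The sketch below assumes Python; the function name `sslf_direct` is hypothetical, and the coefficients are the rounded values reported in the text, so the result agrees with 13,593.6 only up to rounding.

```python
# Group summaries from Table 3.5: X_j -> (n_j, mean of Y at X_j).
groups = {
    75: (2, 35.0),
    100: (2, 124.0),
    125: (2, 155.0),
    150: (1, 152.0),
    175: (2, 140.0),
    200: (2, 114.0),
}

# Rounded coefficients of the fitted line reported in the text.
b0, b1 = 50.72251, 0.48670


def sslf_direct(groups, b0, b1):
    """Lack of fit sum of squares via (3.30a): sum of n_j * (Ybar_j - Yhat_j)^2."""
    return sum(
        n_j * (ybar_j - (b0 + b1 * x_j)) ** 2
        for x_j, (n_j, ybar_j) in groups.items()
    )


sslf = sslf_direct(groups, b0, b1)
c = len(groups)
mslf = sslf / (c - 2)  # c - 2 degrees of freedom, as argued above
print(round(sslf, 1), round(mslf, 1))
```

Each term in the sum is weighted by $n_j$ because all $n_j$ observations at level $X_j$ share the same mean and the same fitted value.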

An ANOVA table can be constructed for the decomposition of SSE. Table 3.6a contains 
the general ANOVA table, including the decomposition of SSE just explained and the 
mean squares of interest, and Table 3.6b contains the ANOVA decomposition for the bank 
example. 


TABLE 3.6 General ANOVA Table for Testing Lack of Fit of Simple Linear Regression Function
and ANOVA Table: Bank Example.

(a) General

Source of
Variation      SS                                                df        MS
Regression     $SSR = \sum\sum(\hat{Y}_{ij} - \bar{Y})^2$        $1$       $MSR = SSR/1$
Error          $SSE = \sum\sum(Y_{ij} - \hat{Y}_{ij})^2$         $n - 2$   $MSE = SSE/(n - 2)$
  Lack of fit  $SSLF = \sum\sum(\bar{Y}_j - \hat{Y}_{ij})^2$     $c - 2$   $MSLF = SSLF/(c - 2)$
  Pure error   $SSPE = \sum\sum(Y_{ij} - \bar{Y}_j)^2$           $n - c$   $MSPE = SSPE/(n - c)$
Total          $SSTO = \sum\sum(Y_{ij} - \bar{Y})^2$             $n - 1$

(b) Bank Example

Source of
Variation      SS          df    MS
Regression      5,141.3     1    5,141.3
Error          14,741.6     9    1,638.0
  Lack of fit  13,593.6     4    3,398.4
  Pure error    1,148.0     5      229.6
Total          19,882.9    10
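The arithmetic behind Table 3.6b can be reproduced in a few lines. The sketch below is an illustration only: it takes the sums of squares and degrees of freedom from the table as given, computes each mean square as SS/df, and forms the lack of fit test statistic as the ratio MSLF/MSPE. The dictionary layout and names are assumptions of the sketch.

```python
# Sums of squares and degrees of freedom copied from Table 3.6b.
anova = {
    "Regression": (5141.3, 1),
    "Error": (14741.6, 9),
    "Lack of fit": (13593.6, 4),
    "Pure error": (1148.0, 5),
}

# Each mean square is the sum of squares divided by its degrees of freedom.
mean_squares = {source: ss / df for source, (ss, df) in anova.items()}

# Lack of fit test statistic: ratio of the two components of the error line.
f_star = mean_squares["Lack of fit"] / mean_squares["Pure error"]

# The two components should add back to the error line, for SS and for df.
ss_check = anova["Lack of fit"][0] + anova["Pure error"][0]
df_check = anova["Lack of fit"][1] + anova["Pure error"][1]

for source, (ss, df) in anova.items():
    print(f"{source:<12} SS = {ss:>9.1f}  df = {df:>2}  MS = {mean_squares[source]:>8.1f}")
print(f"F* = MSLF / MSPE = {f_star:.2f}")
```

The additivity checks (`ss_check`, `df_check`) mirror the decomposition SSE = SSPE + SSLF and the corresponding split of the n - 2 error degrees of freedom into c - 2 and n - c.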
Comments 


1. As shown by the bank example, not all levels of X need have repeat observations for the F test 
for lack of fit to be applicable. Repeat observations at only one or some levels of X are sufficient. 

2. It can be shown that the mean squares MSPE and MSLF have the following expectations when
testing whether the regression function is linear:

$$E\{MSPE\} = \sigma^2 \tag{3.31}$$

$$E\{MSLF\} = \sigma^2 + \frac{\sum n_j[\mu_j - (\beta_0 + \beta_1 X_j)]^2}{c - 2} \tag{3.32}$$

The reason for the term "pure error" is that MSPE is always an unbiased estimator of the error term
variance $\sigma^2$, no matter what is the true regression function. The expected value of MSLF also is $\sigma^2$ if
the regression function is linear, because then $\mu_j = \beta_0 + \beta_1 X_j$ and the second term in (3.32) becomes
zero. On the other hand, if the regression function is not linear, $\mu_j \neq \beta_0 + \beta_1 X_j$ and $E\{MSLF\}$ will
be greater than $\sigma^2$. Hence, a value of $F^*$ near 1 accords with a linear regression function; large values
of $F^*$ indicate that the regression function is not linear.


3. The terminology "error sum of squares" and "error mean square" is not precise when the
regression function under test in $H_0$ is not the true function since the error sum of squares and error
mean square then reflect the effects of both the lack of fit and the variability of the error terms. We
continue to use the terminology for consistency and now use the term "pure error" to identify the
variability associated with the error term only.


4. Suppose that prior to any analysis of the appropriateness of the model, we had fitted a linear
regression model and wished to test whether or not $\beta_1 = 0$ for the bank example (Table 3.4b). Test
statistic (2.60) would be:

$$F^* = \frac{MSR}{MSE} = \frac{5{,}141.3}{1{,}638.0} = 3.14$$

For $\alpha = .10$, $F(.90; 1, 9) = 3.36$, and we would conclude $H_0$, that $\beta_1 = 0$ or that there is no linear
association between minimum deposit size (and value of gift) and number of new accounts. A conclusion
that there is no relation between these variables would be improper, however. Such an inference
requires that regression model (2.1) be appropriate. Here, there is a definite relationship, but the
regression function is not linear. This illustrates the importance of always examining the appropriateness
of a model before any inferences are drawn.

5. The general linear test approach just explained can be used to test the appropriateness of other
regression functions. Only the degrees of freedom for SSLF will need to be modified. In general, $c - p$
degrees of freedom are associated with SSLF, where $p$ is the number of parameters in the regression
function. For the test of a simple linear regression function, $p = 2$ because there are two parameters,
$\beta_0$ and $\beta_1$, in the regression function.

6. The alternative $H_a$ in (3.19) includes all regression functions other than a linear one. For
instance, it includes a quadratic regression function or a logarithmic one. If $H_a$ is concluded, a study
of residuals can be helpful in identifying an appropriate function.

7. When we conclude that the employed model in $H_0$ is appropriate, the usual practice is to use
the error mean square MSE as an estimator of $\sigma^2$ in preference to the pure error mean square MSPE,
since the former contains more degrees of freedom.

8. Observations at the same level of X are genuine repeats only if they involve independent trials 
with respect to the error term. Suppose that in a regression analysis of the relation between hardness 
(Y) and amount of carbon (X) in specimens of an alloy, the error term in the model covers, among 
other things, random errors in the measurement of hardness by the analyst and effects of uncontrolled 
production factors, which vary at random from specimen to specimen and affect hardness. If the 
analyst takes two readings on the hardness of a specimen, this will not provide a genuine replication 
because the effects of random variation in the production factors are fixed in any given specimen. 
For genuine replications, different specimens with the same carbon content (X) would have to be 
measured by the analyst so that all the effects covered in the error term could vary at random from 
one repeated observation to the next. 

9. When no replications are present in a data set, an approximate test for lack of fit can be 
conducted if there are some cases at adjacent X levels for which the mean responses are quite close to 
each other. Such adjacent cases are grouped together and treated as pseudoreplicates, and the test for 
lack of fit is then carried out using these groupings of adjacent cases. A useful summary of this and 
related procedures for conducting a test for lack of fit when no replicates are present may be found in 
Reference 3.8.
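The expectations in Comment 2 can be illustrated with a small Monte Carlo experiment. The sketch below is hypothetical throughout: the design (five X levels with two replicates each), the two mean functions, the error standard deviation, the number of replications, and the seed are all assumptions chosen for illustration. It averages MSPE and MSLF over many simulated data sets, once with a linear true regression function and once with a curved one.

```python
import random


def fit_line(x, y):
    """Return least squares (b0, b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def lack_of_fit_mean_squares(x, y):
    """Return (MSLF, MSPE) for replicated data, using (3.30a) for SSLF."""
    b0, b1 = fit_line(x, y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    n, c = len(x), len(groups)
    sspe = 0.0
    sslf = 0.0
    for xj, ys in groups.items():
        mean_j = sum(ys) / len(ys)
        sspe += sum((v - mean_j) ** 2 for v in ys)
        sslf += len(ys) * (mean_j - (b0 + b1 * xj)) ** 2
    return sslf / (c - 2), sspe / (n - c)


def simulate(mean_fn, sigma, reps, seed):
    """Average MSLF and MSPE over `reps` simulated data sets."""
    rng = random.Random(seed)
    x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]  # hypothetical replicated design
    total_lf = 0.0
    total_pe = 0.0
    for _ in range(reps):
        y = [mean_fn(xi) + rng.gauss(0, sigma) for xi in x]
        mslf, mspe = lack_of_fit_mean_squares(x, y)
        total_lf += mslf
        total_pe += mspe
    return total_lf / reps, total_pe / reps


SIGMA = 2.0  # so the error variance sigma^2 is 4
linear_mslf, linear_mspe = simulate(lambda x: 10 + 3 * x, SIGMA, 2000, seed=1)
curved_mslf, curved_mspe = simulate(
    lambda x: 10 + 3 * x + 2 * (x - 3) ** 2, SIGMA, 2000, seed=2
)

print(f"linear truth: mean MSPE = {linear_mspe:.2f}, mean MSLF = {linear_mslf:.2f}")
print(f"curved truth: mean MSPE = {curved_mspe:.2f}, mean MSLF = {curved_mslf:.2f}")
```

Under these assumptions the average MSPE should stay near the error variance of 4 in both cases, while the average MSLF should be near 4 only when the true function is linear and clearly larger when it is curved.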


3.8 Overview of Remedial Measures 


If the simple linear regression model (2.1) is not appropriate for a data set, there are two 
basic choices: 




1. Abandon regression model (2.1) and develop and use a more appropriate model. 
2. Employ some transformation on the data so that regression model (2.1) is appropriate 
for the transformed data. 




Each approach has advantages and disadvantages. The first approach may entail a more 
complex model that could yield better insights, but may also lead to more complex proce- 
dures for estimating the parameters. Successful use of transformations, on the other hand, 
leads to relatively simple methods of estimation and may involve fewer parameters than 
a complex model, an advantage when the sample size is small. Yet transformations may 
obscure the fundamental interconnections between the variables, though at other times they
may illuminate them. 

We consider the use of transformations in this chapter and the use of more complex 
models in later chapters. First, we provide a brief overview of remedial measures. 


Nonlinearity of Regression Function 
When the regression function is not linear, a direct approach is to modify regression
model (2.1) by altering the nature of the regression function. For instance, a quadratic
regression function might be used:

$$E\{Y\} = \beta_0 + \beta_1 X + \beta_2 X^2$$

or an exponential regression function:

$$E\{Y\} = \beta_0 \beta_1^X$$


In Chapter 7, we discuss polynomial regression functions, and in Part III we take up nonlinear 
regression functions, such as an exponential regression function. 

The transformation approach employs a transformation to linearize, at least approxi- 
mately, a nonlinear regression function. We discuss the use of transformations to linearize 
regression functions in Section 3.9. 

When the nature of the regression function is not known, exploratory analysis that does 
not require specifying a particular type of function is often useful. We discuss exploratory 
regression analysis in Section 3.10. 


Nonconstancy of Error Variance 
When the error variance is not constant but varies in a systematic fashion, a direct approach 
is to modify the model to allow for this and use the method of weighted least squares to 
obtain the estimators of the parameters. We discuss the use of weighted least squares for 
this purpose in Chapter 11. 
Transformations can also be effective in stabilizing the variance. Some of these are 
discussed in Section 3.9. 


Nonindependence of Error Terms 
When the error terms are correlated, a direct remedial measure is to work with a model that 
calls for correlated error terms. We discuss such a model in Chapter 12. A simple remedial 
transformation that is often helpful is to work with first differences, a topic also discussed 
in Chapter 12. 


Nonnormality of Error Terms 
Lack of normality and nonconstant error variances frequently go hand in hand. Fortunately,
it is often the case that the same transformation that helps stabilize the variance is also helpful
in approximately normalizing the error terms. It is therefore desirable that the transformation
for stabilizing the error variance be utilized first, and then the residuals studied to see if
serious departures from normality are still present. We discuss transformations to achieve
approximate normality in Section 3.9.


Omission of Important Predictor Variables 


When residual analysis indicates that an important predictor variable has been omitted from 
the model, the solution is to modify the model. In Chapter 6 and later chapters, we discuss 
multiple regression analysis in which two or more predictor variables are utilized. 


Outlying Observations 


When outlying observations are present, as in Figure 3.7a, use of the least squares and
maximum likelihood estimators (1.10) for regression model (2.1) may lead to serious distortions
in the estimated regression function. When the outlying observations do not represent
recording errors and should not be discarded, it may be desirable to use an estimation
procedure that places less emphasis on such outlying observations. We discuss one such
robust estimation procedure in Chapter 11.


3.9 Transformations


We now consider in more detail the use of transformations of one or both of the original 
variables before carrying out the regression analysis. Simple transformations of either the 
response variable Y or the predictor variable X, or of both, are often sufficient to make the 
simple linear regression model appropriate for the transformed data. 


Transformations for Nonlinear Relation Only 


We first consider transformations for linearizing a nonlinear regression relation when the
distribution of the error terms is reasonably close to a normal distribution and the error
terms have approximately constant variance. In this situation, transformations on X should
be attempted. The reason why transformations on Y may not be desirable here is that a
transformation on Y, such as $Y' = \sqrt{Y}$, may materially change the shape of the distribution
of the error terms from the normal distribution and may also lead to substantially differing
error term variances.

Figure 3.13 contains some prototype nonlinear regression relations with constant error
variance and also presents some simple transformations on X that may be helpful to linearize
the regression relationship without affecting the distributions of Y. Several alternative
transformations may be tried. Scatter plots and residual plots based on each transformation
should then be prepared and analyzed, to decide which transformation is most effective.

Example

Data from an experiment on the effect of number of days of training received (X) on
performance (Y) in a battery of simulated sales situations are presented in Table 3.7,
columns 1 and 2, for the 10 participants in the study. A scatter plot of these data is shown in
Figure 3.14a. Clearly the regression relation appears to be curvilinear, so the simple linear
regression model (2.1) does not seem to be appropriate. Since the variability at the different
X levels appears to be fairly constant, we shall consider a transformation on X. Based on
the prototype plot in Figure 3.13a, we shall consider initially the square root transformation
$X' = \sqrt{X}$. The transformed values are shown in column 3 of Table 3.7.


FIGURE 3.13 Prototype Nonlinear Regression Patterns with Constant Error Variance and
Simple Transformations of X.
[Figure: three prototype regression patterns, each paired with suggested transformations of X.]

Prototype Regression Pattern    Transformations of X
(a)                             $X' = \log_{10} X$    $X' = \sqrt{X}$
(b)                             $X' = X^2$            $X' = \exp(X)$
(c)                             $X' = 1/X$            $X' = \exp(-X)$


TABLE 3.7 Use of Square Root Transformation of X to Linearize Regression Relation:
Sales Training Example.

           (1)         (2)            (3)
Sales      Days of     Performance
Trainee    Training    Score
$i$        $X_i$       $Y_i$          $X_i' = \sqrt{X_i}$
 1          .5          42.5           .70711
 2          .5          50.6           .70711
 3         1.0          68.5          1.00000
 4         1.0          80.7          1.00000
 5         1.5          89.0          1.22474
 6         1.5          99.6          1.22474
 7         2.0         105.3          1.41421
 8         2.0         111.8          1.41421
 9         2.5         112.3          1.58114
10         2.5         125.7          1.58114
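Trying several candidate transformations of X, as recommended above, is easy to automate. The sketch below is an illustration, not the book's procedure: it uses the Table 3.7 data and, as a rough numerical companion to the scatter plots, computes the correlation between Y and each transformed X. The candidate list and the use of the correlation coefficient as a summary are assumptions of the sketch; plots remain the primary tool for choosing a transformation.

```python
from math import log10, sqrt

# Table 3.7: days of training (X) and performance score (Y).
days = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.0, 2.5, 2.5]
score = [42.5, 50.6, 68.5, 80.7, 89.0, 99.6, 105.3, 111.8, 112.3, 125.7]


def correlation(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)


# Candidate transformations of X (an illustrative selection).
candidates = {
    "X": lambda x: x,
    "sqrt(X)": sqrt,
    "log10(X)": log10,
}

# Correlation of Y with each transformed predictor.
r = {name: correlation([f(d) for d in days], score) for name, f in candidates.items()}

for name, value in r.items():
    print(f"{name:<9} r = {value:.4f}")
```

A higher correlation only says the points lie closer to a straight line overall; it says nothing about the pattern of the residuals, which is why the text examines residual plots before accepting a transformation.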


In Figure 3.14b, the same data are plotted with the predictor variable transformed to
$X' = \sqrt{X}$. Note that the scatter plot now shows a reasonably linear relation. The variability
of the scatter at the different X levels is the same as before, since we did not make a
transformation on Y.

To examine further whether the simple linear regression model (2.1) is appropriate now, 
we fit it to the transformed X data. The regression calculations with the transformed X data 
are carried out in the usual fashion, except that the predictor variable now is X'. We obtain 
the following fitted regression function: 


$$\hat{Y} = -10.33 + 83.45X'$$
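The fit on the transformed predictor can be reproduced directly from Table 3.7. The following minimal sketch applies the usual least squares formulas to $X' = \sqrt{X}$; the function and variable names are illustrative only.

```python
from math import sqrt

# Table 3.7: days of training (X) and performance score (Y).
days = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.0, 2.5, 2.5]
score = [42.5, 50.6, 68.5, 80.7, 89.0, 99.6, 105.3, 111.8, 112.3, 125.7]

# Column 3 of Table 3.7: the square root transformation of X.
x_prime = [sqrt(d) for d in days]


def fit_line(x, y):
    """Return least squares (b0, b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# The regression is carried out in the usual fashion, with X' as the predictor.
b0, b1 = fit_line(x_prime, score)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X'")

# Prediction in the original units of X: substitute sqrt(X) for X'.
def predict(days_of_training):
    return b0 + b1 * sqrt(days_of_training)

print(f"predicted score after 2 days: {predict(2.0):.1f}")
```

The `predict` helper shows that nothing special is needed to return to the original units of X: the transformation is simply applied to the new X value before the fitted line is evaluated.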


FIGURE 3.14 Scatter Plots and Residual Plots: Sales Training Example.
[Figure: four panels. (a) Scatter Plot of performance against days of training; (b) Scatter Plot
against $\sqrt{X}$; (c) plot of the residuals against $X'$; (d) normal probability plot of the residuals.]

Figure 3.14c contains a plot of the residuals against X'. There is no evidence of lack of
fit or of strongly unequal error variances. Figure 3.14d contains a normal probability plot of
the residuals. No strong indications of substantial departures from normality are indicated
by this plot. This conclusion is supported by the high coefficient of correlation between the
ordered residuals and their expected values under normality, .979. For $\alpha = .01$, Table B.6
shows that the critical value is .879, so the observed coefficient is substantially larger
and supports the reasonableness of normal error terms. Thus, the simple linear regression
model (2.1) appears to be appropriate here for the transformed data.
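The correlation between the ordered residuals and their expected values under normality can be sketched as follows. This is an illustration under a stated assumption: the expected value of the k-th smallest residual is approximated by $\sqrt{MSE}\,z[(k - .375)/(n + .25)]$, a common approximation that is not restated in this section. The critical value still has to be looked up in Table B.6; the code does not compute it.

```python
from math import sqrt
from statistics import NormalDist

# Table 3.7: days of training (X) and performance score (Y).
days = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.0, 2.5, 2.5]
score = [42.5, 50.6, 68.5, 80.7, 89.0, 99.6, 105.3, 111.8, 112.3, 125.7]
x_prime = [sqrt(d) for d in days]

# Least squares fit of Y on X' = sqrt(X).
n = len(score)
x_bar = sum(x_prime) / n
y_bar = sum(score) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_prime, score)) / sum(
    (x - x_bar) ** 2 for x in x_prime
)
b0 = y_bar - b1 * x_bar

# Residuals and error mean square (n - 2 degrees of freedom).
residuals = [y - (b0 + b1 * x) for x, y in zip(x_prime, score)]
mse = sum(e ** 2 for e in residuals) / (n - 2)

# Assumed approximation for the expected value of the k-th ordered residual.
ordered = sorted(residuals)
expected = [
    sqrt(mse) * NormalDist().inv_cdf((k - 0.375) / (n + 0.25))
    for k in range(1, n + 1)
]


def correlation(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    m = len(u)
    u_bar = sum(u) / m
    v_bar = sum(v) / m
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)


r_normal = correlation(ordered, expected)
print(f"correlation of ordered residuals with expected values: {r_normal:.3f}")
```

Because the correlation coefficient is unaffected by scale, the factor $\sqrt{MSE}$ changes the expected values but not the resulting coefficient; it is kept here only so that `expected` is on the same scale as the residuals.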

The fitted regression function in the original units of X can easily be obtained, if desired:

$$\hat{Y} = -10.33 + 83.45\sqrt{X}$$


FIGURE 3.15 Prototype Regression Patterns with Unequal Error Variances and Simple
Transformations of Y.
[Figure: prototype regression patterns in which the error variance increases with the mean response.]

Transformations on Y

$Y' = \sqrt{Y}$
$Y' = \log_{10} Y$
$Y' = 1/Y$

Note: A simultaneous transformation on X may also be helpful or necessary. 


Comment

At times, it may be helpful to introduce a constant into the transformation. For example, if some of
the X data are near zero and the reciprocal transformation is desired, we can shift the origin by using
the transformation $X' = 1/(X + k)$, where $k$ is an appropriately chosen constant.


Transformations for Nonnormality and Unequal Error Variances 


Unequal error variances and nonnormality of the error terms frequently appear together.
To remedy these departures from the simple linear regression model (2.1), we need a
transformation on Y, since the shapes and spreads of the distributions of Y need to be
changed. Such a transformation on Y may also at the same time help to linearize a curvilinear
regression relation. At other times, a simultaneous transformation on X may be needed to
obtain or maintain a linear regression relation.

Frequently the nonnormality and unequal variances departures from regression
model (2.1) take the form of increasing skewness and increasing variability of the distributions
of the error terms as the mean response $E\{Y\}$ increases. For example, in a regression
of yearly household expenditures for vacations (Y) on household income (X), there will
tend to be more variation and greater positive skewness (i.e., some very high yearly vacation
expenditures) for high-income households than for low-income households, who tend to
consistently spend much less for vacations. Figure 3.15 contains some prototype regression
relations where the skewness and the error variance increase with the mean response $E\{Y\}$.
This figure also presents some simple transformations on Y that may be helpful for these
cases. Several alternative transformations on Y may be tried, as well as some simultaneous
transformations on X. Scatter plots and residual plots should be prepared to determine the
most effective transformation(s).

Example

Data on age (X) and plasma level of a polyamine (Y) for a portion of the 25 healthy
children in a study are presented in columns 1 and 2 of Table 3.8. These data are plotted in
Figure 3.16a as a scatter plot. Note the distinct curvilinear regression relationship, as well
as the greater variability for younger children than for older ones.


TABLE 3.8 Use of Logarithmic Transformation of Y to Linearize Regression Relation and
Stabilize Error Variance: Plasma Levels Example.

          (1)              (2)             (3)
Child     Age              Plasma Level
$i$       $X_i$            $Y_i$           $Y_i' = \log_{10} Y_i$
 1        0 (newborn)      13.44           1.1284
 2        0 (newborn)      12.84           1.1086
 3        0 (newborn)      11.91           1.0759
 4        0 (newborn)      20.09           1.3030
 5        0 (newborn)      15.60           1.1931
 6        1.0              10.11           1.0048
 7        1.0              11.38           1.0561
...       ...              ...             ...
19        3.0               6.90            .8388
20        3.0               6.77            .8306
21        4.0               4.86            .6866
22        4.0               5.10            .7076
23        4.0               5.67            .7536
24        4.0               5.75            .7597
25        4.0               6.23            .7945


On the basis of the prototype regression pattern in Figure 3.15b, we shall first try the
logarithmic transformation $Y' = \log_{10} Y$. The transformed Y values are shown in column 3
of Table 3.8. Figure 3.16b contains the scatter plot with this transformation. Note that the
transformation not only has led to a reasonably linear regression relation, but the variability
at the different levels of X also has become reasonably constant.
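Column 3 of Table 3.8 is simply the base-10 logarithm of column 2, which can be confirmed for the rows that are shown. The short sketch below is an illustration; it uses only the 14 cases listed in the table, since the remaining children are not reproduced there, and the list names are assumptions of the sketch.

```python
from math import log10

# Plasma levels (column 2) and tabled log10 values (column 3) for the
# cases shown in Table 3.8: children 1-7 and 19-25.
plasma = [13.44, 12.84, 11.91, 20.09, 15.60, 10.11, 11.38,
          6.90, 6.77, 4.86, 5.10, 5.67, 5.75, 6.23]
tabled = [1.1284, 1.1086, 1.0759, 1.3030, 1.1931, 1.0048, 1.0561,
          0.8388, 0.8306, 0.6866, 0.7076, 0.7536, 0.7597, 0.7945]

# Apply the logarithmic transformation Y' = log10(Y) to each response.
computed = [log10(y) for y in plasma]

# Largest disagreement between the computed and tabled values.
max_gap = max(abs(a - b) for a, b in zip(computed, tabled))

for y, y_prime in zip(plasma, computed):
    print(f"Y = {y:6.2f}   log10(Y) = {y_prime:.4f}")
print(f"largest difference from the tabled values: {max_gap:.5f}")
```

Any difference beyond rounding in the fourth decimal place would point to a transcription error in one of the two columns.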

To further examine the reasonableness of the transformation $Y' = \log_{10} Y$, we fitted the
simple linear regression model (2.1) to the transformed Y data and obtained:

$$\hat{Y}' = 1.135 - .1023X$$


A plot of the residuals against X is shown in Figure 3.16c, and a normal probability plot of
the residuals is shown in Figure 3.16d. The coefficient of correlation between the ordered
residuals and their expected values under normality is .981. For $\alpha = .05$, Table B.6 indicates
that the critical value is .959 so that the observed coefficient supports the assumption of
normality of the error terms. All of this evidence supports the appropriateness of regression
model (2.1) for the transformed Y data.
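A fitted function on the $\log_{10}$ scale can be expressed in the original units of Y by inverting the transformation. The sketch below is an illustration and not a step taken in the text: it evaluates the fitted line for $Y'$ reported above and raises 10 to that power. The function names are hypothetical, and the back-transformed value is best read as a typical plasma level at that age rather than as an estimate of the mean response, since the mean of $\log_{10} Y$ does not map exactly to the mean of Y.

```python
def predict_log10_plasma(age):
    """Fitted value on the transformed scale: Y' = 1.135 - .1023 X."""
    return 1.135 - 0.1023 * age


def predict_plasma(age):
    """Back-transform the fitted log10 value to the original units of Y."""
    return 10 ** predict_log10_plasma(age)


# Evaluate at the ages covered by the study (0 = newborn through 4 years).
for age in range(5):
    print(
        f"age {age}: fitted log10(Y) = {predict_log10_plasma(age):.4f}, "
        f"back-transformed Y = {predict_plasma(age):.2f}"
    )
```

On the original scale the fitted relation is a decreasing exponential curve in age, which is consistent with the curvilinear pattern described for the untransformed scatter plot.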




Comments 


1. At times it may be desirable to introduce a constant into a transformation of Y, such as when
Y may be negative. For instance, the logarithmic transformation to shift the origin in Y and make all
Y observations positive would be $Y' = \log_{10}(Y + k)$, where $k$ is an appropriately chosen constant.

2. When unequal error variances are present but the regression relation is linear, a transformation
on Y may not be sufficient. While such a transformation may stabilize the error variance, it will also
change the linear relationship to a curvilinear one. A transformation on X may therefore also be
required. This case can also be handled by using weighted least squares, a procedure explained in
Chapter 11.



This identity shows that the error deviations in SSE are made up of a pure error component 
and a lack of fit component. Figure 3.12 illustrates this partitioning for the case Уз = 160, 
X3 = 125 in the bank example. 


FIGURE 3.12 
Illustration of 
Decomposition 
of Error 
Deviation 

Yi; —- $;— 
Bank 
Example. 


Chapter 3 Diagnostics and Remedial Measures 125 


Y 
Үз = 160 
160 m 7 
error deviati 5 = ү. — Y. = 
А (pure error deviation) 43 — ¥3¢ Y 155 
t 
з 
о 
9 
< НЕ P EV УА zv 
z (lack of fit deviation) 43 = Y, — $4; Үлз — Үз = 48 (error deviation) 
Š 130 
5 
Б 
Q 
Е 
23 
2 
100 
ў = 50.72251 + .48670X 
+f EN bee —— І ll Lp 
0 75 100 125 150 X k 


Size of Minimum Deposit (dollars) 


When (3.28) is squared and summed over all observations, we obtain (3.27) since the 
cross-product sum equals zero: 


Diddy = 57) ty BF + DE Ba? uu) 


SSE = SSPE + SSLF 


Note from (3.29) that we can define the lack of fit sum of squares directly as follows: 
SSLF = УУ; Р) (3.30) 


Since all Y;; observations at the level X; have the same fitted value, which we can denote 
by Y;, we can express (3.30) equivalently as: 


SSLF = V 5 nj(Y; — #0)? (3.302) 
1 


Formula (3.30а) indicates clearly why SSLF measures lack of fit. If the linear regression 
function is appropriate, then the means Y; will be near ће fitted values Ў; calculated from 
the estimated linear regression function and SSLF will be small. On the other hand, if the 
linear regression function is not appropriate, the means Ў; will not be near the fitted values 
calculated from the estimated linear regression function, as in Figure 3.11 for the bank 
example, and SSLF will be large. 

Formula (3.302) also indicates why c — 2 degrees of freedom are associated with SSLF. 
There are c means Y; inthe sum of squares, and two degrees of freedom are lost in estimating 
the parameters Во and f, of the linear regression function to obtain the fitted values f;. 

An ANOVA table can be constructed for the decomposition of SSE. Table 3.6a contains 
the general ANOVA table, including the decomposition of SSE just explained and the 
mean squares of interest, and Table 3.6b contains the ANOVA decomposition for the bank 
example. 


126 PartOne Simple Linear Regression 


TABLE 3.6 
General 
ANOVA Table 
for Testing 
Lack of Fit of 
Simple Linear 
Regression 
Function and 
ANOVA 
Table—Bank 
Example. 


2: | (а) Сепега! 
Source of 
Variation SS df MS 
. a in rera SSR 
Regression 558 = 3: Y (fij — Y) 1 MSR = ~= 
3 SSE 
Error. SSE = УУ; cz Vi) n—-2 MSE x noo 
: " SSLF 
Lack of fit SSLF = УУУ; = Ӯ: с 2 MSLF = Рау) 
— NDS SSPE 
Pure error SSPE = УУУУ — Y) пс MSPE = ——— wa 
Total SSTO= 3: Y XY - YY n-1 
(b).Bank Example "m 
Source of Nes 
Variation .SS df MS 
Regression 5141.3: 1 ‚5,141.3 
Error 14;741.6 9 ^ 1,638.0 
Lack of fit. 13,593.6 4 3,398.4 
Pure-error 1,148.0 5 229.6 
Total 19,882.9 10 
Comments 


1. As shown by the bank example, not all levels of X need have repeat observations for the F test 
for lack of fit to be applicable. Repeat observations at only one or some levels of X are sufficient. 


2. Itcan be shown that the mean squares MSPE and MSLF have the following expectations when 
testing whether the regression function is linear: 


E{MSPE} = о? 


lus — sp 
E{MSLF) = о? + ЭШ ЕЕ AXyl 


(3.31) 


(3.32) 


The reason for the term “риге error" is that MSPE is always an unbiased estimator of the error term 
variance o?, no matter what is the true regression function. The expected value of MSLF also is с2 if 
the regression function is linear, because p; = Bo+ В: Х ; then and the second term in (3.32) becomes 
zero. On the other hand, if the regression function is not linear, p; 5% Bo + B; X; and E(MSLF) will 
be greater than o?. Hence, a value of F* near 1 accords with a linear regression function; large values 
of F* indicate that the regression function is not linear. 

3. The terminology "error sum of squares" and "error mean square" is not precise when the 
regression function under test in Ho is not the true function since the error sum of squares and error 
mean square then reflect the effects of both the lack of fit and the variability of the error terms. We 
continue to use the terminology for consistency and now use the term "pure error" to identify the 
variability associated with the error term only. 


Chapter 3 Diagnostics and Remedial Measures 127 


4. Suppose that prior to any analysis of the appropriateness of the model, we had fitted a linear 
regression model and wished to test whether or not 6; = 0 for the bank example (Table 3.4b). Test 
statistic (2.60) would be: 


Ps MSR _ 5,1413 
^ MSE 1,638.0 


For о = .10, F(.90; 1, 9) —3.36, and we would conclude Ho, that В = 0 or that there is no linear 
association between minimum deposit size (and value of gift) and number of new accounts. A conclu- 
sion that there is no relation between these variables would be improper, however. Such an inference 
requires that regression model (2.1) be appropriate. Here, there is a definite relationship, but the re- 
gression function is not linear. This illustrates the importance of always examining the appropriateness 
of a model before any inferences are drawn. 


= 3.14 


5. The general linear test approach just explained can be used to test the appropriateness of other 
regression functions. Only the degrees of freedom for SSLF will need be modified. In general, c — p 
degrees of freedom are associated with SSLF, where p is the number of parameters in theyregression 
function. For the test of a simple linear regression function, p = 2 because there are two parameters, 
Во and fi, in the regression function. 

6. The alternative H, in (3.19) includes all regression functions other than a linear one. For 
instance, it includes a quadratic regression function or a logarithmic one. If H, is concluded, a study 
of residuals can be helpful in identifying an appropriate function. 

7. When we conclude that the employed model in Ho is appropriate, the usual practice is to use 
the error mean square MSE as an estimator of с? in preference to the pure error mean square MSPE, 
since the former contains more degrees of freedom. 

8. Observations at the same level of X are genuine repeats only if they involve independent trials 
with respect to the error term. Suppose that in a regression analysis of the relation between hardness 
(Y) and amount of carbon (X) in specimens of an alloy, the error term in the model covers, among 
other things, random errors in the measurement of hardness by the analyst and effects of uncontrolled 
production factors, which vary at random from specimen to specimen and affect hardness. If the 
analyst takes two readings on the hardness of a specimen, this will not provide a genuine replication 
because the effects of random variation in the production factors are fixed in any given specimen. 
For genuine replications, different specimens with the same carbon content (X) would have to be 
measured by the analyst so that all the effects covered in the error term could vary at random from 
one repeated observation to the next 

9. When no replications are present in a data set, an approximate test for lack of fit can be 
conducted if there are some cases at adjacent X levels for which the mean responses are quite close to 
each other. Such adjacent cases are grouped together and treated as pseudoreplicates, and the'test for 
lack of fit is then carried out using these groupings of adjacent cases. A useful summary of this and 
related procedures for conducting a test for lack of fit when no replicates are present may be found in 
Reference 3.8. ` 


+ 


3.8 Overview of Remedial Measures 


If the simple linear regression model (2.1) is not appropriate for a data set, there are two 
basic choices: Е ? 


1. Abandon regression model (2.1) and develop and use a more appropriate model. 
2. Employ some transformation on the data so that regression model (2.1) is appropriate 
for the transformed data. 


128 PartOne Simple Linear Regression 


Each approach has advantages and disadvantages. The first approach may entail a more 
complex model that could yield better insights, but may also lead to more complex proce- 
dures for estimating the parameters. Successful use of transformations, on the other hand, 
leads to relatively simple methods of estimation and may involve fewer parameters than 
a complex model, an advantage when the sample size is small. Yet transformations may 
obscure the fundamental interconnections between the variables, though at other times they 
may illuminate them. 

We consider the use of transformations in this chapter and the use of more complex 
models in later chapters. First, we provide a brief overview of remedial measures. 


Nonlinearity of Regression Function | 
When the regression function is not linear, a direct approach is to тойу regression 
model (2.1) by altering the nature of the regression function. For instance, a quadratic 
regression function might be used: 


E(Y) = Bo + AX + X? 


or an exponential regression function: 
E(Y} = Bobi’ 


In Chapter 7, we discuss polynomial regression functions, and in Part III we take up nonlinear 
regression functions, such as an exponential regression function. 

The transformation approach employs a transformation to linearize, at least approxi- 
mately, a nonlinear regression function. We discuss the use of transformations to linearize 
regression functions in Section 3.9. 

When the nature of the regression function is not known, exploratory analysis that does 
not require specifying a particular type of function is often useful. We discuss exploratory 
regression analysis in Section 3.10. 


Nonconstancy of Error Variance 
When the error variance is not constant but varies in a systematic fashion, a direct approach 
is to modify the model to allow for this and use the method of weighted least squares to 
obtain the estimators of the parameters. We discuss the use of weighted least squares for 
this purpose in Chapter 11. 
Transformations can also be effective in stabilizing the variance. Some of these are 
discussed in Section 3.9. 


Nonindependence of Error Terms 
When the error terms are correlated, a direct remedial measure is to work with a model that 
calls for correlated error terms. We discuss such a model in Chapter 12. A simple remedial 
transformation that is often helpful is to work with first differences, a topic also discussed 
in Chapter 12. 


Nonnormality of Error Terms 
Lack of normality and nonconstant error variances frequently go hand in hand. Fortunately,
it is often the case that the same transformation that helps stabilize the variance is also helpful
in approximately normalizing the error terms. It is therefore desirable that the transformation
for stabilizing the error variance be utilized first, and then the residuals studied to see if
serious departures from normality are still present. We discuss transformations to achieve
approximate normality in Section 3.9.


Omission of Important Predictor Variables 


When residual analysis indicates that an important predictor variable has been omitted from 
the model, the solution is to modify the model. In Chapter 6 and later chapters, we discuss 
multiple regression analysis in which two or more predictor variables are utilized. 


Outlying Observations 


When outlying observations are present, as in Figure 3.7a, use of the least squares and
maximum likelihood estimators (1.10) for regression model (2.1) may lead to serious
distortions in the estimated regression function. When the outlying observations do not
represent recording errors and should not be discarded, it may be desirable to use an estimation
procedure that places less emphasis on such outlying observations. We discuss one such
robust estimation procedure in Chapter 11.


3.9 Transformations


We now consider in more detail the use of transformations of one or both of the original 
variables before carrying out the regression analysis. Simple transformations of either the 
response variable Y or the predictor variable X, or of both, are often sufficient to make the
simple linear regression model appropriate for the transformed data. 


Transformations for Nonlinear Relation Only 


Example 


We first consider transformations for linearizing a nonlinear regression relation when the 
distribution of the error terms is reasonably close to a normal distribution and the error 
terms have approximately constant variance. In this situation, transformations on X should 
be attempted. The reason why transformations on Y may not be desirable here is that a 
transformation on Y, such as $Y' = \sqrt{Y}$, may materially change the shape of the distribution
of the error terms from the normal distribution and may also lead to substantially differing
error term variances. 

Figure 3.13 contains some prototype nonlinear regression relations with constant error 
variance and also presents some simple transformations on X that may be helpful to linearize
the regression relationship without affecting the distributions of Y. Several alternative
transformations may be tried. Scatter plots and residual plots based on each transformation 
should then be prepared and analyzed to decide which transformation is most effective. 



Data from an experiment on the effect of number of days of training received (X) on 
performance (Y) in a battery of simulated sales situations are presented in Table 3.7, 
columns 1 and 2, for the 10 participants in the study. A scatter plot of these data is shown in 
Figure 3.14a. Clearly the regression relation appears to be curvilinear, so the simple linear
regression model (2.1) does not seem to be appropriate. Since the variability at the different
X levels appears to be fairly constant, we shall consider a transformation on X. Based on
the prototype plot in Figure 3.13a, we shall consider initially the square root transformation
$X' = \sqrt{X}$. The transformed values are shown in column 3 of Table 3.7.


FIGURE 3.13 Prototype Nonlinear Regression Patterns with Constant Error Variance and Simple Transformations of X.

Prototype Regression Pattern    Transformations of X
(a)                             $X' = \log_{10} X$    $X' = \sqrt{X}$
(b)                             $X' = X^2$            $X' = \exp(X)$
(c)                             $X' = 1/X$            $X' = \exp(-X)$


TABLE 3.7 Use of Square Root Transformation of X to Linearize Regression Relation—Sales Training Example.

             (1)         (2)            (3)
Sales        Days of     Performance
Trainee      Training    Score
i            $X_i$       $Y_i$          $X_i' = \sqrt{X_i}$
1            .5          42.5           .70711
2            .5          50.6           .70711
3            1.0         68.5           1.00000
4            1.0         80.7           1.00000
5            1.5         89.0           1.22474
6            1.5         99.6           1.22474
7            2.0         105.3          1.41421
8            2.0         111.8          1.41421
9            2.5         112.3          1.58114
10           2.5         125.7          1.58114


In Figure 3.14b, the same data are plotted with the predictor variable transformed to 
$X' = \sqrt{X}$. Note that the scatter plot now shows a reasonably linear relation. The variability
of the scatter at the different X levels is the same as before, since we did not make a 
transformation on Y. 

To examine further whether the simple linear regression model (2.1) is appropriate now, 
we fit it to the transformed X data. The regression calculations with the transformed X data 
are carried out in the usual fashion, except that the predictor variable now is X’. We obtain 
the following fitted regression function: 


$\hat{Y} = -10.33 + 83.45X'$
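
The fit above can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, using the data of Table 3.7; the helper name `fit_simple_linear` is introduced here only for illustration and is not part of the text. The estimates it prints should agree with the fitted regression function above up to rounding.

```python
import math

# Table 3.7: days of training (X) and performance score (Y).
days = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.0, 2.5, 2.5]
score = [42.5, 50.6, 68.5, 80.7, 89.0, 99.6, 105.3, 111.8, 112.3, 125.7]


def fit_simple_linear(x, y):
    """Return the least squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Transform the predictor only; Y is left in its original units.
x_prime = [math.sqrt(d) for d in days]
b0, b1 = fit_simple_linear(x_prime, score)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X'")
```

Because only X is transformed, the usual regression calculations apply unchanged once `x_prime` takes the place of the original predictor.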


FIGURE 3.14 Scatter Plots and Residual Plots—Sales Training Example.
[Panels: (a) Scatter Plot of Performance; (b) Scatter Plot against $\sqrt{X}$; (c) Residual Plot against $X'$; (d) Normal Probability Plot.]

Figure 3.14c contains a plot of the residuals against X'. There is no evidence of lack of
fit or of strongly unequal error variances. Figure 3.14d contains a normal probability plot of
the residuals. This plot gives no strong indication of substantial departures from normality.
This conclusion is supported by the high coefficient of correlation between the
ordered residuals and their expected values under normality, .979. For $\alpha = .01$, Table B.6
shows that the critical value is .879, so the observed coefficient is substantially larger
and supports the reasonableness of normal error terms. Thus, the simple linear regression
model (2.1) appears to be appropriate here for the transformed data.
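
The correlation check can be sketched in code as follows. This is an illustration only: it assumes the common approximation in which the expected value of the k-th smallest residual under normality is $\sqrt{MSE}\, z[(k - .375)/(n + .25)]$, and it refits the transformed sales training data so that the block is self-contained. The helper `correlation` is a name chosen here.

```python
import math
from statistics import NormalDist

# Table 3.7 data, with the predictor transformed to X' = sqrt(X).
days = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.0, 2.5, 2.5]
score = [42.5, 50.6, 68.5, 80.7, 89.0, 99.6, 105.3, 111.8, 112.3, 125.7]
x = [math.sqrt(d) for d in days]
n = len(x)

# Least squares fit of Y on X'.
x_bar = sum(x) / n
y_bar = sum(score) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, score))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Ordered residuals and the error mean square.
residuals = sorted(yi - (b0 + b1 * xi) for xi, yi in zip(x, score))
mse = sum(e ** 2 for e in residuals) / (n - 2)

# Assumed approximation for the expected ordered residuals under normality.
expected = [math.sqrt(mse) * NormalDist().inv_cdf((k - 0.375) / (n + 0.25))
            for k in range(1, n + 1)]


def correlation(u, v):
    """Pearson coefficient of correlation between two equal-length lists."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


r = correlation(residuals, expected)
print(f"correlation = {r:.3f}")  # compare with the critical value .879
```

The scaling by $\sqrt{MSE}$ does not affect the correlation; it is kept only so that `expected` is on the same scale as the residuals, as in a normal probability plot.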

The fitted regression function in the original units of X can easily be obtained, if desired: 


$\hat{Y} = -10.33 + 83.45\sqrt{X}$


FIGURE 3.15 Prototype Regression Patterns with Unequal Error Variances and Simple Transformations of Y.

Prototype Regression Pattern: (a) (b) (c)

Transformations on Y:
$Y' = \sqrt{Y}$
$Y' = \log_{10} Y$
$Y' = 1/Y$

Note: A simultaneous transformation on X may also be helpful or necessary. 


Comment 

At times, it may be helpful to introduce a constant into the transformation. For example, if some of 
the X data are near zero and the reciprocal transformation is desired, we can shift the origin by using 
the transformation $X' = 1/(X + k)$, where k is an appropriately chosen constant.


Transformations for Nonnormality and Unequal Error Variances 


Example 


Unequal error variances and nonnormality of the error terms frequently appear together. 
To remedy these departures from the simple linear regression model (2.1), we need a 
transformation on Y, since the shapes and spreads of the distributions of Y need to be 
changed. Such a transformation on Y may also at the same time help to linearize a curvilinear
regression relation. At other times, a simultaneous transformation on X may be needed to 
obtain or maintain a linear regression relation. 

Frequently, the nonnormality and unequal variances departures from regression 
model (2.1) take the form of increasing skewness and increasing variability of the distributions
of the error terms as the mean response E{Y} increases. For example, in a regression
of yearly household expenditures for vacations (Y) on household income (X), there will 
tend to be more variation and greater positive skewness (i.e., some very high yearly vacation 
expenditures) for high-income households than for low-income households, who tend to 
consistently spend much less for vacations. Figure 3.15 contains some prototype regression 
relations where the skewness and the error variance increase with the mean response E{Y}. 
This figure also presents some simple transformations on Y that may be helpful for these 
cases. Several alternative transformations on Y may be tried, as well as some simultaneous
transformations on X. Scatter plots and residual plots should be prepared to determine the 
most effective transformation(s). 


Data on age (X) and plasma level of a polyamine (Y) for a portion of the 25 healthy 
children in a study are presented in columns 1 and 2 of Table 3.8. These data are plotted in
Figure 3.16a as a scatter plot. Note the distinct curvilinear regression relationship, as well 
as the greater variability for younger children than for older ones. 


TABLE 3.8 Use of Logarithmic Transformation of Y to Linearize Regression Relation and Stabilize Error Variance—Plasma Levels Example.

          (1)             (2)             (3)
Child     Age             Plasma Level
i         $X_i$           $Y_i$           $Y_i' = \log_{10} Y_i$
1         0 (newborn)     13.44           1.1284
2         0 (newborn)     12.84           1.1086
3         0 (newborn)     11.91           1.0759
4         0 (newborn)     20.09           1.3030
5         0 (newborn)     15.60           1.1931
6         1.0             10.11           1.0048
7         1.0             11.38           1.0561
...       ...             ...             ...
19        3.0             6.90            .8388
20        3.0             6.77            .8306
21        4.0             4.86            .6866
22        4.0             5.10            .7076
23        4.0             5.67            .7536
24        4.0             5.75            .7597
25        4.0             6.23            .7945

On the basis of the prototype regression pattern in Figure 3.15b, we shall first try the 
logarithmic transformation $Y' = \log_{10} Y$. The transformed Y values are shown in column 3
of Table 3.8. Figure 3.16b contains the scatter plot with this transformation. Note that the 
transformation not only has led to a reasonably linear regression relation, but the variability 
at the different levels of X also has become reasonably constant. 

To further examine the reasonableness of the transformation $Y' = \log_{10} Y$, we fitted the
simple linear regression model (2.1) to the transformed Y data and obtained: 


$\hat{Y}' = 1.135 - .1023X$


A plot of the residuals against X is shown in Figure 3.16c, and a normal probability plot of 
the residuals is shown in Figure 3.16d. The coefficient of correlation between the ordered 
residuals and their expected values under normality is .981. For $\alpha = .05$, Table B.6 indicates
that the critical value is .959 so that the observed coefficient supports the assumption of
normality of the error terms. All of this evidence supports the appropriateness of regression
model (2.1) for the transformed Y data. 


Comments


1. At times it may be desirable to introduce a constant into a transformation of Y, such as when 
Y may be negative. For instance, the logarithmic transformation to shift the origin in Y and make all 
Y observations positive would be $Y' = \log_{10}(Y + k)$, where k is an appropriately chosen constant.

2. When unequal error variances are present but the regression relation is linear, a transformation 
on Y may not be sufficient. While such a transformation may stabilize the error variance, it will also 
change the linear relationship to a curvilinear one. A transformation on X may therefore also be 
required. This case can also be handled by using weighted least squares, a procedure explained in 
Chapter 11.


FIGURE 3.16 Scatter Plots and Residual Plots—Plasma Levels Example.
[Panels: (a) Scatter Plot of Plasma Level against Age; (b) Scatter Plot with $Y' = \log_{10} Y$; (c) Residual Plot against X; (d) Normal Probability Plot.]


Box-Cox Transformations 


It is often difficult to determine from diagnostic plots, such as the one in Figure 3.16a for 
the plasma levels example, which transformation of Y is most appropriate for correcting 
skewness of the distributions of error terms, unequal error variances, and nonlinearity of the 
regression function. The Box-Cox procedure (Ref. 3.9) automatically identifies a transformation
from the family of power transformations on Y. The family of power transformations
is of the form:

$Y' = Y^{\lambda}$    (3.33)


where $\lambda$ is a parameter to be determined from the data. Note that this family encompasses
the following simple transformations:


$\lambda = 2$        $Y' = Y^2$
$\lambda = .5$       $Y' = \sqrt{Y}$
$\lambda = 0$        $Y' = \log_e Y$ (by definition)    (3.34)
$\lambda = -.5$      $Y' = 1/\sqrt{Y}$
$\lambda = -1.0$     $Y' = 1/Y$


The normal error regression model with the response variable a member of the family of 
power transformations in (3.33) becomes: 


$Y_i^{\lambda} = \beta_0 + \beta_1 X_i + \varepsilon_i$    (3.35)


Note that regression model (3.35) includes an additional parameter, $\lambda$, which needs to be
estimated. The Box-Cox procedure uses the method of maximum likelihood to estimate $\lambda$,
as well as the other parameters $\beta_0$, $\beta_1$, and $\sigma^2$. In this way, the Box-Cox procedure identifies
$\hat{\lambda}$, the maximum likelihood estimate of $\lambda$ to use in the power transformation.

Since some statistical software packages do not automatically provide the Box-Cox maximum
likelihood estimate $\hat{\lambda}$ for the power transformation, a simple procedure for obtaining
$\hat{\lambda}$ using standard regression software can be employed instead. This procedure involves a
numerical search in a range of potential $\lambda$ values; for example, $\lambda = -2$, $\lambda = -1.75$, ...,
$\lambda = 1.75$, $\lambda = 2$. For each $\lambda$ value, the $Y_i^{\lambda}$ observations are first standardized so that the
magnitude of the error sum of squares does not depend on the value of $\lambda$:


$$W_i = \begin{cases} K_1 (Y_i^{\lambda} - 1) & \lambda \neq 0 \\ K_2 (\log_e Y_i) & \lambda = 0 \end{cases}$$    (3.36)


where:


$K_2 = \left( \prod_{i=1}^{n} Y_i \right)^{1/n}$    (3.36a)

$K_1 = \dfrac{1}{\lambda K_2^{\lambda - 1}}$    (3.36b)

Note that $K_2$ is the geometric mean of the $Y_i$ observations.

Once the standardized observations $W_i$ have been obtained for a given $\lambda$ value, they are
regressed on the predictor variable X and the error sum of squares SSE is obtained. It can be
shown that the maximum likelihood estimate $\hat{\lambda}$ is that value of $\lambda$ for which SSE is a minimum.
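
The numerical search just described can be sketched as follows. This is an illustrative sketch, not the output of any statistical package: the function names (`standardize`, `sse_linear`, `box_cox_search`) are chosen here, and the data are hypothetical, constructed so that $\sqrt{Y}$ is exactly linear in X. Under that construction the search should settle on $\lambda = .5$.

```python
import math


def sse_linear(x, w):
    """Error sum of squares from the least squares regression of w on x."""
    n = len(x)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    b1 = sxw / sxx
    b0 = w_bar - b1 * x_bar
    return sum((wi - b0 - b1 * xi) ** 2 for xi, wi in zip(x, w))


def standardize(y, lam):
    """Standardized observations W_i of (3.36); every y must be positive."""
    n = len(y)
    k2 = math.exp(sum(math.log(yi) for yi in y) / n)  # geometric mean (3.36a)
    if lam == 0:
        return [k2 * math.log(yi) for yi in y]
    k1 = 1.0 / (lam * k2 ** (lam - 1))                # (3.36b)
    return [k1 * (yi ** lam - 1) for yi in y]


def box_cox_search(x, y, lambdas):
    """Return the lambda with the smallest SSE and the full SSE table."""
    sse_by_lambda = {lam: sse_linear(x, standardize(y, lam)) for lam in lambdas}
    best = min(sse_by_lambda, key=sse_by_lambda.get)
    return best, sse_by_lambda


# Search grid: -2, -1.75, ..., 1.75, 2 (multiples of .25 are exact in floats).
grid = [-2 + 0.25 * i for i in range(17)]

# Hypothetical data: sqrt(Y) = 1 + 2X exactly, so lambda = .5 linearizes Y.
x = [1, 2, 3, 4, 5, 6]
y = [(1 + 2 * xi) ** 2 for xi in x]

best, sse_table = box_cox_search(x, y, grid)
for lam in grid:
    print(f"lambda = {lam:5.2f}   SSE = {sse_table[lam]:.4f}")
print("lambda minimizing SSE:", best)
```

With real data the SSE values rarely single out one $\lambda$ so sharply; printing the whole table, as here, makes it easy to see how flat SSE is near its minimum.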

If desired, a finer search can be conducted in the neighborhood of the $\lambda$ value that
minimizes SSE. However, the Box-Cox procedure ordinarily is used only to provide a guide
for selecting a transformation, so overly precise results are not needed. In any case, scatter
and residual plots should be utilized to examine the appropriateness of the transformation
identified by the Box-Cox procedure.


Example


TABLE 3.9 Box-Cox Results—Plasma Levels Example.

FIGURE 3.17 SAS-JMP Box-Cox Results—Plasma Levels Example.


Table 3.9 contains the Box-Cox results for the plasma levels example. Selected values of $\lambda$,
ranging from −1.0 to 1.0, were chosen, and for each chosen $\lambda$ the transformation (3.36)
was made and the linear regression of W on X was fitted. For instance, for $\lambda = .5$, the
transformation $W_i = K_1(\sqrt{Y_i} - 1)$ was made and the linear regression of W on X was fitted.
For this fitted linear regression, the error sum of squares is SSE = 48.4. The transformation
that leads to the smallest value of SSE corresponds to $\lambda = -.5$, for which SSE = 30.6.

Figure 3.17 contains the SAS-JMP Box-Cox results for this example. It consists of a
plot of SSE as a function of $\lambda$. From the plot, it is clear that a power value near $\lambda = -.50$
is indicated. However, SSE as a function of $\lambda$ is fairly stable in the range from near 0 to
−1.0, so the earlier choice of the logarithmic transformation $Y' = \log_{10} Y$ for the plasma
levels example, corresponding to $\lambda = 0$, is not unreasonable according to the Box-Cox
approach. One reason the logarithmic transformation was chosen here is the
ease of interpreting it. The use of logarithms to base 10 rather than natural logarithms does
not, of course, affect the appropriateness of the logarithmic transformation.


Comments 


1. At times, theoretical or a priori considerations can be utilized to help in choosing an appropriate
transformation. For example, when the shape of the scatter in a study of the relation between price of a
commodity (X) and quantity demanded (Y) is that in Figure 3.15b, economists may prefer logarithmic
transformations of both Y and X because the slope of the regression line for the transformed variables 
then measures the price elasticity of demand. The slope is then commonly interpreted as showing the 
percent change in quantity demanded per 1 percent change in price, where it is understood that the 
changes are in opposite directions. 


$\lambda$       SSE       $\lambda$       SSE
1.0     78.0      −.1     33.1
.9      70.4      −.3     31.2
.7      57.8      −.4     30.7
.5      48.4      −.5     30.6
.3      41.4      −.6     30.7
.1      36.4      −.7     (illegible)
0       34.5      −.9     32.7
                  −1.0    33.9

[Figure 3.17 plot: SSE (vertical axis) against $\lambda$ (horizontal axis).]




Similarly, scientists may prefer logarithmic transformations of both Y and X when studying the 
relation between radioactive decay (Y) of a substance and time (X) for a curvilinear relation of the 
type illustrated in Figure 3.15b because the slope of the regression line for the transformed variables 
then measures the decay rate. 

2. After a transformation has been tentatively selected, residual plots and other analyses described
earlier need to be employed to ascertain that the simple linear regression model (2.1) is appropriate 
for the transformed data. 

3. When transformed models are employed, the estimators $b_0$ and $b_1$ obtained by least squares
have the least squares properties with respect to the transformed observations, not the original ones. 

4. The maximum likelihood estimate of $\lambda$ with the Box-Cox procedure is subject to sampling
variability. In addition, the error sum of squares SSE is often fairly stable in a neighborhood around the
estimate. It is therefore often reasonable to use a nearby $\lambda$ value for which the power transformation
is easy to understand. For example, use of $\lambda = 0$ instead of the maximum likelihood estimate
$\hat{\lambda} = .13$ or use of $\lambda = -.5$ instead of $\hat{\lambda} = -.79$ may facilitate understanding without sacrificing
much in terms of the effectiveness of the transformation. To determine the reasonableness of using
an easier-to-understand value of $\lambda$, one should examine the flatness of the likelihood function in
the neighborhood of $\hat{\lambda}$, as we did in the plasma levels example. Alternatively, one may construct an
approximate confidence interval for $\lambda$; the procedure for constructing such an interval is discussed in
Reference 3.10.

5. When the Box-Cox procedure leads to a $\lambda$ value near 1, no transformation of Y may be needed.


3.10 Exploration of Shape of Regression Function 


Scatter plots often indicate readily the nature of the regression function. For instance, 
Figure 1.3 clearly shows the curvilinear nature of the regression relationship between steroid 
level and age. At other times, however, the scatter plot is complex and it becomes difficult to 
see the nature of the regression relationship, if any, from the plot. In these cases, it is helpful 
to explore the nature of the regression relationship by fitting a smoothed curve without any 
constraints on the regression function. These smoothed curves are also called nonparametric 
regression curves. They are useful not only for exploring regression relationships but also 
for confirming the nature of the regression function when the scatter plot visually suggests 
the nature of the regression relationship. 

Many smoothing methods have been developed for obtaining smoothed curves for time 
series data, where the X; denote time periods that are equally spaced apart. The method of 
moving averages uses the mean of the Y observations for adjacent time periods to obtain 
smoothed values. For example, the mean of the Y values for the first three time periods 
in the time series might constitute the first smoothed value corresponding to the middle
of the three time periods, in other words, corresponding to time period 2. Then the mean
of the Y values for the second, third, and fourth time periods would constitute the second
smoothed value, corresponding to the middle of these three time periods, in other words,
corresponding to time period 3, and so on. Special procedures are required for obtaining 
smoothed values at the two ends of the time series. The larger the successive neighborhoods 
used for obtaining the smoothed values, the smoother the curve will be. 

The method of running medians is similar to the method of moving averages, except
that the median is used as the average measure in order to reduce the influence of outlying
observations. With this method, as well as with the moving average method, successive
smoothing of the smoothed values and other refinements may be undertaken to provide a
suitable smoothed curve for the time series. Reference 3.11 provides a good introduction
to the running median smoothing method.
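
The two smoothers differ only in the summary applied to each neighborhood, which a short sketch makes concrete. The series below is hypothetical, and the helper `smooth` is a name chosen here; no smoothed values are produced for the two ends, since those require the special procedures mentioned above.

```python
from statistics import mean, median


def smooth(series, width, summary):
    """Apply `summary` to each window of `width` adjacent observations.

    The k-th result corresponds to the middle period of the k-th window,
    so the first and last width // 2 periods get no smoothed value.
    """
    if width % 2 == 0:
        raise ValueError("width should be odd so each window has a middle period")
    return [summary(series[start:start + width])
            for start in range(len(series) - width + 1)]


# Hypothetical time series with one outlying observation in period 3.
y = [10, 12, 50, 13, 15, 14]

moving_avg = smooth(y, 3, mean)      # periods 2..5: 24, 25, 26, 14
running_med = smooth(y, 3, median)   # periods 2..5: 12, 13, 15, 14

print("moving averages:", moving_avg)
print("running medians:", running_med)
```

In this small example the outlying value 50 pulls three successive moving averages up to about 25, while the running medians stay between 12 and 15.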

Many smoothing methods have also been developed for regression data when the X 
values are not equally spaced apart. A simple smoothing method, band regression, divides 
the data set into a number of groups or “bands” consisting of adjacent cases according to 
their X levels. For each band, the median X value and the median Y value are calculated, 
and the points defined by the pairs of these median values are then connected by straight 
lines. For example, consider the following simple data set divided into three groups: 


X      Y       Median X   Median Y
2.0    13.1
2.7    14.4    2.7        14.4
3.4    15.7

3.7    14.9
4.5    16.8    4.5        16.8
5.0    17.1

5.2    16.9
5.5    17.3    5.5        17.3
5.9    17.8


The three pairs of medians are then plotted on the scatter plot of the data and connected by 
straight lines as a simple smoothed nonparametric regression curve. 
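
Band regression is simple enough to sketch directly. The code below is an illustration using the nine cases of the example, with the second Y value read as 14.4; the function name `band_medians` and the equal-sized split into bands are assumptions made here.

```python
from statistics import median


def band_medians(x, y, n_bands):
    """Return one (median X, median Y) point per band of adjacent cases."""
    pairs = sorted(zip(x, y))            # order the cases by X level
    n = len(pairs)
    points = []
    for b in range(n_bands):
        band = pairs[b * n // n_bands:(b + 1) * n // n_bands]
        points.append((median([p[0] for p in band]),
                       median([p[1] for p in band])))
    return points


x = [2.0, 2.7, 3.4, 3.7, 4.5, 5.0, 5.2, 5.5, 5.9]
y = [13.1, 14.4, 15.7, 14.9, 16.8, 17.1, 16.9, 17.3, 17.8]

points = band_medians(x, y, 3)
print(points)  # [(2.7, 14.4), (4.5, 16.8), (5.5, 17.3)]
```

Connecting the returned points by straight lines gives the smoothed curve described in the text.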


Lowess Method 


The lowess method, developed by Cleveland (Ref. 3.12), is a more refined nonparametric 
method than band regression. It obtains a smoothed curve by fitting successive linear re- 
gression functions in local neighborhoods. The name lowess stands for locally weighted 
regression scatter plot smoothing. The method is similar to the moving average and running 
median methods in that it uses a neighborhood around each X value to obtain a smoothed 
Y value corresponding to that X value. It obtains the smoothed Y value at a given X by 
fitting a linear regression to the data in the neighborhood of the X value and then using the 
fitted value at X as the smoothed value. To illustrate this concretely, let $(X_1, Y_1)$ denote the
sample case with the smallest X value, $(X_2, Y_2)$ denote the sample case with the second
smallest X value, and so on. If neighborhoods of three X values are used with the lowess
method, then a linear regression would be fitted to the data:


$(X_1, Y_1)$    $(X_2, Y_2)$    $(X_3, Y_3)$


The fitted value at $X_2$ would constitute the smoothed value corresponding to $X_2$. Another
linear regression would be fitted to the data:


$(X_2, Y_2)$    $(X_3, Y_3)$    $(X_4, Y_4)$


and the fitted value at $X_3$ would constitute the smoothed value corresponding to $X_3$.
Smoothed values at each end of the X range are also obtained by the lowess procedure.

The lowess method uses a number of refinements in obtaining the final smoothed values 
to improve the smoothing and to make the procedure robust to outlying observations. 


1. The linear regression is weighted to give cases further from the middle X level in each 
neighborhood smaller weights. 

2. To make the procedure robust to outlying observations, the linear regression fitting is 
repeated, with the weights revised so that cases that had large residuals in the first fitting 
receive smaller weights in the second fitting. 

3. To improve the robustness of the procedure further, step 2 is repeated one or more
times by revising the weights according to the size of the residuals in the latest fitting. 


To implement the lowess procedure, one must choose the size of the successive neighborhoods
to be used when fitting each linear regression. One must also choose the weight
function that gives less weight to neighborhood cases with X values far from each center 
X level and another weight function that gives less weight to cases with large residuals. 
Finally, the number of iterations to make the procedure robust must be chosen. 

In practice, two iterations appear to be sufficient to provide robustness. Also, the weight 
functions suggested by Cleveland appear to be adequate for many circumstances. Hence, 
the primary choice to be made for a particular application is the size of the successive
neighborhoods. The larger the size, the smoother the function but the greater the danger
that the smoothing will lose essential features of the regression relationship. It may require
some experimentation with different neighborhood sizes in order to find the size that best 
brings out the regression relationship. We explain the lowess method in detail in Chapter 11 
in the context of multiple regression. Specific choices of weight functions and neighborhood 
sizes are discussed there. 
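
The steps of the lowess procedure can be sketched in a few dozen lines. This is a simplified illustration, not the implementation of any particular package. It assumes the tricube function for the neighborhood weights and the bisquare function, scaled by six times the median absolute residual, for the robustness weights; these are the weight functions usually attributed to Cleveland, but the specific choices, the function names, and the data are assumptions made here.

```python
from statistics import median


def tricube(u):
    """Neighborhood weight: largest at the center, zero at the edge."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def bisquare(u):
    """Robustness weight: shrinks to zero for large residuals."""
    u = abs(u)
    return (1 - u ** 2) ** 2 if u < 1 else 0.0


def local_fit(xs, ys, ws, x0):
    """Fitted value at x0 from a weighted least squares line."""
    support = {x for x, w in zip(xs, ws) if w > 0}
    if not support:                       # every weight is zero
        return sum(ys) / len(ys)          # fall back to the plain mean
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    if len(support) < 2:                  # a line cannot be fitted
        return y_bar
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    return y_bar + (sxy / sxx) * (x0 - x_bar)


def lowess(x, y, q, robust_iterations=2):
    """Smoothed Y values using neighborhoods of the q nearest cases."""
    n = len(x)
    robust = [1.0] * n                    # first fitting: no downweighting
    smoothed = list(y)
    for _ in range(robust_iterations + 1):
        for i in range(n):
            nearest = sorted(range(n), key=lambda j: abs(x[j] - x[i]))[:q]
            d_max = max(abs(x[j] - x[i]) for j in nearest) or 1.0
            ws = [tricube((x[j] - x[i]) / d_max) * robust[j] for j in nearest]
            smoothed[i] = local_fit([x[j] for j in nearest],
                                    [y[j] for j in nearest], ws, x[i])
        abs_res = [abs(yi - si) for yi, si in zip(y, smoothed)]
        s = median(abs_res)
        if s == 0:                        # perfect fit: nothing to downweight
            break
        robust = [bisquare(r / (6 * s)) for r in abs_res]
    return smoothed


# Hypothetical data: a roughly linear trend with one outlying case at X = 5.
x_demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y_demo = [3.1, 4.8, 7.2, 8.9, 30.0, 13.1, 14.8, 17.3, 18.9, 21.2, 22.8, 25.1]

smooth_demo = lowess(x_demo, y_demo, q=7)
for xi, si in zip(x_demo, smooth_demo):
    print(f"X = {xi:2d}   smoothed Y = {si:.2f}")
```

The argument `q` plays the role of the neighborhood size discussed above, and `robust_iterations=2` follows the remark that two iterations appear to be sufficient in practice. Rerunning the sketch with several values of `q` is one way to carry out the experimentation with neighborhood sizes that the text recommends.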


Figure 3.18a contains a scatter plot based on a study of research quality at 24 research 
laboratories. The response variable is a measure of the quality of the research done at the 
laboratory, and the explanatory variable is a measure of the volume of research performed 
at the laboratory. Note that it is very difficult to tell from this scatter plot whether or not a 
relationship exists between research quality and quantity. Figure 3.18b repeats the scatter 
plot and also shows the lowess smoothed curve. The curve suggests that there might be 
somewhat higher research quality for medium-sized laboratories. However, the scatter is 
great so that this suggested relationship should be considered only as a possibility. Also, 
because any particular measures of research quality and quantity are so limited, other 
measures should be considered to see if these corroborate the relationship suggested in 
Figure 3.18b.


Use of Smoothed Curves to Confirm Fitted Regression Function 


Smoothed curves are useful not only in the exploratory stages, when a regression model is
selected but they are also helpful in confirming the regression function chosen. The proce- 
dure for confirmation is simple: The smoothed curve is plotted together with the confidence 
band for the fitted regression function. If the smoothed curve falls within the confidence 
band, we have supporting evidence of the appropriateness of the fitted regression function. 


FIGURE 3.18 MINITAB Scatter Plot and Lowess Smoothed Curve—Research Laboratories Example.
[Panels: (a) Scatter Plot; (b) Lowess Curve. Horizontal axis: Quantity.]


FIGURE 3.19 MINITAB Lowess Curve and Confidence Band for Regression Line—Toluca Company Example.
[Panels: (a) Scatter Plot and Lowess Curve; (b) Confidence Band for Regression Line and Lowess Curve. Horizontal axis: Lot Size.]


Example


Figure 3.19a repeats the scatter plot for the Toluca Company example from Figure 1.10a 
and shows the lowess smoothed curve. It appears that the regression relation is linear or 
possibly slightly curved. Figure 3.19b repeats the confidence band for the regression line 
from Figure 2.6 and shows the lowess smoothed curve. We see that the smoothed curve falls 
within the confidence band for the regression line and thereby supports the appropriateness 
of a linear regression function. 


Comments 
1. Smoothed curves, such as the lowess curve, do not provide an analytical expression for the
functional form of the regression relationship. They only suggest the shape of the regression curve.

2. The lowess procedure is not restricted to fitting linear regression functions in each neighborhood.
Higher-degree polynomials can also be utilized with this method.

3. Smoothed curves are also useful when examining residual plots to ascertain whether the residuals
(or the absolute or squared residuals) follow some relationship with X or $\hat{Y}$.

4. References 3.13 and 3.14 provide good introductions to other nonparametric methods in
regression analysis.


3.11 Case Example—Plutonium Measurement 


TABLE 3.10 Basic Data—Plutonium Measurement Example.


Some environmental cleanup work requires that nuclear materials, such as plutonium 238, 
be located and completely removed from a restoration site. When plutonium has become 
mixed with other materials in very small amounts, detecting its presence can be a difficult 
task. Even very small amounts can be traced, however, because plutonium emits subatomic 
particles—alpha particles—that can be detected. Devices that are used to detect plutonium 
record the intensity of alpha particle strikes in counts per second (#/sec). The regression
relationship between alpha counts per second (the response variable) and plutonium activity (the
explanatory variable) is then used to estimate the activity of plutonium in the material under
study. This use of a regression relationship involves inverse prediction [i.e., predicting
plutonium activity (X) from the observed alpha count (Y)], a procedure discussed in Chapter 4.

The task here is to estimate the regression relationship between alpha counts per second 
and plutonium activity. This relationship varies for each measurement device and must be 
established precisely each time a different measurement device is used. It is reasonable to 
assume here that the level of alpha counts increases with plutonium activity, but the exact 
nature of the relationship is generally unknown. 

In a study to establish the regression relationship for a particular measurement device, 
four plutonium standards were used. These standards are aluminum/plutonium rods con- 
taining a fixed, known level of plutonium activity. The levels of plutonium activity in the 
four standards were 0.0, 5.0, 10.0, and 20.0 picocuries per gram (pCi/g). Each standard was 
exposed to the detection device from 4 to 10 times, and the rate of alpha strikes, measured 
as counts per second, was observed for each replication. A portion of the data is shown 
in Table 3.10, and the data are plotted as a scatter plot in Figure 3.20a. Notice that, as 
expected, the strike rate tends to increase with the activity level of plutonium. Notice also 
that nonzero strike rates are recorded for the standard containing no plutonium. This results 
from background radiation and indicates that a regression model with an intercept term is 
required here. 


         Plutonium     Alpha Count
Case     Activity      Rate
         (pCi/g)       (#/sec)
1        20            .150
2        0             .004
3        10            .069
...      ...           ...
22       0             .002
23       5             .049
24       0             .106


FIGURE 3.20 SAS-JMP Scatter Plot and Lowess Smoothed Curve—Plutonium Measurement Example.
[Panels: (a) Scatter Plot; (b) Lowess Smoothed Curve. Horizontal axis: pCi/g.]


As an initial step to examine the nature of the regression relationship, a lowess smoothed 
curve was obtained; this curve is shown in Figure 3.20b. We see that the regression
relationship may be linear or slightly curvilinear in the range of the plutonium activity levels
included in the study. We also see that one of the readings taken at 0.0 pCi/g (case 24) does not
appear to fit with the rest of the observations. An examination of laboratory records revealed 
that the experimental conditions were not properly maintained for the last case, and it was 
therefore decided that case 24 should be discarded. Note, incidentally, how robust the lowess 
smoothing process was here by assigning very little weight to the outlying observation. 

A linear regression function was fitted next, based on the remaining 23 cases. The SAS- 
JMP regression output is shown in Figure 3.21a, a plot of the residuals against the fitted 
values is shown in Figure 3.21b, and a normal probability plot is shown in Figure 3.21c. 
The JMP output uses the label Model to denote the regression component of the analysis 
of variance; the label C Total stands for corrected total. We see from the regression output 
that the slope of the regression line is not zero ($F^* = 228.9984$, P-value = .0000) so that a
regression relationship exists. We also see from the flared, megaphone shape of the residual 
plot that the error variance appears to be increasing with the level of plutonium activity. 
The normal probability plot suggests nonnormality (heavy tails), but the nonlinearity of the 
plot is likely to be related (at least in part) to the unequal error variances. The existence of 
nonconstant variance is confirmed by the Breusch-Pagan test statistic (3.11): 


$$X^2_{BP} = 23.29 > \chi^2(.95; 1) = 3.84$$
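
The computation behind a statistic of this kind can be sketched in a few lines. The sketch below assumes that (3.11) has the form $X^2_{BP} = (SSR^*/2) \div (SSE/n)^2$, where $SSR^*$ is the regression sum of squares obtained when the squared residuals are regressed on $X$. The function names are illustrative only, and the P-value helper applies only to a chi-square distribution with 1 degree of freedom.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def breusch_pagan(x, y):
    """Breusch-Pagan statistic, assumed form (SSR*/2) / (SSE/n)**2."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)
    # Regress the squared residuals on X; SSR* is the regression
    # sum of squares of that auxiliary fit.
    e2 = [e ** 2 for e in resid]
    g0, g1 = fit_line(x, e2)
    e2_bar = sum(e2) / n
    ssr_star = sum((g0 + g1 * xi - e2_bar) ** 2 for xi in x)
    return (ssr_star / 2) / (sse / n) ** 2


def chi2_upper_tail_df1(value):
    """P(chi-square with 1 df > value), via the standard normal tail."""
    return math.erfc(math.sqrt(value / 2))
```

The statistic is compared with $\chi^2(1 - \alpha; 1)$, or equivalently the upper-tail probability is compared with $\alpha$.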


The presence of nonconstant variance clearly requires remediation. A number of approaches
could be followed, including the use of weighted least squares discussed in Chapter 11.
Often with count data, the error variance can be stabilized through the use of a
square root transformation of the response variable. Since this is just one in a range of
power transformations that might be useful, we shall use the Box-Cox procedure to suggest
an appropriate power transformation. Using the standardized variable (3.36), we find the
maximum likelihood estimate of $\lambda$ to be $\hat{\lambda} = .65$. Because the likelihood function is fairly
flat in the neighborhood of $\hat{\lambda} = .65$, the Box-Cox procedure supports the use of the square
root transformation (i.e., use of $\lambda = .5$). The results of fitting a linear regression function
when the response variable is $Y' = \sqrt{Y}$ are shown in Figure 3.22a.
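
A grid search of the kind used by the Box-Cox procedure might be sketched as follows. The sketch assumes the standardized variable in (3.36) is $W = K_1(Y^\lambda - 1)$ for $\lambda \neq 0$ and $W = K_2 \log_e Y$ for $\lambda = 0$, with $K_2$ the geometric mean of the $Y_i$ and $K_1 = 1/(\lambda K_2^{\lambda - 1})$. The function names and the demonstration data are hypothetical; the data are constructed so that $\sqrt{Y}$ is exactly linear in $X$, and they are not the plutonium measurements.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def boxcox_sse(x, y, lam):
    """SSE from regressing the standardized Box-Cox variable W on X."""
    n = len(y)
    k2 = math.exp(sum(math.log(yi) for yi in y) / n)  # geometric mean of Y
    if lam == 0:
        w = [k2 * math.log(yi) for yi in y]
    else:
        k1 = 1.0 / (lam * k2 ** (lam - 1))
        w = [k1 * (yi ** lam - 1) for yi in y]
    b0, b1 = fit_line(x, w)
    return sum((wi - (b0 + b1 * xi)) ** 2 for xi, wi in zip(x, w))


def best_lambda(x, y, grid):
    """Value of lambda on the grid with the smallest SSE."""
    return min(grid, key=lambda lam: boxcox_sse(x, y, lam))


# Hypothetical data: sqrt(Y) = 1 + X exactly, so lambda = .5 fits perfectly.
x_demo = [float(i) for i in range(10)]
y_demo = [(1.0 + xi) ** 2 for xi in x_demo]
grid_demo = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
lam_demo = best_lambda(x_demo, y_demo, grid_demo)
```

Because the standardization puts the SSE values for different $\lambda$ on a comparable scale, the minimizing grid value can be read off directly; a flat SSE curve near the minimum, as in the example in the text, leaves room to choose a nearby convenient power.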




FIGURE 3.21 SAS-JMP Regression Output and Diagnostic Plots for Untransformed Data: Plutonium
Measurement Example.

(a) Regression Output

Term         Estimate    Std Error   t Ratio   Prob>|t|
Intercept    0.0070331   0.0036       1.95     0.0641
Plutonium    0.005537    0.00037     15.13     0.0000

Source        DF   Sum of Squares   Mean Square   F Ratio    Prob>F
Model          1   0.03619042       0.036190      228.9984   0.0000
Error         21   0.00331880       0.000158
C Total       22   0.03950922

Source        DF   Sum of Squares   Mean Square   F Ratio    Prob>F
Lack of Fit    2   0.00016811       0.000084      0.5069     0.6103
Pure Error    19   0.00315069       0.000166
Total Error   21   0.00331880

(b) Residual Plot (Residual against Fitted) (c) Normal Probability Plot (Residual against Expected)
[Plots not reproduced.]


At this point a new problem has arisen. Although the residual plot in Figure 3.22b shows
that the error variance appears to be more stable and the points in the normal probability
plot in Figure 3.22c fall roughly on a straight line, the residual plot now suggests that $Y'$
is nonlinearly related to $X$. This concern is confirmed by the lack of fit test statistic (3.25)
($F^* = 10.1364$, P-value = .0010). Of course, this result is not completely unexpected,
since $Y$ was linearly related to $X$.
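
The lack of fit statistic quoted above can be checked against the sums of squares printed in the regression output. The short sketch below assumes the usual ratio $F^* = MSLF / MSPE$, with each mean square equal to its sum of squares divided by its degrees of freedom; the function name is illustrative.

```python
def lack_of_fit_f(sslf, df_lf, sspe, df_pe):
    """F* = (SSLF / df_lf) / (SSPE / df_pe)."""
    mslf = sslf / df_lf  # lack of fit mean square
    mspe = sspe / df_pe  # pure error mean square
    return mslf / mspe


# Sums of squares as printed in Figure 3.22a (square root of Y regressed on X).
f_sqrt_y = lack_of_fit_f(0.01210640, 2, 0.01134631, 19)

# Sums of squares as printed in Figure 3.21a (untransformed data).
f_untransformed = lack_of_fit_f(0.00016811, 2, 0.00315069, 19)
```

With 2 and 19 degrees of freedom, a large value of the ratio, as for the transformed response here, points to lack of fit of the linear regression function.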

To restore a linear relation with the transformed $Y$ variable, we shall see if a square root
transformation of $X$ will lead to a satisfactory linear fit. The regression results when
regressing $Y' = \sqrt{Y}$ on $X' = \sqrt{X}$ are presented in Figure 3.23. Notice from the residual plot
in Figure 3.23b that the square root transformation of the predictor variable has eliminated
the lack of fit. Also, the normal probability plot of the residuals in Figure 3.23c appears
to be satisfactory, and the correlation test ($r = .986$) supports the assumption of normally
distributed error terms (the interpolated critical value in Table B.6 for $\alpha = .05$ and $n = 23$
is .9555). However, the residual plot suggests that some nonconstancy of the error variance




FIGURE 3.22 SAS-JMP Regression Output and Diagnostic Plots for Transformed Response
Variable: Plutonium Measurement Example.

(a) Regression Output

Term         Estimate    Std Error   t Ratio   Prob>|t|
Intercept    0.0947596   0.00957      9.91     0.0000
Plutonium    0.0133648   0.00097     13.74     0.0000

Source        DF   Sum of Squares   Mean Square   F Ratio    Prob>F
Model          1   0.21084655       0.210847      188.7960   0.0000
Error         21   0.02345271       0.001117
C Total       22   0.23429926

Source        DF   Sum of Squares   Mean Square   F Ratio    Prob>F
Lack of Fit    2   0.01210640       0.006053      10.1364    0.0010
Pure Error    19   0.01134631       0.000597
Total Error   21   0.02345271

(b) Residual Plot (Residual against Fitted) (c) Normal Probability Plot (Residual against Expected)
[Plots not reproduced.]


may still remain; but if so, it does not appear to be substantial. The Breusch-Pagan test
statistic (3.11) is $X^2_{BP} = 3.85$, which corresponds to a P-value of .05, supporting the conclusion
from the residual plot that the nonconstancy of the error variance is not substantial.
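
The correlation test for normality referred to above compares the ordered residuals with their expected values under normality. A minimal sketch follows, assuming the expected value of the $k$th smallest residual is proportional to $z[(k - .375)/(n + .25)]$; the scale factor $\sqrt{MSE}$ is omitted because a correlation coefficient is unchanged by rescaling. The function name is illustrative, and ties among residuals are not treated specially.

```python
from statistics import NormalDist


def normal_plot_correlation(residuals):
    """Correlation between ordered residuals and approximate normal scores."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Approximate expected standard normal order statistics.
    expected = [std_normal.inv_cdf((k - 0.375) / (n + 0.25))
                for k in range(1, n + 1)]
    e_bar = sum(ordered) / n
    z_bar = sum(expected) / n
    s_ez = sum((e - e_bar) * (z - z_bar) for e, z in zip(ordered, expected))
    s_ee = sum((e - e_bar) ** 2 for e in ordered)
    s_zz = sum((z - z_bar) ** 2 for z in expected)
    return s_ez / (s_ee * s_zz) ** 0.5
```

The resulting coefficient is compared with the critical value in Table B.6 for the chosen $\alpha$ and $n$; a coefficient at or above the critical value supports the normality assumption.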

Figure 3.23d contains a SYSTAT plot of the confidence band (2.40) for the fitted regression
line:

$$\hat{Y}' = .0730 + .0573X'$$


We see that the regression line has been estimated fairly precisely. Also plotted in this figure
is the lowess smoothed curve. This smoothed curve falls entirely within the confidence band,
supporting the reasonableness of a linear regression relation between $Y'$ and $X'$. The lack of
fit test statistic (3.25) now is $F^* = 1.2868$ (P-value = .2992), also supporting the linearity
of the regression relating $Y' = \sqrt{Y}$ to $X' = \sqrt{X}$.
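
Since the final model is fitted on the square root scale, a fitted value must be squared to return to the original units of $Y$. The sketch below uses the rounded coefficients of the fitted line shown above; the function name is illustrative. Squaring gives a back-transformed fitted value, which should not be assumed to be an unbiased estimate of the mean response in the original units.

```python
import math


def fitted_original_units(x, b0=0.0730, b1=0.0573):
    """Back-transform the fit of sqrt(Y) on sqrt(X) to the scale of Y."""
    y_prime = b0 + b1 * math.sqrt(x)  # fitted value of sqrt(Y)
    return y_prime ** 2


# Back-transformed fitted values at the ends of the range shown in Figure 3.20.
fit_at_0 = fitted_original_units(0.0)
fit_at_20 = fitted_original_units(20.0)
```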


FIGURE 3.23 SAS-JMP Regression Output and Diagnostic Plots for Transformed Response and Predictor
Variables: Plutonium Measurement Example.

(a) Regression Output

Term             Estimate    Std Error   t Ratio   Prob>|t|
Intercept        0.0730056   0.00783      9.32     0.0000
Sqrt Plutonium   0.0573055   0.00302     19.00     0.0000

Source        DF   Sum of Squares   Mean Square   F Ratio    Prob>F
Model          1   0.22141612       0.221416      360.9166   0.0000
Error         21   0.01288314       0.000613
C Total       22   0.23429926

Source        DF   Sum of Squares   Mean Square   F Ratio    Prob>F
Lack of Fit    2   0.00153683       0.000768      1.2868     0.2992
Pure Error    19   0.01134631       0.000597
Total Error   21   0.01288314

(b) Residual Plot (Residual against Fitted) (c) Normal Probability Plot (Residual against Expected)
(d) Confidence Band for Regression Line and Lowess Curve
[Plots not reproduced.]



Cited References

3.1. Barnett, V., and T. Lewis. Outliers in Statistical Data. 3rd ed. New York: John Wiley & Sons,
1994.

3.2. Looney, S. W., and T. R. Gulledge, Jr. "Use of the Correlation Coefficient with Normal
Probability Plots," The American Statistician 39 (1985), pp. 75-79.

3.3. Shapiro, S. S., and M. B. Wilk. "An Analysis of Variance Test for Normality (Complete
Samples)," Biometrika 52 (1965), pp. 591-611.

3.4. Levene, H. "Robust Tests for Equality of Variances," in Contributions to Probability and
Statistics, ed. I. Olkin. Palo Alto, Calif.: Stanford University Press, 1960, pp. 278-92.

3.5. Brown, M. B., and A. B. Forsythe. "Robust Tests for Equality of Variances," Journal of the
American Statistical Association 69 (1974), pp. 364-67.

3.6. Breusch, T. S., and A. R. Pagan. "A Simple Test for Heteroscedasticity and Random Coefficient
Variation," Econometrica 47 (1979), pp. 1287-94.

3.7. Cook, R. D., and S. Weisberg. "Diagnostics for Heteroscedasticity in Regression," Biometrika
70 (1983), pp. 1-10.

3.8. Joglekar, G., J. H. Schuenemeyer, and V. LaRiccia. "Lack-of-Fit Testing When Replicates Are
Not Available," The American Statistician 43 (1989), pp. 135-43.

3.9. Box, G. E. P., and D. R. Cox. "An Analysis of Transformations," Journal of the Royal Statistical
Society B 26 (1964), pp. 211-43.

3.10. Draper, N. R., and H. Smith. Applied Regression Analysis. 3rd ed. New York: John Wiley &
Sons, 1998.

3.11. Velleman, P. F., and D. C. Hoaglin. Applications, Basics, and Computing of Exploratory Data
Analysis. Boston: Duxbury Press, 1981.

3.12. Cleveland, W. S. "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal
of the American Statistical Association 74 (1979), pp. 829-36.

3.13. Altman, N. S. "An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression,"
The American Statistician 46 (1992), pp. 175-85.

3.14. Haerdle, W. Applied Nonparametric Regression. Cambridge: Cambridge University Press,
1990.


Problems

3.1. Distinguish between (1) residual and semistudentized residual, (2) $E\{\varepsilon_i\} = 0$ and $\bar{e} = 0$,
(3) error term and residual.

3.2. Prepare a prototype residual plot for each of the following cases: (1) error variance decreases
with $X$; (2) true regression function is U shaped, but a linear regression function is fitted.

3.3. Refer to Grade point average Problem 1.19.

a. Prepare a box plot for the ACT scores $X_i$. Are there any noteworthy features in this plot?

b. Prepare a dot plot of the residuals. What information does this plot provide?

c. Plot the residuals $e_i$ against the fitted values $\hat{Y}_i$. What departures from regression model (2.1)
can be studied from this plot? What are your findings?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Test the
reasonableness of the normality assumption here using Table B.6 and $\alpha = .05$. What do you
conclude?

e. Conduct the Brown-Forsythe test to determine whether or not the error variance varies with
the level of $X$. Divide the data into the two groups, $X < 26$, $X \geq 26$, and use $\alpha = .01$. State
the decision rule and conclusion. Does your conclusion support your preliminary findings
in part (c)?


f. Information is given below for each student on two variables not included in the model,
namely, intelligence test score ($X_2$) and high school class rank percentile ($X_3$). (Note that
larger class rank percentiles indicate higher standing in the class, e.g., 1% is near the bottom
of the class and 99% is near the top of the class.) Plot the residuals against $X_2$ and $X_3$ on
separate graphs to ascertain whether the model can be improved by including either of these
variables. What do you conclude?


i: 1 2 з  ... 118 119 120 
Хә: 122 132 119 ... 140 111 110 
Xx: 9 Л 75: Ss 97 65 85 


*3.4. Refer to Copier maintenance Problem 1.20.

a. Prepare a dot plot for the number of copiers serviced $X_i$. What information is provided by
this plot? Are there any outlying cases with respect to this variable?

b. The cases are given in time order. Prepare a time plot for the number of copiers serviced.
What does your plot show?

c. Prepare a stem-and-leaf plot of the residuals. Are there any noteworthy features in this plot?

d. Prepare residual plots of $e_i$ versus $\hat{Y}_i$ and $e_i$ versus $X_i$ on separate graphs. Do these plots
provide the same information? What departures from regression model (2.1) can be studied
from these plots? State your findings.

e. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the normality
assumption appear to be tenable here? Use Table B.6 and $\alpha = .10$.

f. Prepare a time plot of the residuals to ascertain whether the error terms are correlated over
time. What is your conclusion?

g. Assume that (3.10) is applicable and conduct the Breusch-Pagan test to determine whether
or not the error variance varies with the level of $X$. Use $\alpha = .05$. State the alternatives,
decision rule, and conclusion.

h. Information is given below on two variables not included in the regression model, namely,
mean operational age of copiers serviced on the call ($X_2$, in months) and years of experience
of the service person making the call ($X_3$). Plot the residuals against $X_2$ and $X_3$ on separate
graphs to ascertain whether the model can be improved by including either or both of these
variables. What do you conclude?

i:      1    2    3   ...   43   44   45
X_2:   20   19   27   ...   28   26   33
X_3:    4    5    4   ...    3    3    6

*3.5. Refer to Airfreight breakage Problem 1.21.

a. Prepare a dot plot for the number of transfers $X_i$. Does the distribution of number of transfers
appear to be asymmetrical?

b. The cases are given in time order. Prepare a time plot for the number of transfers. Is any
systematic pattern evident in your plot? Discuss.

c. Obtain the residuals $e_i$ and prepare a stem-and-leaf plot of the residuals. What information
is provided by your plot?




d. Plot the residuals $e_i$ against $X_i$ to ascertain whether any departures from regression
model (2.1) are evident. What is your conclusion?

e. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality to ascertain whether
the normality assumption is reasonable here. Use Table B.6 and $\alpha = .01$. What do you
conclude?

f. Prepare a time plot of the residuals. What information is provided by your plot?

g. Assume that (3.10) is applicable and conduct the Breusch-Pagan test to determine whether
or not the error variance varies with the level of $X$. Use $\alpha = .10$. State the alternatives,
decision rule, and conclusion. Does your conclusion support your preliminary findings in
part (d)?


3.6. Refer to Plastic hardness Problem 1.22.

a. Obtain the residuals $e_i$ and prepare a box plot of the residuals. What information is provided
by your plot?

b. Plot the residuals $e_i$ against the fitted values $\hat{Y}_i$ to ascertain whether any departures from
regression model (2.1) are evident. State your findings.

c. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the normality
assumption appear to be reasonable here? Use Table B.6 and $\alpha = .05$.

d. Compare the frequencies of the residuals against the expected frequencies under normality,
using the 25th, 50th, and 75th percentiles of the relevant t distribution. Is the information
provided by these comparisons consistent with the findings from the normal probability plot
in part (c)?

e. Use the Brown-Forsythe test to determine whether or not the error variance varies with the
level of $X$. Divide the data into the two groups, $X \leq 24$, $X > 24$, and use $\alpha = .05$. State
the decision rule and conclusion. Does your conclusion support your preliminary findings
in part (b)?


*3.7. Refer to Muscle mass Problem 1.27.

a. Prepare a stem-and-leaf plot for the ages $X_i$. Is this plot consistent with the random selection
of women from each 10-year age group? Explain.

b. Obtain the residuals $e_i$ and prepare a dot plot of the residuals. What does your plot show?

c. Plot the residuals $e_i$ against $\hat{Y}_i$ and also against $X_i$ on separate graphs to ascertain whether
any departures from regression model (2.1) are evident. Do the two plots provide the same
information? State your conclusions.

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality to ascertain whether
the normality assumption is tenable here. Use Table B.6 and $\alpha = .10$. What do you conclude?

e. Assume that (3.10) is applicable and conduct the Breusch-Pagan test to determine whether
or not the error variance varies with the level of $X$. Use $\alpha = .01$. State the alternatives,
decision rule, and conclusion. Is your conclusion consistent with your preliminary findings
in part (c)?


3.8. Refer to Crime rate Problem 1.28.

a. Prepare a stem-and-leaf plot for the percentage of individuals in the county having at least
a high school diploma $X_i$. What information does your plot provide?

b. Obtain the residuals $e_i$ and prepare a box plot of the residuals. Does the distribution of the
residuals appear to be symmetrical?

c. Make a residual plot of $e_i$ versus $\hat{Y}_i$. What does the plot show?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Test the
reasonableness of the normality assumption using Table B.6 and $\alpha = .05$. What do you conclude?

e. Conduct the Brown-Forsythe test to determine whether or not the error variance varies with
the level of $X$. Divide the data into the two groups, $X \leq 69$, $X > 69$, and use $\alpha = .05$. State
the decision rule and conclusion. Does your conclusion support your preliminary findings
in part (c)?

3.9. Electricity consumption. An economist studying the relation between household electricity
consumption ($Y$) and number of rooms in the home ($X$) employed linear regression model (2.1)
and obtained the following residuals:

i:      1     2     3     4     5     6     7     8     9    10
X_i:    2     3     4     5     6     7     8     9    10    11
e_i:  3.2   2.9  -1.7  -2.0  -2.3  -1.2   -.9    .8    .7    .5

Plot the residuals $e_i$ against $X_i$. What problem appears to be present here? Might a
transformation alleviate this problem?

3.10. Per capita earnings. A sociologist employed linear regression model (2.1) to relate per capita
earnings ($Y$) to average number of years of schooling ($X$) for 12 cities. The fitted values $\hat{Y}_i$ and
the semistudentized residuals $e_i^*$ follow.

i:             1      2      3   ...     10     11     12
\hat{Y}_i:   9.9    9.3   10.2   ...   15.6   11.2   13.1
e_i^*:     -1.42    .81   -.76   ...  -3.78    .74    .32

a. Plot the semistudentized residuals against the fitted values. What does the plot suggest?

b. How many semistudentized residuals are outside $\pm 1$ standard deviation? Approximately
how many would you expect to see if the normal error model is appropriate?

3.11. Drug concentration. A pharmacologist employed linear regression model (2.1) to study the
relation between the concentration of a drug in plasma ($Y$) and the log-dose of the drug ($X$).
The residuals and log-dose levels follow.

i:      1     2     3     4     5     6     7     8     9
X_i:   -1     0     1    -1     0     1    -1     0     1
e_i:   .5   2.1  -3.4    .3  -1.7   4.2   -.6   2.6  -4.0

a. Plot the residuals $e_i$ against $X_i$. What conclusions do you draw from the plot?

b. Assume that (3.10) is applicable and conduct the Breusch-Pagan test to determine whether
or not the error variance varies with log-dose of the drug ($X$). Use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. Does your conclusion support your preliminary
findings in part (a)?

3.12. A student does not understand why the sum of squares defined in (3.16) is called a pure error
sum of squares "since the formula looks like one for an ordinary sum of squares." Explain.




*3.13. Refer to Copier maintenance Problem 1.20.

a. What are the alternative conclusions when testing for lack of fit of a linear regression
function?

b. Perform the test indicated in part (a). Control the risk of Type I error at .05. State the decision
rule and conclusion.

c. Does the test in part (b) detect other departures from regression model (2.1), such as lack
of constant variance or lack of normality in the error terms? Could the results of the test of
lack of fit be affected by such departures? Discuss.

3.14. Refer to Plastic hardness Problem 1.22.

a. Perform the F test to determine whether or not there is lack of fit of a linear regression
function; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

b. Is there any advantage of having an equal number of replications at each of the $X$ levels? Is
there any disadvantage?

c. Does the test in part (a) indicate what regression function is appropriate when it leads to the
conclusion that the regression function is not linear? How would you proceed?

3.15. Solution concentration. A chemist studied the concentration of a solution ($Y$) over time ($X$).
Fifteen identical solutions were prepared. The 15 solutions were randomly divided into five
sets of three, and the five sets were measured, respectively, after 1, 3, 5, 7, and 9 hours. The
results follow.

i:       1     2     3   ...     13     14     15
X_i:     9     9     9   ...      1      1      1
Y_i:   .07   .09   .08   ...   2.84   2.57   3.10

a. Fit a linear regression function.

b. Perform the F test to determine whether or not there is lack of fit of a linear regression
function; use $\alpha = .025$. State the alternatives, decision rule, and conclusion.

c. Does the test in part (b) indicate what regression function is appropriate when it leads to the
conclusion that lack of fit of a linear regression function exists? Explain.

3.16. Refer to Solution concentration Problem 3.15.

a. Prepare a scatter plot of the data. What transformation of $Y$ might you try, using the prototype
patterns in Figure 3.15 to achieve constant variance and linearity?

b. Use the Box-Cox procedure and standardization (3.36) to find an appropriate power
transformation. Evaluate SSE for $\lambda = -.2, -.1, 0, .1, .2$. What transformation of $Y$ is
suggested?

c. Use the transformation $Y' = \log_{10} Y$ and obtain the estimated linear regression function for
the transformed data.

d. Plot the estimated regression line and the transformed data. Does the regression line appear
to be a good fit to the transformed data?

e. Obtain the residuals and plot them against the fitted values. Also prepare a normal probability
plot. What do your plots show?

f. Express the estimated regression function in the original units.

*3.17. Sales growth. A marketing researcher studied annual sales of a product that had been introduced
10 years ago. The data are as follows, where $X$ is the year (coded) and $Y$ is sales in thousands
of units:

i:      1     2     3     4     5     6     7     8     9    10
X_i:    0     1     2     3     4     5     6     7     8     9
Y_i:   98   135   162   178   221   232   283   300   374   395

a. Prepare a scatter plot of the data. Does a linear relation appear adequate here?

b. Use the Box-Cox procedure and standardization (3.36) to find an appropriate power
transformation of $Y$. Evaluate SSE for $\lambda = .3, .4, .5, .6, .7$. What transformation of $Y$ is suggested?

c. Use the transformation $Y' = \sqrt{Y}$ and obtain the estimated linear regression function for the
transformed data.

d. Plot the estimated regression line and the transformed data. Does the regression line appear
to be a good fit to the transformed data?

e. Obtain the residuals and plot them against the fitted values. Also prepare a normal probability
plot. What do your plots show?

f. Express the estimated regression function in the original units.

3.18. Production time. In a manufacturing study, the production times for 111 recent production
runs were obtained. The table below lists for each run the production time in hours ($Y$) and the
production lot size ($X$).

i:         1      2       3   ...     109     110     111
X_i:      15      9       7   ...      12       9      15
Y_i:   14.28   8.80   12.49   ...   16.37   11.45   15.78

a. Prepare a scatter plot of the data. Does a linear relation appear adequate here? Would a
transformation on $X$ or $Y$ be more appropriate here? Why?

b. Use the transformation $X' = \sqrt{X}$ and obtain the estimated linear regression function for the
transformed data.

c. Plot the estimated regression line and the transformed data. Does the regression line appear
to be a good fit to the transformed data?

d. Obtain the residuals and plot them against the fitted values. Also prepare a normal probability
plot. What do your plots show?

e. Express the estimated regression function in the original units.


Exercises

3.19. A student fitted a linear regression function for a class assignment. The student plotted the
residuals $e_i$ against $Y_i$ and found a positive relation. When the residuals were plotted against
the fitted values $\hat{Y}_i$, the student found no relation. How could this difference arise? Which is
the more meaningful plot?

3.20. If the error terms in a regression model are independent $N(0, \sigma^2)$, what can be said about the
error terms after transformation $X' = 1/X$ is used? Is the situation the same after transformation
$Y' = 1/Y$ is used?

3.21. Derive the result in (3.29).

3.22. Using (A.70), (A.41), and (A.42), show that $E\{MSPE\} = \sigma^2$ for normal error regression
model (2.1).




3.23. A linear regression model with intercept $\beta_0 = 0$ is under consideration. Data have been
obtained that contain replications. State the full and reduced models for testing the
appropriateness of the regression function under consideration. What are the degrees of freedom
associated with the full and reduced models if $n = 20$ and $c = 10$?


Projects

3.24. Blood pressure. The following data were obtained in a study of the relation between diastolic
blood pressure ($Y$) and age ($X$) for boys 5 to 13 years old.

i:      1    2    3    4    5    6    7    8
X_i:    5    8   11    7   13   12   12    6
Y_i:   63   67   74   64   75   69   90   60

a. Assuming normal error regression model (2.1) is appropriate, obtain the estimated regression
function and plot the residuals $e_i$ against $X_i$. What does your residual plot show?

b. Omit case 7 from the data and obtain the estimated regression function based on the remaining
seven cases. Compare this estimated regression function to that obtained in part (a). What
can you conclude about the effect of case 7?

c. Using your fitted regression function in part (b), obtain a 99 percent prediction interval for
a new $Y$ observation at $X = 12$. Does observation $Y_7$ fall outside this prediction interval?
What is the significance of this?

3.25. Refer to the CDI data set in Appendix C.2 and Project 1.43. For each of the three fitted regression
models, obtain the residuals and prepare a residual plot against $X$ and a normal probability plot.
Summarize your conclusions. Is linear regression model (2.1) more appropriate in one case than
in the others?

3.26. Refer to the CDI data set in Appendix C.2 and Project 1.44. For each geographic region, obtain
the residuals and prepare a residual plot against $X$ and a normal probability plot. Do the four
regions appear to have similar error variances? What other conclusions do you draw from your
plots?

3.27. Refer to the SENIC data set in Appendix C.1 and Project 1.45.

a. For each of the three fitted regression models, obtain the residuals and prepare a residual plot
against $X$ and a normal probability plot. Summarize your conclusions. Is linear regression
model (2.1) more apt in one case than in the others?

b. Obtain the fitted regression function for the relation between length of stay and infection
risk after deleting cases 47 ($X_{47} = 6.5$, $Y_{47} = 19.56$) and 112 ($X_{112} = 5.9$, $Y_{112} = 17.94$).
From this fitted regression function obtain separate 95 percent prediction intervals for new
$Y$ observations at $X = 6.5$ and $X = 5.9$, respectively. Do observations $Y_{47}$ and $Y_{112}$ fall
outside these prediction intervals? Discuss the significance of this.

3.28. Refer to the SENIC data set in Appendix C.1 and Project 1.46. For each geographic region,
obtain the residuals and prepare a residual plot against $X$ and a normal probability plot. Do the
four regions appear to have similar error variances? What other conclusions do you draw from
your plots?

3.29. Refer to Copier maintenance Problem 1.20.

a. Divide the data into four bands according to the number of copiers serviced ($X$). Band 1
ranges from $X = .5$ to $X = 2.5$; band 2 ranges from $X = 2.5$ to $X = 4.5$; and so forth.
Determine the median value of $X$ and the median value of $Y$ in each of the bands and develop


the band smooth by connecting the four pairs of medians by straight lines on a scatter plot
of the data. Does the band smooth suggest that the regression relation is linear? Discuss.

b. Obtain the 90 percent confidence band for the true regression line and plot it on the scatter
plot prepared in part (a). Does the band smooth fall entirely inside the confidence band?
What does this tell you about the appropriateness of the linear regression function?

c. Create a series of six overlapping neighborhoods of width 3.0 beginning at $X = .5$. The
first neighborhood will range from $X = .5$ to $X = 3.5$; the second neighborhood will range
from $X = 1.5$ to $X = 4.5$; and so on. For each of the six overlapping neighborhoods, fit a
linear regression function and obtain the fitted value $\hat{Y}_c$ at the center $X_c$ of the neighborhood.
Develop a simplified version of the lowess smooth by connecting the six $(X_c, \hat{Y}_c)$ pairs by
straight lines on a scatter plot of the data. In what ways does your simplified lowess smooth
differ from the band smooth obtained in part (a)?

3.30. Refer to Sales growth Problem 3.17.

a. Divide the range of the predictor variable (coded years) into five bands of width 2.0, as
follows: Band 1 ranges from $X = -.5$ to $X = 1.5$; band 2 ranges from $X = 1.5$ to $X = 3.5$;
and so on. Determine the median value of $X$ and the median value of $Y$ in each band and
develop the band smooth by connecting the five pairs of medians by straight lines on a
scatter plot of the data. Does the band smooth suggest that the regression relation is linear?
Discuss.

b. Create a series of seven overlapping neighborhoods of width 3.0 beginning at $X = -.5$. The
first neighborhood will range from $X = -.5$ to $X = 2.5$; the second neighborhood will range
from $X = .5$ to $X = 3.5$; and so on. For each of the seven overlapping neighborhoods, fit a
linear regression function and obtain the fitted value $\hat{Y}_c$ at the center $X_c$ of the neighborhood.
Develop a simplified version of the lowess smooth by connecting the seven $(X_c, \hat{Y}_c)$ pairs
by straight lines on a scatter plot of the data.

c. Obtain the 95 percent confidence band for the true regression line and plot it on the plot
prepared in part (b). Does the simplified lowess smooth fall entirely within the confidence
band for the regression line? What does this tell you about the appropriateness of the linear
regression function?

Case Studies

3.31. Refer to the Real estate sales data set in Appendix C.7. Obtain a random sample of 200 cases
from the 522 cases in this data set. Using the random sample, build a regression model to
predict sales price ($Y$) as a function of finished square feet ($X$). The analysis should include an
assessment of the degree to which the key regression assumptions are satisfied. If the regression
assumptions are not met, include and justify appropriate remedial measures. Use the final model
to predict sales price for two houses that are about to come on the market: the first has $X = 1100$
finished square feet and the second has $X = 4900$ finished square feet. Assess the strengths
and weaknesses of the final model.

3.32. Refer to the Prostate cancer data set in Appendix C.5. Build a regression model to predict PSA
level ($Y$) as a function of cancer volume ($X$). The analysis should include an assessment of
the degree to which the key regression assumptions are satisfied. If the regression assumptions
are not met, include and justify appropriate remedial measures. Use the final model to estimate
mean PSA level for a patient whose cancer volume is 20 cc. Assess the strengths and weaknesses
of the final model.


Chapter 4

Simultaneous Inferences
and Other Topics in
Regression Analysis

In this chapter, we take up a variety of topics in simple linear regression analysis. Several
of the topics pertain to how to make simultaneous inferences from the same set of sample
observations.


4.1 Joint Estimation of $\beta_0$ and $\beta_1$


Need for Joint Estimation 




A market research analyst conducted a study of the relation between level of advertising
expenditures ($X$) and sales ($Y$). The study included six different levels of advertising
expenditures, one of which was no advertising ($X = 0$). The scatter plot suggested a linear
relationship in the range of the advertising expenditures levels studied. The analyst now
wishes to draw inferences with confidence coefficient .95 about both the intercept $\beta_0$ and the
slope $\beta_1$. The analyst could use the methods of Chapter 2 to construct separate 95 percent
confidence intervals for $\beta_0$ and $\beta_1$. The difficulty is that these would not provide 95 percent
confidence that the conclusions for both $\beta_0$ and $\beta_1$ are correct. If the inferences were
independent, the probability of both being correct would be $(.95)^2$, or only .9025. The inferences are
not, however, independent, coming as they do from the same set of sample data, which makes
the determination of the probability of both inferences being correct much more difficult.
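
The arithmetic for the independent case extends directly to more than two statements. The sketch below is only an illustration of that hypothetical independent case (which, as just noted, does not hold for $\beta_0$ and $\beta_1$ estimated from the same data); the function name and the choice of family sizes are assumptions made for the example.

```python
def family_confidence_if_independent(statement_confidence, g):
    """Probability that g independent interval statements are all correct."""
    return statement_confidence ** g


# Family confidence for several family sizes g when each statement has
# confidence coefficient .95 and the statements are independent.
family_table = {g: round(family_confidence_if_independent(0.95, g), 4)
                for g in (1, 2, 3, 5, 10)}
```

For $g = 2$ this reproduces the value .9025 given above, and the family confidence falls further as the number of statements grows.
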

Analysis of data frequently requires a series of estimates (or tests) where the analyst
would like to have an assurance about the correctness of the entire set of estimates (or tests).
We shall call the set of estimates (or tests) of interest the family of estimates (or tests). In our
illustration, the family consists of two estimates, for $\beta_0$ and $\beta_1$. We then distinguish between a
statement confidence coefficient and a family confidence coefficient. A statement confidence
coefficient is the familiar type of confidence coefficient discussed earlier, which indicates the
proportion of correct estimates that are obtained when repeated samples are selected and the
specified confidence interval is calculated for each sample. A family confidence coefficient,
on the other hand, indicates the proportion of families of estimates that are entirely correct

Chapter 4 Simultaneous Inferences and Other Topics in Regression Analysis 155 


when repeated samples are selected and the specified confidence intervals for the entire 
family are calculated for each sample. Thus, a family confidence coefficient corresponds to 
the probability, in advance of sampling, that the entire family of statements will be correct. 

To illustrate the meaning of a family confidence coefficient further, consider again the
joint estimation of β₀ and β₁. A family confidence coefficient of, say, .95 would indicate here
that if repeated samples are selected and interval estimates for both β₀ and β₁ are calculated
for each sample by specified procedures, 95 percent of the samples would lead to a family
of estimates where both confidence intervals are correct. For 5 percent of the samples, either
one or both of the interval estimates would be incorrect.

A procedure that provides a family confidence coefficient when estimating both β₀ and β₁
is often highly desirable since it permits the analyst to weave the two separate results together
into an integrated set of conclusions, with an assurance that the entire set of estimates is
correct. We now discuss one procedure for constructing simultaneous confidence intervals
for β₀ and β₁ with a specified family confidence coefficient: the Bonferroni procedure.


Bonferroni Joint Confidence Intervals 


The Bonferroni procedure for developing joint confidence intervals for β₀ and β₁ with a
specified family confidence coefficient is very simple: each statement confidence coefficient
is adjusted to be higher than 1 − α so that the family confidence coefficient is at least 1 − α.
The procedure is a general one that can be applied in many cases, as we shall see, not just
for the joint estimation of β₀ and β₁.

We start with ordinary confidence limits for β₀ and β₁ with statement confidence
coefficients 1 − α each. These limits are:

$$b_0 \pm t(1 - \alpha/2;\, n - 2)\, s\{b_0\}$$
$$b_1 \pm t(1 - \alpha/2;\, n - 2)\, s\{b_1\}$$

We first ask what is the probability that one or both of these intervals are incorrect. Let A₁
denote the event that the first confidence interval does not cover β₀, and let A₂ denote the
event that the second confidence interval does not cover β₁. We know:

$$P(A_1) = \alpha \qquad P(A_2) = \alpha$$

Probability theorem (A.6) gives the desired probability:

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$$

Next, we use complementation property (A.9) to obtain the probability that both intervals
are correct, denoted by P(Ā₁ ∩ Ā₂):

$$P(\bar{A}_1 \cap \bar{A}_2) = 1 - P(A_1 \cup A_2) = 1 - P(A_1) - P(A_2) + P(A_1 \cap A_2) \tag{4.1}$$

Note from probability properties (A.9) and (A.10) that Ā₁ ∩ Ā₂ and A₁ ∪ A₂ are
complementary events:

$$1 - P(A_1 \cup A_2) = P(\overline{A_1 \cup A_2}) = P(\bar{A}_1 \cap \bar{A}_2)$$

Finally, we use the fact that P(A₁ ∩ A₂) ≥ 0 to obtain from (4.1) the Bonferroni
inequality:

$$P(\bar{A}_1 \cap \bar{A}_2) \ge 1 - P(A_1) - P(A_2) \tag{4.2}$$

which for our situation is:

$$P(\bar{A}_1 \cap \bar{A}_2) \ge 1 - \alpha - \alpha = 1 - 2\alpha \tag{4.2a}$$


Thus, if β₀ and β₁ are separately estimated with, say, 95 percent confidence intervals, the
Bonferroni inequality guarantees us a family confidence coefficient of at least 90 percent
that both intervals based on the same sample are correct.

We can easily use the Bonferroni inequality (4.2a) to obtain a family confidence
coefficient of at least 1 − α for estimating β₀ and β₁. We do this by estimating β₀ and β₁
separately with statement confidence coefficients of 1 − α/2 each. This yields the Bonferroni
bound 1 − α/2 − α/2 = 1 − α. Thus, the 1 − α family confidence limits for β₀ and β₁ for
regression model (2.1) by the Bonferroni procedure are:

$$b_0 \pm B s\{b_0\} \qquad b_1 \pm B s\{b_1\} \tag{4.3}$$

where:

$$B = t(1 - \alpha/4;\, n - 2) \tag{4.3a}$$

and b₀, b₁, s{b₀}, and s{b₁} are defined in (1.10), (2.9), and (2.23). Note that a statement
confidence coefficient of 1 − α/2 requires the (1 − α/4)100 percentile of the t distribution
for a two-sided confidence interval.


Example


For the Toluca Company example, 90 percent family confidence intervals for β₀ and β₁
require B = t(1 − .10/4; 23) = t(.975; 23) = 2.069. We have from Chapter 2:

b₀ = 62.37      s{b₀} = 26.18
b₁ = 3.5702     s{b₁} = .3470

Hence, the respective confidence limits for β₀ and β₁ are 62.37 ± 2.069(26.18) and
3.5702 ± 2.069(.3470), and the joint confidence intervals are:

8.20 ≤ β₀ ≤ 116.5
2.85 ≤ β₁ ≤ 4.29

Thus, we conclude that β₀ is between 8.20 and 116.5 and β₁ is between 2.85 and 4.29.
The family confidence coefficient is at least .90 that the procedure leads to correct pairs of
interval estimates.
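
The arithmetic of this example can be reproduced with a short sketch. It is not the authors' software: the helper names are hypothetical, and the t percentile is obtained by numerically integrating the t density and bisecting (standard library only) rather than by table lookup. The sample size n = 25 is inferred from the 23 degrees of freedom quoted above.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Percentile t(p; df) for p >= 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def bonferroni_joint_limits(b0, s_b0, b1, s_b1, n, family_alpha):
    """Limits (4.3) for beta_0 and beta_1 with B = t(1 - alpha/4; n - 2)."""
    B = t_quantile(1 - family_alpha / 4, n - 2)
    return B, (b0 - B * s_b0, b0 + B * s_b0), (b1 - B * s_b1, b1 + B * s_b1)


# Toluca Company values quoted in the text (n - 2 = 23, so n = 25).
B, beta0_limits, beta1_limits = bonferroni_joint_limits(
    62.37, 26.18, 3.5702, 0.3470, n=25, family_alpha=0.10
)
print(round(B, 3), beta0_limits, beta1_limits)
```

Because the multiple is computed to more digits than the tabled 2.069, the limits may differ from those in the text in the last reported digit.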


Comments 


1. We reiterate that the Bonferroni 1 − α family confidence coefficient is actually a lower bound
on the true (but often unknown) family confidence coefficient. To the extent that incorrect interval
estimates of β₀ and β₁ tend to pair up in the family, the families of statements will tend to be correct
more than (1 − α)100 percent of the time. Because of this conservative nature of the Bonferroni
procedure, family confidence coefficients are frequently specified at lower levels (e.g., 90 percent)
than when a single estimate is made.

2. The Bonferroni inequality (4.2a) can easily be extended to g simultaneous confidence intervals
with family confidence coefficient 1 − α:

$$P\left(\bigcap_{i=1}^{g} \bar{A}_i\right) \ge 1 - g\alpha \tag{4.4}$$

Thus, if g interval estimates are desired with family confidence coefficient 1 − α, constructing each
interval estimate with statement confidence coefficient 1 − α/g will suffice.

3. For a given family confidence coefficient, the larger the number of confidence intervals in the 
family, the greater becomes the multiple B, which may make some or all of the confidence intervals 
too wide to be helpful. The Bonferroni technique is ordinarily most useful when the number of 
simultaneous estimates is not too large. 

4. It is not necessary with the Bonferroni procedure that the confidence intervals have the same
statement confidence coefficient. Different statement confidence coefficients, depending on the
importance of each estimate, can be used. For instance, in our earlier illustration β₀ might be
estimated with a 92 percent confidence interval and β₁ with a 98 percent confidence interval. The
family confidence coefficient by (4.2) will still be at least 90 percent.

5. Joint confidence intervals can be used directly for testing. To illustrate this use, an industrial
engineer working for the Toluca Company theorized that the regression function should have an
intercept of 30.0 and a slope of 2.50. Although 30.0 falls in the confidence interval for β₀, 2.50 does
not fall in the confidence interval for β₁. Thus, the engineer's theoretical expectations are not correct
at the α = .10 family level of significance.

6. The estimators b₀ and b₁ are usually correlated, but the Bonferroni simultaneous confidence
limits in (4.3) only recognize this correlation by means of the bound on the family confidence
coefficient.

It can be shown that the covariance between b₀ and b₁ is:

$$\sigma\{b_0, b_1\} = -\bar{X} \sigma^2\{b_1\} \tag{4.5}$$

Note that if X̄ is positive, b₀ and b₁ are negatively correlated, implying that if the estimate b₁ is too
high, the estimate b₀ is likely to be too low, and vice versa.

In the Toluca Company example, X̄ = 70.00; hence the covariance between b₀ and b₁ is negative.
This implies that the estimators b₀ and b₁ here tend to err in opposite directions. We expect this
intuitively. Since the observed points (X_i, Y_i) fall in the first quadrant (see Figure 1.10a), we
anticipate that if the slope of the fitted regression line is too steep (b₁ overestimates β₁), the
intercept is most likely to be too low (b₀ underestimates β₀), and vice versa.

When the independent variable is X_i − X̄, as in the alternative model (1.6), b₀* and b₁ are
uncorrelated according to (4.5) because the mean of the X_i − X̄ observations is zero.
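
Comments 2 and 4 rest on the same bookkeeping: the miss probabilities of the individual statements are added and subtracted from 1. A minimal sketch of that calculation follows; the function names are hypothetical, and clamping the bound at zero is an added convenience, since a negative lower bound on a probability carries no information.

```python
def bonferroni_family_bound(statement_coefficients):
    """Lower bound on the family confidence coefficient.

    Each statement with confidence coefficient c contributes a miss
    probability of 1 - c; the bound is 1 minus the sum of these,
    clamped at zero.
    """
    total_miss = sum(1 - c for c in statement_coefficients)
    return max(0.0, 1 - total_miss)


def statement_coefficient(family_coefficient, g):
    """Equal-split statement coefficient 1 - alpha/g for g intervals."""
    alpha = 1 - family_coefficient
    return 1 - alpha / g


print(bonferroni_family_bound([0.95, 0.95]))  # two 95 percent intervals
print(bonferroni_family_bound([0.92, 0.98]))  # unequal split from Comment 4
print(statement_coefficient(0.90, 2))         # equal split for a .90 family
```

Both calls to `bonferroni_family_bound` give a bound of about .90, which is the point of Comment 4: only the sum of the miss probabilities matters, not how it is divided among the statements.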


4.2 Simultaneous Estimation of Mean Responses 


Often the mean responses at a number of X levels need to be estimated from the same
sample data. The Toluca Company, for instance, needed to estimate the mean number
of work hours for lots of 30, 65, and 100 units in its search for the optimum lot size. We
already know how to estimate the mean response for any one level of X with given statement
confidence coefficient. Now we shall discuss two procedures for simultaneous estimation
of a number of different mean responses with a family confidence coefficient, so that there
is a known assurance of all of the estimates of mean responses being correct. These are the
Working-Hotelling and the Bonferroni procedures.

The reason why a family confidence coefficient is needed for estimating several mean
responses even though all estimates are based on the same fitted regression line is that
the separate interval estimates of E{Y_h} at the different X_h levels need not all be correct
or all be incorrect. The combination of sampling errors in b₀ and b₁ may be such that
the interval estimates of E{Y_h} will be correct over some range of X levels and incorrect
elsewhere.


Working-Hotelling Procedure 


The Working-Hotelling procedure is based on the confidence band for the regression line
discussed in Section 2.6. The confidence band in (2.40) contains the entire regression line and
therefore contains the mean responses at all X levels. Hence, we can use the boundary values
of the confidence band at selected X levels as simultaneous estimates of the mean responses
at these X levels. The family confidence coefficient for these simultaneous estimates will
be at least 1 − α because the confidence coefficient that the entire confidence band for the
regression line is correct is 1 − α.

The Working-Hotelling procedure for obtaining simultaneous confidence intervals for
the mean responses at selected X levels is therefore simply to use the boundary values in
(2.40) for the X levels of interest. The simultaneous confidence limits for g mean responses
E{Y_h} for regression model (2.1) with the Working-Hotelling procedure therefore are:

$$\hat{Y}_h \pm W s\{\hat{Y}_h\} \tag{4.6}$$

where:

$$W^2 = 2F(1 - \alpha;\, 2, n - 2) \tag{4.6a}$$

and Ŷ_h and s{Ŷ_h} are defined in (2.28) and (2.30), respectively.


Example


For the Toluca Company example, we require a family of estimates of the mean number
of work hours at the following lot size levels: X_h = 30, 65, 100. The family confidence
coefficient is to be .90. In Chapter 2 we obtained Ŷ_h and s{Ŷ_h} for X_h = 65 and 100. In
similar fashion, we can obtain the needed results for lot size X_h = 30. We summarize the
results here:

X_h     Ŷ_h      s{Ŷ_h}
 30     169.5    16.97
 65     294.4     9.918
100     419.4    14.27

For a family confidence coefficient of .90, we require F(.90; 2, 23) = 2.549. Hence:

W² = 2(2.549) = 5.098      W = 2.258

We can now obtain the confidence intervals for the mean number of work hours at X_h = 30,
65, and 100:

131.2 = 169.5 − 2.258(16.97) ≤ E{Y_h} ≤ 169.5 + 2.258(16.97) = 207.8
272.0 = 294.4 − 2.258(9.918) ≤ E{Y_h} ≤ 294.4 + 2.258(9.918) = 316.8
387.2 = 419.4 − 2.258(14.27) ≤ E{Y_h} ≤ 419.4 + 2.258(14.27) = 451.6

With family confidence coefficient .90, we conclude that the mean number of work hours
required is between 131.2 and 207.8 for lots of 30 parts, between 272.0 and 316.8 for lots
of 65 parts, and between 387.2 and 451.6 for lots of 100 parts. The family confidence
coefficient .90 provides assurance that the procedure leads to all correct estimates in the
family of estimates.
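
The Working-Hotelling multiple in this example can be checked without an F table. The sketch below relies on a standard closed form that is not part of the text: for an F variable with 2 numerator and m denominator degrees of freedom, P(F > x) = (1 + 2x/m)^(−m/2), which can be solved for the percentile directly. The function names are hypothetical.

```python
import math


def working_hotelling_multiple(family_alpha, error_df):
    """W from (4.6a), using the closed-form F percentile for 2 numerator df.

    For an F(2, m) variable, P(F > x) = (1 + 2x/m) ** (-m/2), so
    F(1 - alpha; 2, m) = (m / 2) * (alpha ** (-2 / m) - 1).
    """
    f_percentile = (error_df / 2) * (family_alpha ** (-2 / error_df) - 1)
    return math.sqrt(2 * f_percentile)


def simultaneous_limits(estimates, multiple):
    """Limits Yhat_h +/- multiple * s{Yhat_h} for each (X_h, Yhat_h, s) triple."""
    return {x: (y - multiple * s, y + multiple * s) for x, y, s in estimates}


# Toluca Company fitted values and standard deviations quoted in the text.
toluca = [(30, 169.5, 16.97), (65, 294.4, 9.918), (100, 419.4, 14.27)]
W = working_hotelling_multiple(family_alpha=0.10, error_df=23)
wh_limits = simultaneous_limits(toluca, W)
for x_h, (lower, upper) in wh_limits.items():
    print(x_h, round(lower, 1), round(upper, 1))
```

The same `simultaneous_limits` helper applies to any multiple, so it can be reused with a Bonferroni B in place of W.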


Bonferroni Procedure 


The Bonferroni procedure, discussed earlier for simultaneous estimation of β₀ and β₁, is
a completely general procedure. To construct a family of confidence intervals for mean
responses at different X levels with this procedure, we calculate in each instance the usual
confidence limits for a single mean response E{Y_h} in (2.33), adjusting the statement
confidence coefficient to yield the specified family confidence coefficient.

When E{Y_h} is to be estimated for g levels X_h with family confidence coefficient 1 − α,
the Bonferroni confidence limits for regression model (2.1) are:

$$\hat{Y}_h \pm B s\{\hat{Y}_h\} \tag{4.7}$$

where:

$$B = t(1 - \alpha/2g;\, n - 2) \tag{4.7a}$$

and g is the number of confidence intervals in the family.


Example


For the Toluca Company example, the Bonferroni simultaneous estimates of the mean
number of work hours for lot sizes X_h = 30, 65, and 100 with family confidence coefficient
.90 require the same data as with the Working-Hotelling procedure. In addition, we require
B = t[1 − .10/2(3); 23] = t(.9833; 23) = 2.263.

We thus obtain the following confidence intervals, with 90 percent family confidence
coefficient, for the mean number of work hours for lot sizes X_h = 30, 65, and 100:

131.1 = 169.5 − 2.263(16.97) ≤ E{Y_h} ≤ 169.5 + 2.263(16.97) = 207.9
272.0 = 294.4 − 2.263(9.918) ≤ E{Y_h} ≤ 294.4 + 2.263(9.918) = 316.8
387.1 = 419.4 − 2.263(14.27) ≤ E{Y_h} ≤ 419.4 + 2.263(14.27) = 451.7


Comments 


1. In this instance the Working-Hotelling confidence limits are slightly tighter than, or the same
as, the Bonferroni limits. In other cases where the number of statements is small, the Bonferroni
limits may be tighter. For larger families, the Working-Hotelling confidence limits will always be
the tighter, since W in (4.6a) stays the same for any number of statements in the family whereas B
in (4.7a) becomes larger as the number of statements increases. In practice, once the family
confidence coefficient has been decided upon, one can calculate the W and B multiples to determine
which procedure leads to tighter confidence limits.

2. Both the Working-Hotelling and Bonferroni procedures provide lower bounds to the actual
family confidence coefficient.

3. The levels of the predictor variable for which the mean response is to be estimated are sometimes
not known in advance. Instead, the levels of interest are determined as the analysis proceeds. This was
the case in the Toluca Company example, where the lot size levels of interest were determined after
analyses relating to other factors affecting the optimum lot size were completed. In such cases, it is
better to use the Working-Hotelling procedure because the family for this procedure encompasses all
possible levels of X.
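
The comparison described in Comment 1 can be sketched as follows. The helpers are hypothetical: the t percentile is found by numerical integration and bisection using only the standard library, and W uses the closed-form F percentile that holds when the numerator degrees of freedom equal 2 (an identity not stated in the text).

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, by Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Percentile t(p; df) for p >= 0.5, by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


def working_hotelling_multiple(alpha, error_df):
    """W in (4.6a); F(1 - alpha; 2, m) = (m/2) * (alpha ** (-2/m) - 1)."""
    return math.sqrt(error_df * (alpha ** (-2 / error_df) - 1))


def bonferroni_multiple(alpha, g, error_df):
    """B in (4.7a) for g simultaneous statements."""
    return t_quantile(1 - alpha / (2 * g), error_df)


def tighter_procedure(alpha, g, error_df):
    """Name of the procedure with the smaller multiple, plus both multiples."""
    W = working_hotelling_multiple(alpha, error_df)
    B = bonferroni_multiple(alpha, g, error_df)
    return ("Bonferroni" if B < W else "Working-Hotelling"), W, B


# Family confidence coefficient .90 and 23 error df, as in the Toluca example.
for g in (2, 3, 5):
    name, W, B = tighter_procedure(0.10, g, 23)
    print(g, name, round(W, 3), round(B, 3))
```

Under these settings W does not change with g while B grows, so the preferred procedure switches from Bonferroni to Working-Hotelling between g = 2 and g = 3.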




4.3 Simultaneous Prediction Intervals for New Observations 


Now we consider the simultaneous predictions of g new observations on Y in g independent
trials at g different levels of X. Simultaneous prediction intervals are frequently of
interest. For instance, a company may wish to predict sales in each of its sales regions from
a regression relation between region sales and population size in the region.

Two procedures for making simultaneous predictions will be considered here: the Scheffé
and Bonferroni procedures. Both utilize the same type of limits as those for predicting a
single observation in (2.36), and only the multiple of the estimated standard deviation is
changed. The Scheffé procedure uses the F distribution, whereas the Bonferroni procedure
uses the t distribution. The simultaneous prediction limits for g predictions with the Scheffé
procedure with family confidence coefficient 1 − α are:

$$\hat{Y}_h \pm S s\{\text{pred}\} \tag{4.8}$$

where:

$$S^2 = gF(1 - \alpha;\, g, n - 2) \tag{4.8a}$$

and s{pred} is defined in (2.38). With the Bonferroni procedure, the 1 − α simultaneous
prediction limits are:

$$\hat{Y}_h \pm B s\{\text{pred}\} \tag{4.9}$$

where:

$$B = t(1 - \alpha/2g;\, n - 2) \tag{4.9a}$$

The S and B multiples can be evaluated in advance to see which procedure provides tighter
prediction limits.


Example


The Toluca Company wishes to predict the work hours required for each of the next two
lots, which will consist of 80 and 100 units. The family confidence coefficient is to be
95 percent. To determine which procedure will give tighter prediction limits, we obtain the
S and B multiples:

S² = 2F(.95; 2, 23) = 2(3.422) = 6.844      S = 2.616
B = t[1 − .05/2(2); 23] = t(.9875; 23) = 2.398

We see that the Bonferroni procedure will yield somewhat tighter prediction limits. The
needed estimates, based on earlier results, are (calculations not shown):

X_h     Ŷ_h      s{pred}    Bs{pred}
 80     348.0    49.91      119.7
100     419.4    50.87      122.0

The simultaneous prediction limits for the next two lots, with family confidence coefficient
.95, when X_h = 80 and 100 then are:

228.3 = 348.0 − 119.7 ≤ Y_h(new) ≤ 348.0 + 119.7 = 467.7
297.4 = 419.4 − 122.0 ≤ Y_h(new) ≤ 419.4 + 122.0 = 541.4

With family confidence coefficient at least .95, we can predict that the work hours for the
next two production runs will be within the above pair of limits. As we noted in Chapter 2, the
prediction limits are very wide and may not be too useful for planning worker requirements.
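
The selection between the two multiples and the resulting limits can be sketched in a few lines. The function name is hypothetical, and the S and B multiples are taken as given from the text rather than computed from the F and t distributions.

```python
def simultaneous_prediction_limits(predictions, scheffe_multiple, bonferroni_multiple):
    """Apply (4.8) or (4.9), whichever multiple is smaller, to each prediction.

    predictions: list of (X_h, Yhat_h, s_pred) triples.
    Returns the procedure name, the multiple used, and a dict of limits.
    """
    if bonferroni_multiple <= scheffe_multiple:
        name, multiple = "Bonferroni", bonferroni_multiple
    else:
        name, multiple = "Scheffe", scheffe_multiple
    limits = {x: (y - multiple * s, y + multiple * s) for x, y, s in predictions}
    return name, multiple, limits


# Multiples and estimates quoted in the text for the Toluca Company example.
S, B = 2.616, 2.398
next_lots = [(80, 348.0, 49.91), (100, 419.4, 50.87)]
procedure, multiple, limits = simultaneous_prediction_limits(next_lots, S, B)
print(procedure, multiple)
for x_h, (lower, upper) in limits.items():
    print(x_h, round(lower, 1), round(upper, 1))
```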


Comments 


1. Simultaneous prediction intervals for g new observations on Y at g different levels of X with
a 1 − α family confidence coefficient are wider than the corresponding single prediction intervals
of (2.36). When the number of simultaneous predictions is not large, however, the difference in the
width is only moderate. For instance, a single 95 percent prediction interval for the Toluca Company
example would utilize a t multiple of t(.975; 23) = 2.069, which is only moderately smaller than the
multiple B = 2.398 for two simultaneous predictions.

2. Note that both the B and S multiples for simultaneous predictions become larger as g increases.
This contrasts with simultaneous estimation of mean responses where the B multiple becomes larger
but not the W multiple. When g is large, both the B and S multiples for simultaneous predictions
may become so large that the prediction intervals will be too wide to be useful. Other simultaneous
estimation techniques might then be considered, as discussed in Reference 4.1.


4.4 Regression through Origin 


Sometimes the regression function is known to be linear and to go through the origin at
(0, 0). This may occur, for instance, when X is units of output and Y is variable cost, so Y
is zero by definition when X is zero. Another example is where X is the number of brands
of beer stocked in a supermarket in an experiment (including some supermarkets with no
brands stocked) and Y is the volume of beer sales in the supermarket.


Model


The normal error model for these cases is the same as regression model (2.1) except that
β₀ = 0:

$$Y_i = \beta_1 X_i + \varepsilon_i \tag{4.10}$$

where:

β₁ is a parameter
X_i are known constants
ε_i are independent N(0, σ²)

The regression function for model (4.10) is:

$$E\{Y\} = \beta_1 X \tag{4.11}$$

which is a straight line through the origin, with slope β₁.


Inferences


The least squares estimator of β₁ in regression model (4.10) is obtained by minimizing:

$$Q = \sum (Y_i - \beta_1 X_i)^2 \tag{4.12}$$

with respect to β₁. The resulting normal equation is:

$$\sum X_i (Y_i - b_1 X_i) = 0 \tag{4.13}$$

leading to the point estimator:

$$b_1 = \frac{\sum X_i Y_i}{\sum X_i^2} \tag{4.14}$$

The estimator b₁ in (4.14) is also the maximum likelihood estimator for the normal error
regression model (4.10).

The fitted value Ŷ_i for the ith case is:

$$\hat{Y}_i = b_1 X_i \tag{4.15}$$

and the ith residual is defined, as usual, as the difference between the observed and fitted
values:

$$e_i = Y_i - \hat{Y}_i = Y_i - b_1 X_i \tag{4.16}$$

An unbiased estimator of the error variance σ² for regression model (4.10) is:

$$s^2 = MSE = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 1} = \frac{\sum e_i^2}{n - 1} \tag{4.17}$$

The reason for the denominator n − 1 is that only one degree of freedom is lost in estimating
the single parameter in the regression function (4.11).

Confidence limits for β₁, E{Y_h}, and a new observation Y_h(new) for regression model (4.10)
are shown in Table 4.1. Note that the t multiple has n − 1 degrees of freedom here, the
degrees of freedom associated with MSE. The results in Table 4.1 are derived in analogous
fashion to the earlier results for regression model (2.1). Whereas for model (2.1) with an
intercept we encounter terms (X_i − X̄)² or (X_h − X̄)², here we find X_i² and X_h² because of
the regression through the origin.


Example


The Charles Plumbing Supplies Company operates 12 warehouses. In an attempt to tighten
procedures for planning and control, a consultant studied the relation between number of
work units performed (X) and total variable labor cost (Y) in the warehouses during a test
period. A portion of the data is given in Table 4.2, columns 1 and 2, and the observations
are shown as a scatter plot in Figure 4.1.


TABLE 4.1  Confidence Limits for Regression through Origin.

Estimate of    Estimated Variance                    Confidence Limits
β₁             s²{b₁} = MSE / ΣX_i²                  b₁ ± t s{b₁}         (4.18)
E{Y_h}         s²{Ŷ_h} = X_h² MSE / ΣX_i²            Ŷ_h ± t s{Ŷ_h}       (4.19)
Y_h(new)       s²{pred} = MSE (1 + X_h² / ΣX_i²)     Ŷ_h ± t s{pred}      (4.20)

where: t = t(1 − α/2; n − 1)


TABLE 4.2  Regression through Origin: Warehouse Example.

              (1)          (2)           (3)        (4)        (5)        (6)
              Work         Variable
              Units        Labor Cost
Warehouse     Performed    (dollars)
    i         X_i          Y_i           X_iY_i     X_i²       Ŷ_i        e_i
    1            20          114           2,280        400      93.71     20.29
    2           196          921         180,516     38,416     918.31      2.69
    3           115          560          64,400     13,225     538.81     21.19
   ...          ...          ...             ...        ...        ...       ...
   10           147          670          98,490     21,609     688.74    −18.74
   11           182          828         150,696     33,124     852.72    −24.72
   12           160          762         121,920     25,600     749.64     12.36
 Total        1,359        6,390         894,714    190,963   6,367.28     22.72


FIGURE 4.1  Scatter Plot and Fitted Regression through Origin: Warehouse Example.
[Plot of Variable Labor Cost (vertical axis) against Work Units Performed (horizontal axis, 0 to 200).]


Model (4.10) for regression through the origin was employed since Y involves variable
costs only and the other conditions of the model appeared to be satisfied as well. From
Table 4.2, columns 3 and 4, we have ΣX_iY_i = 894,714 and ΣX_i² = 190,963. Hence:

b₁ = ΣX_iY_i / ΣX_i² = 894,714 / 190,963 = 4.68527

and the estimated regression function is:

Ŷ = 4.68527X

In Table 4.2, the fitted values are shown in column 5, the residuals in column 6. The fitted
regression line is plotted in Figure 4.1 and it appears to be a good fit.

An interval estimate of β₁ is desired with a 95 percent confidence coefficient. By squaring
the residuals in Table 4.2, column 6, and then summing them, we obtain (calculations not
shown):

s² = MSE = Σe_i² / (n − 1) = 2,457.6 / 11 = 223.42

From Table 4.2, column 4, we have ΣX_i² = 190,963. Hence:

s²{b₁} = MSE / ΣX_i² = 223.42 / 190,963 = .0011700      s{b₁} = .034205

For a 95 percent confidence coefficient, we require t(.975; 11) = 2.201. The confidence
limits, by (4.18) in Table 4.1, are 4.68527 ± 2.201(.034205). The 95 percent confidence
interval for β₁ therefore is:

4.61 ≤ β₁ ≤ 4.76

Thus, with 95 percent confidence, it is estimated that the mean variable labor cost increases
by somewhere between $4.61 and $4.76 for each additional work unit performed.
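
The warehouse calculations can be retraced from the column totals alone, which is useful here because Table 4.2 lists only a portion of the 12 cases. The sketch below uses a hypothetical helper name. It takes the sum of squared residuals as 2,457.6, the value consistent with MSE = 223.42 on 11 degrees of freedom, and takes the t multiple from the text rather than computing it.

```python
import math


def origin_regression_from_sums(sum_xy, sum_x2, sse, n, t_multiple):
    """Slope, MSE, s{b1}, and confidence limits for model (4.10)."""
    b1 = sum_xy / sum_x2                 # point estimator (4.14)
    mse = sse / (n - 1)                  # one df lost, as in (4.17)
    s_b1 = math.sqrt(mse / sum_x2)       # estimated standard deviation of b1
    half_width = t_multiple * s_b1       # half-width of the limits in (4.18)
    return b1, mse, s_b1, (b1 - half_width, b1 + half_width)


# Warehouse example: sums from Table 4.2, t(.975; 11) = 2.201 from the text.
b1, mse, s_b1, limits = origin_regression_from_sums(
    sum_xy=894_714, sum_x2=190_963, sse=2_457.6, n=12, t_multiple=2.201
)
print(round(b1, 5), round(mse, 2), round(s_b1, 6), limits)
```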


Important Cautions for Using Regression through Origin 


In using regression-through-the-origin model (4.10), the residuals must be interpreted with
care because they do not sum to zero usually, as may be seen in Table 4.2, column 6, for
the warehouse example. Note from the normal equation (4.13) that the only constraint on
the residuals is of the form ΣX_ie_i = 0. Thus, in a residual plot the residuals will usually
not be balanced around the zero line.

Another important caution for regression through the origin is that the sum of the squared
residuals SSE = Σe_i² for this type of regression may exceed the total sum of squares
SSTO = Σ(Y_i − Ȳ)². This can occur when the data form a curvilinear pattern or a linear
pattern with an intercept away from the origin. Hence, the coefficient of determination
in (2.72), R² = 1 − SSE/SSTO, may turn out to be negative. Consequently, the coefficient
of determination R² has no clear meaning for regression through the origin.
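
Both cautions are easy to see numerically. The sketch below uses hypothetical data, chosen to lie near a line with a large intercept; the function names are also illustrative only.

```python
def fit_through_origin(xs, ys):
    """Least squares slope for Y = beta_1 * X and the resulting residuals."""
    b1 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    residuals = [y - b1 * x for x, y in zip(xs, ys)]
    return b1, residuals


def r_squared(ys, residuals):
    """R^2 = 1 - SSE/SSTO, with SSTO measured about the mean of Y."""
    y_bar = sum(ys) / len(ys)
    sse = sum(e * e for e in residuals)
    ssto = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / ssto


# Hypothetical data lying near a line with a large intercept.
xs = [1.0, 2.0, 3.0]
ys = [10.0, 10.5, 11.0]
b1, residuals = fit_through_origin(xs, ys)
print("sum of residuals:", sum(residuals))
print("sum of X * e:", sum(x * e for x, e in zip(xs, residuals)))
print("R^2:", r_squared(ys, residuals))
```

For these data the residuals do not sum to zero, the weighted sum ΣX_ie_i is zero up to rounding, and R² computed from (2.72) is negative.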

Like any other statistical model, regression-through-the-origin model (4.10) needs to be
evaluated for aptness. Even when it is known that the regression function must go through
the origin, the function may not be linear or the variance of the error terms may not be
constant. In many other cases, one cannot be sure in advance that the regression line goes
through the origin. Hence, it is generally a safe practice not to use regression-through-the-
origin model (4.10) and instead use the intercept regression model (2.1). If the regression
line does go through the origin, b₀ with the intercept model will differ from 0 only by a
small sampling error, and unless the sample size is very small use of the intercept regression
model (2.1) has no disadvantages of any consequence. If the regression line does not go
through the origin, use of the intercept regression model (2.1) will avoid potentially serious
difficulties resulting from forcing the regression line through the origin when this is not
appropriate.


Comments 


1. In interval estimation of E{Y_h} or prediction of Y_h(new) with regression through the origin, note
that the intervals (4.19) and (4.20) in Table 4.1 widen the further X_h is from the origin. The reason
is that the value of the true regression function is known precisely at the origin, so the effect of the
sampling error in the slope b₁ becomes increasingly important the farther X_h is from the origin.

2. Since with regression through the origin only one parameter, β₁, must be estimated for regression
function (4.11), simultaneous estimation methods are not required to make a family of statements
about several mean responses. For a given confidence coefficient 1 − α, formula (4.19) in Table 4.1
can be used repeatedly with the given sample results for different levels of X to generate a family of
statements for which the family confidence coefficient is still 1 − α.

3. Some statistical packages calculate R² for regression through the origin according to (2.72)
and hence will sometimes show a negative value for R². Other statistical packages calculate R² using
the total uncorrected sum of squares SSTOU in (2.54). This procedure avoids obtaining a negative
coefficient but lacks any meaningful interpretation.

4. The ANOVA tables for regression through the origin shown in the output for many statistical
packages are based on SSTOU = ΣY_i², SSRU = ΣŶ_i² = b₁²ΣX_i², and SSE = Σ(Y_i − b₁X_i)²,
where SSRU stands for the uncorrected regression sum of squares. It can be shown that these sums of
squares are additive: SSTOU = SSRU + SSE.
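
The additivity stated in Comment 4 can be verified numerically. The data and the function name in this sketch are hypothetical; the final line shows the uncorrected coefficient that Comment 3 attributes to some packages, computed here as 1 − SSE/SSTOU.

```python
def uncorrected_anova(xs, ys):
    """Return (SSTOU, SSRU, SSE) for regression through the origin."""
    b1 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    sstou = sum(y * y for y in ys)                        # total uncorrected SS
    ssru = b1 ** 2 * sum(x * x for x in xs)               # uncorrected regression SS
    sse = sum((y - b1 * x) ** 2 for x, y in zip(xs, ys))  # error SS
    return sstou, ssru, sse


# Hypothetical data, used only to illustrate the decomposition.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [9.5, 19.0, 31.0, 38.5]
sstou, ssru, sse = uncorrected_anova(xs, ys)
print(sstou, ssru + sse)
print("uncorrected R^2:", 1 - sse / sstou)
```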


4.5 Effects of Measurement Errors 


In our discussion of regression models up to this point, we have not explicitly considered
the presence of measurement errors in the observations on either the response variable Y
or the predictor variable X. We now examine briefly the effects of measurement errors in
the observations on the response and predictor variables.


Measurement Errors in Y 

When random measurement errors are present in the observations on the response variable 
Y, no new problems are created when these errors are uncorrelated and not biased (positive 
and negative measurement errors tend to cancel out). Consider, for example, a study of 
the relation between the time required to complete a task (Y) and the complexity of the 
task (X). The time to complete the task may not be measured accurately because the person 
operating the stopwatch may not do so at the precise instants called for. As long as such 
measurement errors are of a random nature, uncorrelated, and not biased, these measurement 
errors are simply absorbed in the model error term ε. The model error term always reflects
the composite effects of a large number of factors not considered in the model, one of which 
now would be the random variation due to inaccuracy in the process of measuring Y. 


Measurement Errors in X 

Unfortunately, a different situation holds when the observations on the predictor variable
X are subject to measurement errors. Frequently, to be sure, the observations on X are
accurate, with no measurement errors, as when the predictor variable is the price of a product
in different stores, the number of variables in different optimization problems, or the wage
rate for different classes of employees. At other times, however, measurement errors may
enter the value observed for the predictor variable, for instance, when the predictor variable
is pressure in a tank, temperature in an oven, speed of a production line, or reported age of
a person.

We shall use the last illustration in our development of the nature of the problem. Suppose
we are interested in the relation between employees' piecework earnings and their ages.
Let X_i denote the true age of the ith employee and X_i* the age reported by the employee
on the employment record. Needless to say, the two are not always the same. We define the
measurement error δ_i as follows:

$$\delta_i = X_i^* - X_i \tag{4.21}$$

The regression model we would like to study is:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{4.22}$$

However, we observe only X_i*, so we must replace the true age X_i in (4.22) by the reported
age X_i*, using (4.21):

$$Y_i = \beta_0 + \beta_1 (X_i^* - \delta_i) + \varepsilon_i \tag{4.23}$$

We can now rewrite (4.23) as follows:

$$Y_i = \beta_0 + \beta_1 X_i^* + (\varepsilon_i - \beta_1 \delta_i) \tag{4.24}$$


Model (4.24) may appear like an ordinary regression model, with predictor variable X*
and error term ε − β₁δ, but it is not. The predictor variable observation X_i* is a random
variable, which, as we shall see, is correlated with the error term ε_i − β₁δ_i.

Intuitively, we know that ε_i − β₁δ_i is not independent of X_i* since (4.21) constrains
X_i* − δ_i to equal X_i. To determine the dependence formally, let us assume the following
simple conditions:

$$E\{\delta_i\} = 0 \tag{4.25a}$$
$$E\{\varepsilon_i\} = 0 \tag{4.25b}$$
$$E\{\delta_i \varepsilon_i\} = 0 \tag{4.25c}$$

Note that condition (4.25a) implies that E{X_i*} = E{X_i + δ_i} = X_i, so that in our example
the reported ages would be unbiased estimates of the true ages. Condition (4.25b) is the usual
requirement that the model error terms ε_i have expectation 0, balancing around the regression
line. Finally, condition (4.25c) requires that the measurement error δ_i not be correlated with
the model error ε_i; this follows because, by (A.21a), σ{δ_i, ε_i} = E{δ_iε_i} since E{δ_i} =
E{ε_i} = 0 by (4.25a) and (4.25b).

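We now wish to find the covariance between the observations X_i* and the random terms
ε_i − β₁δ_i in model (4.24) under the conditions in (4.25), which imply that E{X_i*} = X_i and
E{ε_i − β₁δ_i} = 0:

$$\begin{aligned}
\sigma\{X_i^*, \varepsilon_i - \beta_1 \delta_i\}
  &= E\{[X_i^* - E\{X_i^*\}][(\varepsilon_i - \beta_1 \delta_i) - E\{\varepsilon_i - \beta_1 \delta_i\}]\} \\
  &= E\{(X_i^* - X_i)(\varepsilon_i - \beta_1 \delta_i)\} \\
  &= E\{\delta_i (\varepsilon_i - \beta_1 \delta_i)\} \\
  &= E\{\delta_i \varepsilon_i - \beta_1 \delta_i^2\}
\end{aligned}$$

Now E{δ_iε_i} = 0 by (4.25c), and E{δ_i²} = σ²{δ_i} by (A.15a) because E{δ_i} = 0 by (4.25a).
We therefore obtain:

$$\sigma\{X_i^*, \varepsilon_i - \beta_1 \delta_i\} = -\beta_1 \sigma^2\{\delta_i\} \tag{4.26}$$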


This covariance is not zero whenever there is a linear regression relation between X and Y.
If we assume that the response Y and the random predictor variable $X^*$ follow a bivariate
normal distribution, then the conditional distributions of the $Y_i$, $i = 1, \ldots, n$, given $X_i^*$,
$i = 1, \ldots, n$, are normal and independent, with conditional mean $E\{Y_i|X_i^*\} = \beta_0^* + \beta_1^* X_i^*$
and conditional variance $\sigma^2_{Y|X^*}$. Furthermore, it can be shown that $\beta_1^* = \beta_1[\sigma_X^2/(\sigma_X^2 + \sigma_\delta^2)]$,
where $\sigma_X^2$ is the variance of X and $\sigma_\delta^2$ is the variance of the measurement error $\delta$. Hence, the least squares slope
estimate from fitting Y on $X^*$ is not an estimate of $\beta_1$, but is an estimate of $\beta_1^*$, which is attenuated toward zero ($|\beta_1^*| \le |\beta_1|$).
The resulting estimated regression coefficient will be too small in absolute value on average, with the
magnitude of the bias dependent upon the relative sizes of $\sigma_X^2$ and $\sigma_\delta^2$. If $\sigma_\delta^2$ is small relative
to $\sigma_X^2$, then the bias would be small; otherwise the bias may be substantial. Discussion
of possible approaches to estimating $\beta_1$ that are obtained by estimating these unknown
variances will be found in specialized texts such as Reference 4.2.
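The attenuation just described can be illustrated with a small simulation. The sketch below assumes the classical attenuation factor $\sigma_X^2/(\sigma_X^2 + \sigma_\delta^2)$, normally distributed X, $\delta$, and $\varepsilon$, and hypothetical parameter values; the helper names `ols_slope` and `attenuation_demo` are ours and are not part of the text.

```python
import random


def ols_slope(x, y):
    """Least squares slope b1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def attenuation_demo(beta0, beta1, sigma_x, sigma_delta, sigma_eps, n, seed):
    """Compare the slope fitted on the true X with the slope fitted on X* = X + delta."""
    rng = random.Random(seed)
    x_true = [rng.gauss(50.0, sigma_x) for _ in range(n)]          # true predictor values
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma_eps) for xi in x_true]
    x_star = [xi + rng.gauss(0.0, sigma_delta) for xi in x_true]   # values observed with error
    expected = beta1 * sigma_x ** 2 / (sigma_x ** 2 + sigma_delta ** 2)
    return ols_slope(x_true, y), ols_slope(x_star, y), expected


# Hypothetical values: sigma_X^2 = 9 and sigma_delta^2 = 4, so the factor is 9/13.
slope_true_x, slope_observed, expected = attenuation_demo(
    beta0=10.0, beta1=2.0, sigma_x=3.0, sigma_delta=2.0, sigma_eps=1.0,
    n=100_000, seed=7)
print(f"slope on true X: {slope_true_x:.3f}")
print(f"slope on X*:     {slope_observed:.3f}  (attenuated value: {expected:.3f})")
```

Under these assumptions the slope fitted on the true X should be near 2, while the slope fitted on $X^*$ should be near $2(9/13) \approx 1.385$.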

Another approach is to use additional variables that are known to be related to the true
value of X but not to the errors of measurement $\delta$. Such variables are called instrumental
variables because they are used as an instrument in studying the relation between X and
Y. Instrumental variables make it possible to obtain consistent estimators of the regression
parameters. Again, the reader is referred to Reference 4.2.


Comment 

What, it may be asked, is the distinction between the case when X is a random variable, considered in 
Chapter 2, and the case when X is subject to random measurement errors, and why are there special 
problems with the latter? When X is a random variable, the observations on X are not under the 
control of the analyst and will vary at random from trial to trial, as when X is the number of persons 
entering a store in a day. If this random variable X is not subject to measurement errors, however, it 
can be accurately ascertained for a given trial. Thus, if there are no measurement errors in counting the 
number of persons entering a store in a day, the analyst has accurate information to study the relation 
between number of persons entering the store and sales, even though the levels of number of persons 
entering the store that actually occur cannot be controlled. If, on the other hand, measurement errors 
are present in the observed number of persons entering the store, a distorted picture of the relation 
between number of persons and sales will occur because the sales observations will frequently be 
matched against an incorrect number of persons.


Berkson Model 


There is one situation where measurement errors in X are no problem. This case was first
noted by Berkson (Ref. 4.3). Frequently, in an experiment the predictor variable is set at
a target value. For instance, in an experiment on the effect of room temperature on word
processor productivity, the temperature may be set at target levels of 68° F, 70° F, and 72° F,
according to the temperature control on the thermostat. The observed temperature $X_i^*$ is
fixed here, whereas the actual temperature $X_i$ is a random variable since the thermostat may
not be completely accurate. Similar situations exist when water pressure is set according to
a gauge, or employees of specified ages according to their employment records are selected
for a study.

In all of these cases, the observation $X_i^*$ is a fixed quantity, whereas the unobserved true
value $X_i$ is a random variable. The measurement error is, as before:


$$\delta_i = X_i^* - X_i \qquad (4.27)$$


Here, however, there is no constraint on the relation between $X_i^*$ and $\delta_i$, since $X_i^*$ is a fixed
quantity. Again, we assume that $E\{\delta_i\} = 0$.




Model (4.24), which we obtained when replacing $X_i$ by $X_i^* - \delta_i$, is still applicable for
the Berkson case:


$$Y_i = \beta_0 + \beta_1 X_i^* + (\varepsilon_i - \beta_1\delta_i) \qquad (4.28)$$


The expected value of the error term, $E\{\varepsilon_i - \beta_1\delta_i\}$, is zero as before under conditions (4.25a)
and (4.25b), since $E\{\varepsilon_i\} = 0$ and $E\{\delta_i\} = 0$. However, $\varepsilon_i - \beta_1\delta_i$ is now uncorrelated with
$X_i^*$, since $X_i^*$ is a constant for the Berkson case. Hence, the following conditions of an
ordinary regression model are met:


1. The error terms have expectation zero.
2. The predictor variable is a constant, and hence the error terms are not correlated with it.


Thus, least squares procedures can be applied for the Berkson case without modification,
and the estimators $b_0$ and $b_1$ will be unbiased. If we can make the standard normality and
constant variance assumptions for the errors $\varepsilon_i - \beta_1\delta_i$, the usual tests and interval estimates
can be utilized.
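The Berkson case can be illustrated in the same way. In the sketch below the observed settings $X_i^*$ are fixed target levels and the true values are generated as $X_i = X_i^* - \delta_i$, following (4.27). The parameter values, the normal errors, and the function names are hypothetical choices for illustration only.

```python
import random


def ols_slope(x, y):
    """Least squares slope b1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def berkson_slope(beta0, beta1, sigma_delta, sigma_eps, targets, reps_per_level, seed):
    """Fit Y on the fixed target settings X*; the true X varies around each target."""
    rng = random.Random(seed)
    x_star, y = [], []
    for target in targets:
        for _ in range(reps_per_level):
            delta = rng.gauss(0.0, sigma_delta)   # delta = X* - X, with E{delta} = 0
            x_true = target - delta               # unobserved actual level
            y.append(beta0 + beta1 * x_true + rng.gauss(0.0, sigma_eps))
            x_star.append(target)                 # the analyst records only the target
    return ols_slope(x_star, y)


# Hypothetical thermostat settings and parameters.
b1_berkson = berkson_slope(beta0=5.0, beta1=3.0, sigma_delta=1.0, sigma_eps=0.5,
                           targets=[68.0, 70.0, 72.0], reps_per_level=20_000, seed=11)
print(f"slope fitted on target settings: {b1_berkson:.3f} (true beta1 = 3.0)")
```

In contrast to the attenuation seen when $X_i^*$ is random, the fitted slope here should be close to the assumed $\beta_1$, although the error variance is inflated to $\sigma^2\{\varepsilon_i\} + \beta_1^2\sigma^2\{\delta_i\}$.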


4.6 Inverse Predictions 


At times, a regression model of Y on X is used to make a prediction of the value of X which 
gave rise to a new observation Y. This is known as an inverse prediction. We illustrate 
inverse predictions by two examples: 


1. A trade association analyst has regressed the selling price of a product (Y) on its cost
(X) for the 15 member firms of the association. The selling price $Y_{h(new)}$ for another firm
not belonging to the trade association is known, and it is desired to estimate the cost $X_{h(new)}$
for this firm.

2. A regression analysis of the amount of decrease in cholesterol level (Y) achieved
with a given dosage of a new drug (X) has been conducted, based on observations for
50 patients. A physician is treating a new patient for whom the cholesterol level should
decrease by the amount $Y_{h(new)}$. It is desired to estimate the appropriate dosage level $X_{h(new)}$
to be administered to bring about the needed cholesterol decrease $Y_{h(new)}$.


In inverse predictions, regression model (2.1) is assumed as before:


$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (4.29)$$

The estimated regression function based on n observations is obtained as usual:

$$\hat{Y} = b_0 + b_1 X \qquad (4.30)$$


A new observation $Y_{h(new)}$ becomes available, and it is desired to estimate the level $X_{h(new)}$
that gave rise to this new observation. A natural point estimator is obtained by solving (4.30)
for X, given $Y_{h(new)}$:


$$\hat{X}_{h(new)} = \frac{Y_{h(new)} - b_0}{b_1} \qquad b_1 \neq 0 \qquad (4.31)$$

where $\hat{X}_{h(new)}$ denotes the point estimator of the new level $X_{h(new)}$. Figure 4.2 contains
a representation of this point estimator for an example to be discussed shortly. It can be
shown that the estimator $\hat{X}_{h(new)}$ is the maximum likelihood estimator of $X_{h(new)}$ for normal
error regression model (2.1).

FIGURE 4.2 Scatter Plot and Fitted Regression Line: Calibration Example.

Approximate $1 - \alpha$ confidence limits for $X_{h(new)}$ are:


$$\hat{X}_{h(new)} \pm t(1 - \alpha/2; n - 2)\,s\{\text{pred}X\} \qquad (4.32)$$


where:


$$s^2\{\text{pred}X\} = \frac{MSE}{b_1^2}\left[1 + \frac{1}{n} + \frac{(\hat{X}_{h(new)} - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right] \qquad (4.32a)$$


Example


A medical researcher studied a new, quick method for measuring low concentration of
galactose (sugar) in the blood. Twelve samples were used in the study containing known
concentrations (X), with three samples at each of four different levels. The measured
concentration (Y) was then observed for each sample. Linear regression model (2.1) was
fitted with the following results:


$n = 12$    $b_0 = -.100$    $b_1 = 1.017$    $MSE = .0272$
$s\{b_1\} = .0142$    $\bar{X} = 5.500$    $\bar{Y} = 5.492$    $\sum(X_i - \bar{X})^2 = 135$
$\hat{Y} = -.100 + 1.017X$


The data and the estimated regression line are plotted in Figure 4.2. 

The researcher first wished to make sure that there is a linear association between the
two variables. A test of $H_0$: $\beta_1 = 0$ versus $H_a$: $\beta_1 \neq 0$, utilizing test statistic $t^* = b_1/s\{b_1\} =
1.017/.0142 = 71.6$, was conducted for $\alpha = .05$. Since $t(.975; 10) = 2.228$ and $|t^*| =
71.6 > 2.228$, it was concluded that $\beta_1 \neq 0$, or that a linear association exists between the
measured concentration and the actual concentration.
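The arithmetic of this test can be reproduced with a few lines of Python. The sketch uses only the summary values quoted in the example; the critical value 2.228 is taken from the text rather than computed, and the function name `slope_t_statistic` is ours.

```python
def slope_t_statistic(b1, s_b1):
    """Test statistic t* = b1 / s{b1} for H0: beta1 = 0."""
    return b1 / s_b1


t_star = slope_t_statistic(b1=1.017, s_b1=0.0142)
t_crit = 2.228                      # t(.975; 10), as quoted in the text
reject_h0 = abs(t_star) > t_crit    # two-sided decision rule at alpha = .05
print(f"t* = {t_star:.1f}, reject H0: {reject_h0}")
```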

[Figure 4.2 plots measured galactose concentration (Y) against actual galactose concentration (X), with the fitted line $\hat{Y} = -.100 + 1.017X$.]

The researcher now wishes to use the regression relation to ascertain the actual concentration
$X_{h(new)}$ for a new patient for whom the quick procedure yielded a measured
concentration of $Y_{h(new)} = 6.52$. It is desired to estimate $X_{h(new)}$ by means of a 95 percent
confidence interval. Using (4.31) and (4.32a), we obtain:


$$\hat{X}_{h(new)} = \frac{6.52 - (-.100)}{1.017} = 6.509$$

$$s^2\{\text{pred}X\} = \frac{.0272}{(1.017)^2}\left[1 + \frac{1}{12} + \frac{(6.509 - 5.500)^2}{135}\right] = .0287$$


so that $s\{\text{pred}X\} = .1694$. We require $t(.975; 10) = 2.228$, and using (4.32) we obtain the
confidence limits $6.509 \pm 2.228(.1694)$. Hence, the 95 percent confidence interval is:

$$6.13 \leq X_{h(new)} \leq 6.89$$

Thus, it can be concluded with 95 percent confidence that the actual galactose concentration
for the patient is between 6.13 and 6.89. This is approximately a ±6 percent error, which
is considered reasonable by the researcher.
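The calculation in this example can be sketched in Python directly from (4.31), (4.32), and (4.32a). The function name `inverse_prediction` and its argument names are ours; the t multiple is passed in from a table because the sketch uses only the standard library.

```python
import math


def inverse_prediction(y_new, b0, b1, mse, n, x_bar, sxx, t_value):
    """Point estimate and approximate confidence limits for X_h(new).

    sxx is sum((X_i - Xbar)^2); t_value is t(1 - alpha/2; n - 2) from a table.
    """
    x_hat = (y_new - b0) / b1                                               # (4.31)
    s2_pred = (mse / b1 ** 2) * (1 + 1 / n + (x_hat - x_bar) ** 2 / sxx)    # (4.32a)
    s_pred = math.sqrt(s2_pred)
    half_width = t_value * s_pred                                           # (4.32)
    return x_hat, s_pred, (x_hat - half_width, x_hat + half_width)


# Summary values from the calibration example.
x_hat, s_pred, (lower, upper) = inverse_prediction(
    y_new=6.52, b0=-0.100, b1=1.017, mse=0.0272,
    n=12, x_bar=5.500, sxx=135, t_value=2.228)
print(f"X-hat = {x_hat:.3f}, s(predX) = {s_pred:.4f}, interval = ({lower:.2f}, {upper:.2f})")
```

Run on the summary values above, the sketch reproduces the point estimate 6.509, the standard deviation .1694, and the limits 6.13 and 6.89.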




Comments 


1. The inverse prediction problem is also known as a calibration problem since it is applicable
when inexpensive, quick, and approximate measurements (Y) are related to precise, often expensive,
and time-consuming measurements (X) based on n observations. The resulting regression model is
then used to estimate the precise measurement $X_{h(new)}$ for a new approximate measurement $Y_{h(new)}$.
We illustrated this use in the calibration example.


2. The approximate confidence interval (4.32) is appropriate if the quantity:


$$\frac{[t(1 - \alpha/2; n - 2)]^2\,MSE}{b_1^2\sum(X_i - \bar{X})^2} \qquad (4.33)$$


is small, say less than .1. For the calibration example, this quantity is:


$$\frac{(2.228)^2(.0272)}{(1.017)^2(135)} = .00097$$


so that the approximate confidence interval is appropriate here.
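Criterion (4.33) is simple to evaluate. A minimal sketch, using the summary values of the calibration example and a function name of our own choosing:

```python
def inverse_prediction_criterion(t_value, mse, b1, sxx):
    """Quantity (4.33): [t(1 - alpha/2; n - 2)]^2 * MSE / (b1^2 * sum((X_i - Xbar)^2))."""
    return (t_value ** 2 * mse) / (b1 ** 2 * sxx)


criterion = inverse_prediction_criterion(t_value=2.228, mse=0.0272, b1=1.017, sxx=135)
is_appropriate = criterion < 0.1   # rule of thumb stated in the text
print(f"criterion = {criterion:.5f}, approximate interval appropriate: {is_appropriate}")
```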

3. Simultaneous prediction intervals based on g different new observed measurements $Y_{h(new)}$,
with a $1 - \alpha$ family confidence coefficient, are easily obtained by using either the Bonferroni or the
Scheffé procedures discussed in Section 4.3. The value of $t(1 - \alpha/2; n - 2)$ in (4.32) is replaced by
either $B = t(1 - \alpha/2g; n - 2)$ or $S = [gF(1 - \alpha; g, n - 2)]^{1/2}$.

4. The inverse prediction problem has aroused controversy among statisticians. Some statisticians
have suggested that inverse predictions should be made in direct fashion by regressing X on Y. This
regression is called inverse regression.


4.7 Choice of X Levels 


When regression data are obtained by experiment, the levels of X at which observations
on Y are to be taken are under the control of the experimenter. Among other things, the
experimenter will have to consider:


1. How many levels of X should be investigated? 

2. What shall the two extreme levels be? 

3. How shall the other levels of X, if any, be spaced? 

4. How many observations should be taken at each level of X? 


There is no single answer to these questions, since different purposes of the regression 
analysis lead to different answers. The possible objectives in regression analysis are varied, 
as we have noted earlier. The main objective may be to estimate the slope of the regression 
line or, in some cases, to estimate the intercept. In many cases, the main objective is to 
predict one or more new observations or to estimate one or more mean responses. When 
the regression function is curvilinear, the main objective may be to locate the maximum or 
minimum mean response. At still other times, the main purpose is to determine the nature 
of the regression function. 

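To illustrate how the purpose affects the design, consider the variances of $b_0$, $b_1$, $\hat{Y}_h$, and
for predicting $Y_{h(new)}$, which were developed earlier for regression model (2.1):


$$\sigma^2\{b_0\} = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum(X_i - \bar{X})^2}\right] \qquad (4.34)$$

$$\sigma^2\{b_1\} = \frac{\sigma^2}{\sum(X_i - \bar{X})^2} \qquad (4.35)$$

$$\sigma^2\{\hat{Y}_h\} = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right] \qquad (4.36)$$

$$\sigma^2\{\text{pred}\} = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right] \qquad (4.37)$$
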
If the main purpose of the regression analysis is to estimate the slope $\beta_1$, the variance of $b_1$
is minimized if $\sum(X_i - \bar{X})^2$ is maximized. This is accomplished by using two levels of X,
at the two extremes for the scope of the model, and placing half of the observations at each
of the two levels. Of course, if one were not sure of the linearity of the regression function,
one would be hesitant to use only two levels since they would provide no information about
possible departures from linearity. If the main purpose is to estimate the intercept $\beta_0$, the
number and placement of levels does not affect the variance of $b_0$ as long as $\bar{X} = 0$. On the
other hand, to estimate the mean response or to predict a new observation at the level $X_h$,
the relevant variance is minimized by using X levels so that $\bar{X} = X_h$.
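The effect of the design on $\sigma^2\{b_1\}$ in (4.35) can be seen by comparing $\sum(X_i - \bar{X})^2$ for two candidate designs. The sketch below uses hypothetical designs of 12 observations over an assumed scope from 0 to 9; the designs and function names are ours and serve only as an illustration.

```python
def sum_sq_dev(levels):
    """Sum of squared deviations sum((X_i - Xbar)^2) for a list of X levels."""
    x_bar = sum(levels) / len(levels)
    return sum((x - x_bar) ** 2 for x in levels)


def slope_variance(levels, sigma2=1.0):
    """sigma^2{b1} from (4.35) for a given design and error variance sigma2."""
    return sigma2 / sum_sq_dev(levels)


# Hypothetical designs with n = 12 over the assumed scope 0 to 9.
two_level = [0] * 6 + [9] * 6                         # half the observations at each extreme
four_level = [0] * 3 + [3] * 3 + [6] * 3 + [9] * 3    # four equally spaced levels

print(sum_sq_dev(two_level), slope_variance(two_level))
print(sum_sq_dev(four_level), slope_variance(four_level))
```

For these designs the two-level layout gives $\sum(X_i - \bar{X})^2 = 243$ against 135 for the four-level layout, so its slope variance is smaller, at the cost of providing no information about departures from linearity.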

Although the number and spacing of X levels depends very much on the major purpose 
of the regression analysis, the general advice given by D. R. Cox is still relevant: 


Use two levels when the object is primarily to examine whether or not ... (the predictor
variable) ... has an effect and in which direction that effect is. Use three levels whenever a
description of the response curve by its slope and curvature is likely to be adequate; this 
should cover most cases. Use four levels if further examination of the shape of the response 
curve is important. Use more than four levels when it is required to estimate the detailed 
shape of the response curve, or when the curve is expected to rise to an asymptotic value, or 
in general to show features not adequately described by slope and curvature. Except in these 
last cases it is generally satisfactory to use equally spaced levels with equal numbers of 
observations per level (Ref. 4.4). 




Cited References

4.1. Miller, R. G., Jr. Simultaneous Statistical Inference. 2nd ed. New York: Springer-Verlag, 1991.
4.2. Fuller, W. A. Measurement Error Models. New York: John Wiley & Sons, 1987.
4.3. Berkson, J. "Are There Two Regressions?" Journal of the American Statistical Association 45
(1950), pp. 164-80.
4.4. Cox, D. R. Planning of Experiments. New York: John Wiley & Sons, 1958, pp. 141-42.


Problems

4.1. When joint confidence intervals for $\beta_0$ and $\beta_1$ are developed by the Bonferroni method with
a family confidence coefficient of 90 percent, does this imply that 10 percent of the time the
confidence interval for $\beta_0$ will be incorrect? That 5 percent of the time the confidence interval
for $\beta_0$ will be incorrect and 5 percent of the time that for $\beta_1$ will be incorrect? Discuss.


4.2. Refer to Problem 2.1. Suppose the student combines the two confidence intervals into a
confidence set. What can you say about the family confidence coefficient for this set?


*4.3. Refer to Copier maintenance Problem 1.20.

a. Will $b_0$ and $b_1$ tend to err in the same direction or in opposite directions here? Explain.

b. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$, using a 95 percent family
confidence coefficient.

c. A consultant has suggested that $\beta_0$ should be 0 and $\beta_1$ should equal 14.0. Do your joint
confidence intervals in part (b) support this view?

*4.4. Refer to Airfreight breakage Problem 1.21.

a. Will $b_0$ and $b_1$ tend to err in the same direction or in opposite directions here? Explain.

b. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$, using a 99 percent family
confidence coefficient. Interpret your confidence intervals.

4.5. Refer to Plastic hardness Problem 1.22.

a. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$, using a 90 percent family
confidence coefficient. Interpret your confidence intervals.

b. Are $b_0$ and $b_1$ positively or negatively correlated here? Is this reflected in your joint
confidence intervals in part (a)?

c. What is the meaning of the family confidence coefficient in part (a)?

*4.6. Refer to Muscle mass Problem 1.27.

a. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$, using a 99 percent family
confidence coefficient. Interpret your confidence intervals.

b. Will $b_0$ and $b_1$ tend to err in the same direction or in opposite directions here? Explain.

c. A researcher has suggested that $\beta_0$ should equal approximately 160 and that $\beta_1$ should be
between -1.9 and -1.5. Do the joint confidence intervals in part (a) support this expectation?

*4.7. Refer to Copier maintenance Problem 1.20.

a. Estimate the expected number of minutes spent when there are 3, 5, and 7 copiers to be
serviced, respectively. Use interval estimates with a 90 percent family confidence coefficient
based on the Working-Hotelling procedure.

b. Two service calls for preventive maintenance are scheduled in which the numbers of copiers
to be serviced are 4 and 7, respectively. A family of prediction intervals for the times to
be spent on these calls is desired with a 90 percent family confidence coefficient. Which
procedure, Scheffé or Bonferroni, will provide tighter prediction limits here?

c. Obtain the family of prediction intervals required in part (b), using the more efficient
procedure.




*4.8. Refer to Airfreight breakage Problem 1.21.

a. It is desired to obtain interval estimates of the mean number of broken ampules when there
are 0, 1, and 2 transfers for a shipment, using a 95 percent family confidence coefficient.
Obtain the desired confidence intervals, using the Working-Hotelling procedure.

b. Are the confidence intervals obtained in part (a) more efficient than Bonferroni intervals
here? Explain.

c. The next three shipments will make 0, 1, and 2 transfers, respectively. Obtain prediction
intervals for the number of broken ampules for each of these three shipments, using the
Scheffé procedure and a 95 percent family confidence coefficient.

d. Would the Bonferroni procedure have been more efficient in developing the prediction
intervals in part (c)? Explain.

4.9. Refer to Plastic hardness Problem 1.22.

a. Management wishes to obtain interval estimates of the mean hardness when the elapsed time
is 20, 30, and 40 hours, respectively. Calculate the desired confidence intervals, using the
Bonferroni procedure and a 90 percent family confidence coefficient. What is the meaning
of the family confidence coefficient here?

b. Is the Bonferroni procedure employed in part (a) the most efficient one that could be
employed here? Explain.

c. The next two test items will be measured after 30 and 40 hours of elapsed time, respectively.
Predict the hardness for each of these two items, using the most efficient procedure and a
90 percent family confidence coefficient.

*4.10. Refer to Muscle mass Problem 1.27.

a. The nutritionist is particularly interested in the mean muscle mass for women aged 45, 55, and
65. Obtain joint confidence intervals for the means of interest using the Working-Hotelling
procedure and a 95 percent family confidence coefficient.

b. Is the Working-Hotelling procedure the most efficient one to be employed in part (a)?
Explain.

c. Three additional women aged 48, 59, and 74 have contacted the nutritionist. Predict the
muscle mass for each of these three women using the Bonferroni procedure and a 95 percent
family confidence coefficient.

d. Subsequently, the nutritionist wishes to predict the muscle mass for a fourth woman aged
64, with a family confidence coefficient of 95 percent for the four predictions. Will the three
prediction intervals in part (c) have to be recalculated? Would this also be true if the Scheffé
procedure had been used in constructing the prediction intervals?

4.11. A behavioral scientist said, "I am never sure whether the regression line goes through the origin.
Hence, I will not use such a model." Comment.

4.12. Typographical errors. Shown below are the number of galleys for a manuscript (X) and
the total dollar cost of correcting typographical errors (Y) in a random sample of recent orders
handled by a firm specializing in technical manuscripts. Since Y involves variable costs only, an
analyst wished to determine whether regression-through-the-origin model (4.10) is appropriate
for studying the relation between the two variables.


i:       1    2    3    4    5    6    7    8    9   10   11   12

$X_i$:   7   12   10   10   14   25   30   25   18   10    4    6
$Y_i$: 128  213  191  178  250  446  540  457  324  177   75  107


a. Fit regression model (4.10) and state the estimated regression function.


b. Plot the estimated regression function and the data. Does a linear regression function through
the origin appear to provide a good fit here? Comment.

c. In estimating costs of handling prospective orders, management has used a standard of
$17.50 per galley for the cost of correcting typographical errors. Test whether or not this
standard should be revised; use $\alpha = .02$. State the alternatives, decision rule, and conclusion.

d. Obtain a prediction interval for the correction cost on a forthcoming job involving 10 galleys.
Use a confidence coefficient of 98 percent.

4.13. Refer to Typographical errors Problem 4.12.

a. Obtain the residuals $e_i$. Do they sum to zero? Plot the residuals against the fitted values $\hat{Y}_i$.
What conclusions can be drawn from your plot?

b. Conduct a formal test for lack of fit of linear regression through the origin; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

4.14. Refer to Grade point average Problem 1.19. Assume that linear regression through the origin
model (4.10) is appropriate.

a. Fit regression model (4.10) and state the estimated regression function.

b. Estimate $\beta_1$ with a 95 percent confidence interval. Interpret your interval estimate.

c. Estimate the mean freshman GPA for students whose ACT test score is 30. Use a 95 percent
confidence interval.

4.15. Refer to Grade point average Problem 4.14.

a. Plot the fitted regression line and the data. Does the linear regression function through the
origin appear to be a good fit here?

b. Obtain the residuals $e_i$. Do they sum to zero? Plot the residuals against the fitted values $\hat{Y}_i$.
What conclusions can be drawn from your plot?

c. Conduct a formal test for lack of fit of linear regression through the origin; use $\alpha = .005$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

*4.16. Refer to Copier maintenance Problem 1.20. Assume that linear regression through the origin
model (4.10) is appropriate.

a. Obtain the estimated regression function.

b. Estimate $\beta_1$ with a 90 percent confidence interval. Interpret your interval estimate.

c. Predict the service time on a new call in which six copiers are to be serviced. Use a 90 percent
prediction interval.

*4.17. Refer to Copier maintenance Problem 4.16.

a. Plot the fitted regression line and the data. Does the linear regression function through the
origin appear to be a good fit here?

b. Obtain the residuals $e_i$. Do they sum to zero? Plot the residuals against the fitted values $\hat{Y}_i$.
What conclusions can be drawn from your plot?

c. Conduct a formal test for lack of fit of linear regression through the origin; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

4.18. Refer to Plastic hardness Problem 1.22. Suppose that errors arise in X because the laboratory
technician is instructed to measure the hardness of the ith specimen ($Y_i$) at a prerecorded
elapsed time ($X_i$), but the timing is imperfect so the true elapsed time varies at random from
the prerecorded elapsed time. Will ordinary least squares estimates be biased here? Discuss.

4.19. Refer to Grade point average Problem 1.19. A new student earned a grade point average of
3.4 in the freshman year.

a. Obtain a 90 percent confidence interval for the student's ACT test score. Interpret your
confidence interval.

b. Is criterion (4.33) as to the appropriateness of the approximate confidence interval met here?

4.20. Refer to Plastic hardness Problem 1.22. The measurement of a new test item showed 238 Brinell
units of hardness.

a. Obtain a 99 percent confidence interval for the elapsed time before the hardness was
measured. Interpret your confidence interval.

b. Is criterion (4.33) as to the appropriateness of the approximate confidence interval met here?


Exercises

4.21. When the predictor variable is so coded that $\bar{X} = 0$ and the normal error regression model (2.1)
applies, are $b_0$ and $b_1$ independent? Are the joint confidence intervals for $\beta_0$ and $\beta_1$ then
independent?

4.22. Derive an extension of the Bonferroni inequality (4.2a) for the case of three statements, each
with statement confidence coefficient $1 - \alpha$.

4.23. Show that for the fitted least squares regression line through the origin (4.15), $\sum X_i e_i = 0$.

4.24. Show that $\hat{Y}$ as defined in (4.15) for linear regression through the origin is an unbiased estimator
of $E\{Y\}$.

4.25. Derive the formula for $s^2\{\hat{Y}_h\}$ given in Table 4.1 for linear regression through the origin.


Projects

4.26. Refer to the CDI data set in Appendix C.2 and Project 1.43. Consider the regression relation
of number of active physicians to total population.

a. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$, using a 95 percent family
confidence coefficient.

b. An investigator has suggested that $\beta_0$ should be -100 and $\beta_1$ should be .0028. Do the joint
confidence intervals in part (a) support this view? Discuss.

c. It is desired to estimate the expected number of active physicians for counties with total
population of X = 500, 1,000, 5,000 thousands with family confidence coefficient .90.
Which procedure, the Working-Hotelling or the Bonferroni, is more efficient here?

d. Obtain the family of interval estimates required in part (c), using the more efficient procedure.
Interpret your confidence intervals.

4.27. Refer to the SENIC data set in Appendix C.1 and Project 1.45. Consider the regression relation
of average length of stay to infection risk.

a. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$, using a 90 percent family
confidence coefficient.

b. A researcher suggested that $\beta_0$ should be approximately 7 and $\beta_1$ should be approximately 1.
Do the joint intervals in part (a) support this expectation? Discuss.

c. It is desired to estimate the expected hospital stay for persons with infection risks X =
2, 3, 4, 5 with family confidence coefficient .95. Which procedure, the Working-Hotelling
or the Bonferroni, is more efficient here?

d. Obtain the family of interval estimates required in part (c), using the more efficient procedure.
Interpret your confidence intervals.


Chapter 5

Matrix Approach to Simple Linear Regression Analysis


Matrix algebra is widely used for mathematical and statistical analysis. The matrix approach
is practically a necessity in multiple regression analysis, since it permits extensive systems
of equations and large arrays of data to be denoted compactly and operated upon efficiently.

In this chapter, we first take up a brief introduction to matrix algebra. (A more comprehensive
treatment of matrix algebra may be found in specialized texts such as Reference 5.1.)
Then we apply matrix methods to the simple linear regression model discussed in previous
chapters. Although matrix algebra is not really required for simple linear regression,
the application of matrix methods to this case will provide a useful transition to multiple
regression, which will be taken up in Parts II and III.

Readers familiar with matrix algebra may wish to scan the introductory parts of this 
chapter and focus upon the later parts dealing with the use of matrix methods in regression 
analysis. 


5.1 Matrices 


Definition of Matrix 


A matrix is a rectangular array of elements arranged in rows and columns. An example of
a matrix is:


         Column 1   Column 2
Row 1     16,000       23
Row 2     33,000       47
Row 3     21,000       35


The elements of this particular matrix are numbers representing income (column 1) and
age (column 2) of three persons. The elements are arranged by row (person) and column
(characteristic of person). Thus, the element in the first row and first column (16,000)
represents the income of the first person. The element in the first row and second column (23)
represents the age of the first person. The dimension of the matrix is 3 × 2, i.e., 3 rows by
2 columns. If we wanted to present income and age for 1,000 persons in a matrix with the
same format as the one earlier, we would require a 1,000 × 2 matrix.
Other examples of matrices are:


$$\begin{bmatrix} 1 & 0 \\ 5 & 10 \end{bmatrix} \qquad \begin{bmatrix} 4 & 7 & 12 & 16 \\ 3 & 15 & 9 & 8 \end{bmatrix}$$

These two matrices have dimensions of 2 × 2 and 2 × 4, respectively. Note that in giving the
dimension of a matrix, we always specify the number of rows first and then the number of
columns. As in ordinary algebra, we may use symbols to identify the elements of a matrix:


$$\begin{array}{c|ccc} & j = 1 & j = 2 & j = 3 \\ \hline i = 1 & a_{11} & a_{12} & a_{13} \\ i = 2 & a_{21} & a_{22} & a_{23} \end{array}$$


Note that the first subscript identifies the row number and the second the column number.
We shall use the general notation $a_{ij}$ for the element in the ith row and the jth column. In
our above example, i = 1, 2 and j = 1, 2, 3.

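A matrix may be denoted by a symbol such as A, X, or Z. The symbol is in boldface to
identify that it refers to a matrix. Thus, we might define for the above matrix:


$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$$

Reference to the matrix A then implies reference to the 2 × 3 array just given.
Another notation for the matrix A just given is:

$$\mathbf{A} = [a_{ij}] \qquad i = 1, 2;\ j = 1, 2, 3$$


This notation avoids the need for writing out all elements of the matrix by stating only the
general element. It can only be used, of course, when the elements of a matrix are symbols.
To summarize, a matrix with r rows and c columns will be represented either in full:


$$\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2c} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{ic} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rj} & \cdots & a_{rc}
\end{bmatrix} \qquad (5.1)$$


or in abbreviated form:

$$\mathbf{A} = [a_{ij}] \qquad i = 1, \ldots, r;\ j = 1, \ldots, c$$

or simply by a boldface symbol, such as A.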
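For readers who compute, a matrix can be held as a list of rows. The short Python sketch below stores the income and age matrix from the beginning of this section and recovers its dimension; the representation as nested lists and the function name `dimension` are our own illustrative choices, not notation from the text.

```python
def dimension(matrix):
    """Return (rows, columns) of a matrix stored as a list of equal-length rows."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if any(len(row) != cols for row in matrix):
        raise ValueError("rows have unequal lengths, so this is not a rectangular array")
    return rows, cols


# Income (column 1) and age (column 2) of three persons, as in the example above.
income_age = [
    [16000, 23],
    [33000, 47],
    [21000, 35],
]

r, c = dimension(income_age)   # rows are stated first, then columns
a_12 = income_age[0][1]        # element a_ij sits at index [i - 1][j - 1] in Python
print(r, c, a_12)
```

Note that Python indexes from 0, so the element $a_{ij}$ of the text corresponds to `matrix[i - 1][j - 1]`.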




Comments 


1. Do not think of a matrix as a number. It is a set of elements arranged in an array. Only when 
the matrix has dimension 1 x 1 is there a single number in a matrix, in which case one can think of 
it interchangeably as either a matrix or a number. 


2. The following is not a matrix:


14
8
10 15
9 16

since the numbers are not arranged in columns and rows.


Square Matrix 


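A matrix is said to be square if the number of rows equals the number of columns. Two
examples are:


$$\begin{bmatrix} 4 & 7 \\ 3 & 9 \end{bmatrix} \qquad \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$


Vector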


A matrix containing only one column is called a column vector or simply a vector. Two
examples are:


$$\mathbf{A} = \begin{bmatrix} 4 \\ 7 \\ 10 \end{bmatrix} \qquad \mathbf{C} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{bmatrix}$$


The vector A is a 3 × 1 matrix, and the vector C is a 5 × 1 matrix.
A matrix containing only one row is called a row vector. Two examples are:


$$\mathbf{B}' = \begin{bmatrix} 15 & 25 & 50 \end{bmatrix} \qquad \mathbf{F}' = \begin{bmatrix} f_1 & f_2 \end{bmatrix}$$


We use the prime symbol for row vectors for reasons to be seen shortly. Note that the row
vector B' is a 1 × 3 matrix and the row vector F' is a 1 × 2 matrix.
A single subscript suffices to identify the elements of a vector.


Transpose 


The transpose of a matrix A is another matrix, denoted by A', that is obtained by
interchanging corresponding columns and rows of the matrix A.
For example, if:


$$\underset{3 \times 2}{\mathbf{A}} = \begin{bmatrix} 2 & 5 \\ 7 & 10 \\ 3 & 4 \end{bmatrix}$$


then the transpose A' is:

$$\underset{2 \times 3}{\mathbf{A}'} = \begin{bmatrix} 2 & 7 & 3 \\ 5 & 10 & 4 \end{bmatrix}$$

Note that the first column of A is the first row of A', and similarly the second column of A
is the second row of A'. Correspondingly, the first row of A has become the first column
of A', and so on. Note that the dimension of A, indicated under the symbol A, becomes
reversed for the dimension of A'.
As another example, consider:


$$\underset{3 \times 1}{\mathbf{C}} = \begin{bmatrix} 4 \\ 7 \\ 10 \end{bmatrix} \qquad \underset{1 \times 3}{\mathbf{C}'} = \begin{bmatrix} 4 & 7 & 10 \end{bmatrix}$$


Thus, the transpose of a column vector is a row vector, and vice versa. This is the reason
why we used the symbol B' earlier to identify a row vector, since it may be thought of as
the transpose of a column vector B.

In general, we have: 


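\[
\underset{r \times c}{\mathbf{A}} =
\begin{bmatrix} a_{11} & \cdots & a_{1c} \\ \vdots & & \vdots \\ a_{r1} & \cdots & a_{rc} \end{bmatrix}
= [a_{ij}] \qquad i = 1, \ldots, r;\ j = 1, \ldots, c \tag{5.2}
\]

where i is the row index and j is the column index, and:

\[
\underset{c \times r}{\mathbf{A}'} =
\begin{bmatrix} a_{11} & \cdots & a_{r1} \\ \vdots & & \vdots \\ a_{1c} & \cdots & a_{rc} \end{bmatrix}
= [a_{ji}] \qquad j = 1, \ldots, c;\ i = 1, \ldots, r \tag{5.3}
\]

where j is now the row index and i is the column index.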

Thus, the element in the ith row and the jth column in A is found in the jth row and ith 
column in A'. 
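
The transpose rule can be checked numerically. The following is a minimal sketch using NumPy (an assumption; the text itself does not prescribe software), applied to the 3 x 2 matrix A of the example above.

```python
import numpy as np

# The 3 x 2 matrix A from the transpose example.
A = np.array([[2, 5],
              [7, 10],
              [3, 4]])

A_t = A.T  # transpose: corresponding rows and columns are interchanged

# The dimension is reversed: 3 x 2 becomes 2 x 3.
print(A.shape, A_t.shape)

# The element in row i, column j of A sits in row j, column i of A'.
# NumPy indices are zero-based, so (2, 1) is the third row, second column.
i, j = 2, 1
print(A[i, j] == A_t[j, i])
```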


Equality of Matrices 


Two matrices А and B are said to be equal if they have the same dimension and if all 
corresponding elements are equal. Conversely, if two matrices are equal, their corresponding 
elements are equal. For example, if: 


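\[
\underset{3 \times 1}{\mathbf{A}} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
\qquad
\underset{3 \times 1}{\mathbf{B}} = \begin{bmatrix} 4 \\ 7 \\ 3 \end{bmatrix}
\]

then A = B implies:

\[
a_1 = 4 \qquad a_2 = 7 \qquad a_3 = 3
\]

Similarly, if:

\[
\underset{3 \times 2}{\mathbf{A}} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\qquad
\underset{3 \times 2}{\mathbf{B}} = \begin{bmatrix} 17 & 2 \\ 14 & 5 \\ 13 & 9 \end{bmatrix}
\]

then A = B implies:

\[
\begin{aligned}
a_{11} &= 17 & a_{12} &= 2 \\
a_{21} &= 14 & a_{22} &= 5 \\
a_{31} &= 13 & a_{32} &= 9
\end{aligned}
\]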


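Regression Examples

In regression analysis, one basic matrix is the vector Y, consisting of the n observations on
the response variable:

\[
\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \tag{5.4}
\]

Note that the transpose Y' is the row vector:

\[
\underset{1 \times n}{\mathbf{Y}'} = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix} \tag{5.5}
\]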


Another basic matrix in regression analysis is the X matrix, which is defined as follows for 
simple linear regression analysis: 


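\[
\underset{n \times 2}{\mathbf{X}} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \tag{5.6}
\]

The matrix X consists of a column of 1s and a column containing the n observations on the
predictor variable X. Note that the transpose of X is:

\[
\underset{2 \times n}{\mathbf{X}'} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix} \tag{5.7}
\]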


The X matrix is often referred to as the design matrix. 


5.2 Matrix Addition and Subtraction 


Adding or subtracting two matrices requires that they have the same dimension. The sum, 
or difference, of two matrices is another matrix whose elements each consist of the sum, or 
difference, of the corresponding elements of the two matrices. Suppose: 


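\[
\underset{3 \times 2}{\mathbf{A}} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\qquad
\underset{3 \times 2}{\mathbf{B}} = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix}
\]

then:

\[
\underset{3 \times 2}{\mathbf{A} + \mathbf{B}} =
\begin{bmatrix} 1+1 & 4+2 \\ 2+2 & 5+3 \\ 3+3 & 6+4 \end{bmatrix} =
\begin{bmatrix} 2 & 6 \\ 4 & 8 \\ 6 & 10 \end{bmatrix}
\]

Similarly:

\[
\underset{3 \times 2}{\mathbf{A} - \mathbf{B}} =
\begin{bmatrix} 1-1 & 4-2 \\ 2-2 & 5-3 \\ 3-3 & 6-4 \end{bmatrix} =
\begin{bmatrix} 0 & 2 \\ 0 & 2 \\ 0 & 2 \end{bmatrix}
\]

In general, if:

\[
\underset{r \times c}{\mathbf{A}} = [a_{ij}] \qquad \underset{r \times c}{\mathbf{B}} = [b_{ij}] \qquad i = 1, \ldots, r;\ j = 1, \ldots, c
\]

then:

\[
\mathbf{A} + \mathbf{B} = [a_{ij} + b_{ij}] \qquad \text{and} \qquad \mathbf{A} - \mathbf{B} = [a_{ij} - b_{ij}] \tag{5.8}
\]

Formula (5.8) generalizes in an obvious way to addition and subtraction of more than two
matrices. Note also that A + B = B + A, as in ordinary algebra.


Regression Example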


The regression model: 
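\[
Y_i = E\{Y_i\} + \varepsilon_i \qquad i = 1, \ldots, n
\]

can be written compactly in matrix notation. First, let us define the vector of the mean
responses:

\[
\underset{n \times 1}{E\{\mathbf{Y}\}} = \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ \vdots \\ E\{Y_n\} \end{bmatrix} \tag{5.9}
\]

and the vector of the error terms:

\[
\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \tag{5.10}
\]

Recalling the definition of the observations vector Y in (5.4), we can write the regression
model as follows:

\[
\underset{n \times 1}{\mathbf{Y}} = \underset{n \times 1}{E\{\mathbf{Y}\}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}
\]

because:

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ \vdots \\ E\{Y_n\} \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} =
\begin{bmatrix} E\{Y_1\} + \varepsilon_1 \\ E\{Y_2\} + \varepsilon_2 \\ \vdots \\ E\{Y_n\} + \varepsilon_n \end{bmatrix}
\]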


Thus, the observations vector Y equals the sum of two vectors, a vector containing the 
expected values and another containing the error terms. 




5.3 Matrix Multiplication 


Multiplication of a Matrix by a Scalar 
A scalar is an ordinary number or a symbol representing a number. In multiplication of a 
matrix by a scalar, every element of the matrix is multiplied by the scalar. For example, 
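suppose the matrix A is given by:

\[
\mathbf{A} = \begin{bmatrix} 2 & 7 \\ 9 & 3 \end{bmatrix}
\]

Then 4A, where 4 is the scalar, equals:

\[
4\mathbf{A} = 4\begin{bmatrix} 2 & 7 \\ 9 & 3 \end{bmatrix} = \begin{bmatrix} 8 & 28 \\ 36 & 12 \end{bmatrix}
\]

Similarly, kA equals:

\[
k\mathbf{A} = k\begin{bmatrix} 2 & 7 \\ 9 & 3 \end{bmatrix} = \begin{bmatrix} 2k & 7k \\ 9k & 3k \end{bmatrix}
\]

where k denotes a scalar.
If every element of a matrix has a common factor, this factor can be taken outside the
matrix and treated as a scalar. For example:

\[
\begin{bmatrix} 9 & 27 \\ 15 & 18 \end{bmatrix} = 3\begin{bmatrix} 3 & 9 \\ 5 & 6 \end{bmatrix}
\]

Similarly:

\[
\begin{bmatrix} \dfrac{5}{\lambda} & \dfrac{2}{\lambda} \\[1.5ex] \dfrac{3}{\lambda} & \dfrac{8}{\lambda} \end{bmatrix}
= \frac{1}{\lambda}\begin{bmatrix} 5 & 2 \\ 3 & 8 \end{bmatrix}
\]

In general, if $\mathbf{A} = [a_{ij}]$ and k is a scalar, we have:

\[
k\mathbf{A} = \mathbf{A}k = [k a_{ij}] \tag{5.11}
\]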

Multiplication of a Matrix by a Matrix 
Multiplication of a matrix by a matrix may appear somewhat complicated at first, but a little 
practice will make it a routine operation. 
Consider the two matrices: 


\[
\underset{2 \times 2}{\mathbf{A}} = \begin{bmatrix} 2 & 5 \\ 4 & 1 \end{bmatrix}
\qquad
\underset{2 \times 2}{\mathbf{B}} = \begin{bmatrix} 4 & 6 \\ 5 & 8 \end{bmatrix}
\]

The product AB will be a 2 x 2 matrix whose elements are obtained by finding the cross
products of rows of A with columns of B and summing the cross products. For instance, to
find the element in the first row and the first column of the product AB, we work with the
first row of A, which is [2 5], and the first column of B, which contains 4 and 5.
We take the cross products and sum:

\[
2(4) + 5(5) = 33
\]

The number 33 is the element in the first row and first column of the matrix AB.
To find the element in the first row and second column of AB, we work with the first row
of A, which is [2 5], and the second column of B, which contains 6 and 8.
The sum of the cross products is:

\[
2(6) + 5(8) = 52
\]

Continuing this process, we find the product AB to be:

\[
\underset{2 \times 2}{\mathbf{AB}} =
\begin{bmatrix} 2 & 5 \\ 4 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 6 \\ 5 & 8 \end{bmatrix} =
\begin{bmatrix} 33 & 52 \\ 21 & 32 \end{bmatrix}
\]

Let us consider another example:

\[
\underset{2 \times 3}{\mathbf{A}} = \begin{bmatrix} 1 & 3 & 4 \\ 0 & 5 & 8 \end{bmatrix}
\qquad
\underset{3 \times 1}{\mathbf{B}} = \begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix}
\]

\[
\underset{2 \times 1}{\mathbf{AB}} =
\begin{bmatrix} 1 & 3 & 4 \\ 0 & 5 & 8 \end{bmatrix}
\begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix} =
\begin{bmatrix} 26 \\ 41 \end{bmatrix}
\]


When obtaining the product AB, we say that A is postmultiplied by B or B is premultiplied
by A. The reason for this precise terminology is that multiplication rules for ordinary algebra
do not apply to matrix algebra. In ordinary algebra, xy = yx. In matrix algebra, AB ≠ BA
usually. In fact, even though the product AB may be defined, the product BA may not be
defined at all.
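
As an illustrative check of the first example, the sketch below (assuming NumPy, which the text does not itself use) forms AB and BA for the two 2 x 2 matrices and confirms that the two products differ.

```python
import numpy as np

A = np.array([[2, 5],
              [4, 1]])
B = np.array([[4, 6],
              [5, 8]])

AB = A @ B  # A postmultiplied by B
BA = B @ A  # B postmultiplied by A

print(AB)
print(BA)

# Matrix multiplication is not commutative in general.
print(np.array_equal(AB, BA))
```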

In general, the product AB is defined only when the number of columns in A equals the
number of rows in B so that there will be corresponding terms in the cross products. Thus,
in our previous two examples, we had:

\[
\underset{2 \times 2}{\mathbf{A}}\ \underset{2 \times 2}{\mathbf{B}} = \underset{2 \times 2}{\mathbf{AB}}
\qquad\qquad
\underset{2 \times 3}{\mathbf{A}}\ \underset{3 \times 1}{\mathbf{B}} = \underset{2 \times 1}{\mathbf{AB}}
\]

In each case the two inner dimensions are equal, and the two outer dimensions give the
dimension of the product.

Note that the dimension of the product AB is given by the number of rows in A and the
number of columns in B. Note also that in the second case the product BA would not be
defined since the number of columns in B is not equal to the number of rows in A:

\[
\underset{3 \times 1}{\mathbf{B}}\ \underset{2 \times 3}{\mathbf{A}} \qquad \text{(inner dimensions 1 and 2 are unequal)}
\]


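Here is another example of matrix multiplication:

\[
\mathbf{AB} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}
=
\begin{bmatrix}
a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\
a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32}
\end{bmatrix}
\]

In general, if A has dimension r x c and B has dimension c x s, the product AB is a matrix
of dimension r x s whose element in the ith row and jth column is:

\[
\sum_{k=1}^{c} a_{ik} b_{kj}
\]

so that:

\[
\underset{r \times s}{\mathbf{AB}} = \left[ \sum_{k=1}^{c} a_{ik} b_{kj} \right] \qquad i = 1, \ldots, r;\ j = 1, \ldots, s \tag{5.12}
\]

Thus, in the foregoing example, the element in the first row and second column of the
product AB is:

\[
\sum_{k=1}^{3} a_{1k} b_{k2} = a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32}
\]

as indeed we found by taking the cross products of the elements in the first row of A and
second column of B and summing.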


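Additional Examples

1.
\[
\begin{bmatrix} 4 & 2 \\ 5 & 8 \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} =
\begin{bmatrix} 4a_1 + 2a_2 \\ 5a_1 + 8a_2 \end{bmatrix}
\]

2.
\[
\begin{bmatrix} 2 & 3 & 5 \end{bmatrix}
\begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix} =
[2^2 + 3^2 + 5^2] = [38]
\]

Here, the product is a 1 x 1 matrix, which is equivalent to a scalar. Thus, the matrix product
here equals the number 38.

3.
\[
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ 1 & X_3 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} =
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \beta_0 + \beta_1 X_3 \end{bmatrix}
\]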


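Regression Examples

A product frequently needed is Y'Y, where Y is the vector of observations on the response
variable as defined in (5.4):

\[
\underset{1 \times 1}{\mathbf{Y}'\mathbf{Y}} =
\begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= [Y_1^2 + Y_2^2 + \cdots + Y_n^2] = \left[ \sum Y_i^2 \right] \tag{5.13}
\]

Note that Y'Y is a 1 x 1 matrix, or a scalar. We thus have a compact way of writing a sum
of squared terms: $\mathbf{Y}'\mathbf{Y} = \sum Y_i^2$.
We also will need X'X, which is a 2 x 2 matrix, where X is defined in (5.6):

\[
\underset{2 \times 2}{\mathbf{X}'\mathbf{X}} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
= \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} \tag{5.14}
\]

and X'Y, which is a 2 x 1 matrix:

\[
\underset{2 \times 1}{\mathbf{X}'\mathbf{Y}} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix} \tag{5.15}
\]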
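
The three products in (5.13) through (5.15) can be formed directly. The sketch below assumes NumPy and uses a small hypothetical data set (n = 4 cases, not taken from the text) so that each entry can be compared with the corresponding sum.

```python
import numpy as np

# Hypothetical data (not from the text): n = 4 cases.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0])

Y = y.reshape(-1, 1)                       # n x 1 observations vector, as in (5.4)
X = np.column_stack([np.ones_like(x), x])  # n x 2 matrix: column of 1s, then X_i, as in (5.6)

YtY = Y.T @ Y  # 1 x 1: sum of Y_i^2, as in (5.13)
XtX = X.T @ X  # 2 x 2: [[n, sum X_i], [sum X_i, sum X_i^2]], as in (5.14)
XtY = X.T @ Y  # 2 x 1: [sum Y_i, sum X_i Y_i], as in (5.15)

print(YtY)
print(XtX)
print(XtY)
```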


5.4 Special Types of Matrices 


Certain special types of matrices arise regularly in regression analysis. We consider the 
most important of these. 


Symmetric Matrix 
If A = A’, A is said to be symmetric. Thus, A below is symmetric: 


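\[
\underset{3 \times 3}{\mathbf{A}} = \begin{bmatrix} 1 & 4 & 6 \\ 4 & 2 & 5 \\ 6 & 5 & 3 \end{bmatrix}
\qquad
\underset{3 \times 3}{\mathbf{A}'} = \begin{bmatrix} 1 & 4 & 6 \\ 4 & 2 & 5 \\ 6 & 5 & 3 \end{bmatrix}
\]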


A symmetric matrix necessarily is square. Symmetric matrices arise typically in regression 
analysis when we premultiply a matrix, say, X, by its transpose, X'. The resulting matrix, 
X'X, is symmetric, as can readily be seen from (5.14). 


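Diagonal Matrix
A diagonal matrix is a square matrix whose off-diagonal elements are all zeros, such as:

\[
\underset{3 \times 3}{\mathbf{A}} = \begin{bmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{bmatrix}
\qquad
\underset{4 \times 4}{\mathbf{B}} = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 10 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}
\]

We will often not show all zeros for a diagonal matrix, presenting it in the form:

\[
\underset{3 \times 3}{\mathbf{A}} = \begin{bmatrix} a_1 & & \\ & a_2 & \\ & & a_3 \end{bmatrix}
\qquad
\underset{4 \times 4}{\mathbf{B}} = \begin{bmatrix} 4 & & & \\ & 1 & & \\ & & 10 & \\ & & & 5 \end{bmatrix}
\]

Two important types of diagonal matrices are the identity matrix and the scalar matrix.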


Identity Matrix. The identity matrix or unit matrix is denoted by I. It is a diagonal matrix 
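whose elements on the main diagonal are all 1s. Premultiplying or postmultiplying any r x r
matrix A by the r x r identity matrix I leaves A unchanged. For example:

\[
\mathbf{IA} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

Similarly, we have:

\[
\mathbf{AI} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

Note that the identity matrix I therefore corresponds to the number 1 in ordinary algebra,
since we have there that $1 \cdot x = x \cdot 1 = x$.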
In general, we have for any r x r matrix А: 


AI=IA=A (5.16) 


Thus, the identity matrix can be inserted or dropped from a matrix expression whenever it 
is convenient to do so. 


Scalar Matrix. A scalar matrix is a diagonal matrix whose main-diagonal elements are 
the same. Two examples of scalar matrices are: 


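\[
\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}
\qquad
\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}
\]

A scalar matrix can be expressed as kI, where k is the scalar. For instance:

\[
\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = 2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 2\mathbf{I}
\]

\[
\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix} =
k\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = k\mathbf{I}
\]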


Multiplying an r x r matrix A by the r x r scalar matrix kI is equivalent to multiplying 
A by the scalar k. 


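Vector and Matrix with All Elements Unity

A column vector with all elements 1 will be denoted by 1:

\[
\underset{r \times 1}{\mathbf{1}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \tag{5.17}
\]

and a square matrix with all elements 1 will be denoted by J:

\[
\underset{r \times r}{\mathbf{J}} = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix} \tag{5.18}
\]

For instance, we have:

\[
\underset{3 \times 1}{\mathbf{1}} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\qquad
\underset{3 \times 3}{\mathbf{J}} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\]

Note that for an n x 1 vector 1 we obtain:

\[
\underset{1 \times 1}{\mathbf{1}'\mathbf{1}} =
\begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = [n] = n
\]

and:

\[
\underset{n \times n}{\mathbf{1}\mathbf{1}'} =
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
\begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} =
\begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix} = \underset{n \times n}{\mathbf{J}}
\]

Zero Vector
A zero vector is a vector containing only zeros. The zero column vector will be denoted
by 0:

\[
\underset{r \times 1}{\mathbf{0}} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{5.19}
\]

For example, we have:

\[
\underset{3 \times 1}{\mathbf{0}} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]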


5.5 Linear Dependence and Rank of Matrix


Linear Dependence 
Consider the following matrix: 


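\[
\mathbf{A} = \begin{bmatrix} 1 & 2 & 5 & 1 \\ 2 & 2 & 10 & 6 \\ 3 & 4 & 15 & 1 \end{bmatrix}
\]

Let us think now of the columns of this matrix as vectors. Thus, we view A as being made
up of four column vectors. It happens here that the columns are interrelated in a special
manner. Note that the third column vector is a multiple of the first column vector:

\[
\begin{bmatrix} 5 \\ 10 \\ 15 \end{bmatrix} = 5\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\]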


We say that the columns of A are linearly dependent. They contain redundant information, 
so to speak, since one column can be obtained as a linear combination of the others. 

We define the set of c column vectors $\mathbf{C}_1, \ldots, \mathbf{C}_c$ in an r x c matrix to be linearly
dependent if one vector can be expressed as a linear combination of the others. If no vector
in the set can be so expressed, we define the set of vectors to be linearly independent. A
more general, though equivalent, definition is:

When c scalars $k_1, \ldots, k_c$, not all zero, can be found such that:

\[
k_1\mathbf{C}_1 + k_2\mathbf{C}_2 + \cdots + k_c\mathbf{C}_c = \mathbf{0} \tag{5.20}
\]

where 0 denotes the zero column vector, the c column vectors are linearly
dependent. If the only set of scalars for which the equality holds is
$k_1 = 0, \ldots, k_c = 0$, the set of c column vectors is linearly independent.

To illustrate for our example, $k_1 = 5$, $k_2 = 0$, $k_3 = -1$, $k_4 = 0$ leads to:

\[
5\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
+ 0\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}
- 1\begin{bmatrix} 5 \\ 10 \\ 15 \end{bmatrix}
+ 0\begin{bmatrix} 1 \\ 6 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

Hence, the column vectors are linearly dependent. Note that some of the $k_j$ equal zero here.
For linear dependence, it is only required that not all $k_j$ be zero.


Rank of Matrix 


The rank of a matrix is defined to be the maximum number of linearly independent columns 
in the matrix. We know that the rank of A in our earlier example cannot be 4, since the four 
columns are linearly dependent. We can, however, find three columns (1, 2, and 4) which
are linearly independent. There are no scalars $k_1$, $k_2$, $k_4$ such that $k_1\mathbf{C}_1 + k_2\mathbf{C}_2 + k_4\mathbf{C}_4 = \mathbf{0}$
other than $k_1 = k_2 = k_4 = 0$. Thus, the rank of A in our example is 3.

The rank of a matrix is unique and can equivalently be defined as the maximum number 
of linearly independent rows. It follows that the rank of an r x c matrix cannot exceed 
min(r, c), the minimum of the two values r and c. 
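
The dependence and rank claims for the example matrix can be verified numerically. This is a minimal sketch assuming NumPy; `np.linalg.matrix_rank` computes a numerical rank, which agrees with the exact rank for a small integer matrix such as this one.

```python
import numpy as np

A = np.array([[1, 2, 5, 1],
              [2, 2, 10, 6],
              [3, 4, 15, 1]])

# Scalars k1 = 5, k2 = 0, k3 = -1, k4 = 0 from the illustration of (5.20).
k = np.array([5, 0, -1, 0])
combo = A @ k  # k1*C1 + k2*C2 + k3*C3 + k4*C4

rank_A = np.linalg.matrix_rank(A)                 # rank of the full matrix
rank_124 = np.linalg.matrix_rank(A[:, [0, 1, 3]]) # columns 1, 2, and 4 only

print(combo)             # the zero vector, so the four columns are dependent
print(rank_A, rank_124)  # both equal 3
```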


When a matrix is the product of two matrices, its rank cannot exceed the smaller of the
two ranks for the matrices being multiplied. Thus, if C = AB, the rank of C cannot exceed
min(rank A, rank B).


5.6 Inverse of a Matrix 


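In ordinary algebra, the inverse of a number is its reciprocal. Thus, the inverse of 6 is 1/6. A
number multiplied by its inverse always equals 1:

\[
6 \cdot \frac{1}{6} = \frac{1}{6} \cdot 6 = 1
\]

\[
x \cdot \frac{1}{x} = x \cdot x^{-1} = x^{-1} \cdot x = 1
\]

In matrix algebra, the inverse of a matrix A is another matrix, denoted by $\mathbf{A}^{-1}$, such that:

\[
\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I} \tag{5.21}
\]

where I is the identity matrix. Thus, again, the identity matrix I plays the same role as the
number 1 in ordinary algebra. An inverse of a matrix is defined only for square matrices.
Even so, many square matrices do not have inverses. If a square matrix does have an inverse,
the inverse is unique.


Examples

1. The inverse of the matrix:

\[
\underset{2 \times 2}{\mathbf{A}} = \begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
\]

is:

\[
\underset{2 \times 2}{\mathbf{A}^{-1}} = \begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix}
\]

since:

\[
\mathbf{A}^{-1}\mathbf{A} =
\begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix}
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}
\]

or:

\[
\mathbf{A}\mathbf{A}^{-1} =
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
\begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}
\]

2. The inverse of the matrix:

\[
\underset{3 \times 3}{\mathbf{A}} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\]

is:

\[
\underset{3 \times 3}{\mathbf{A}^{-1}} = \begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}
\]

since:

\[
\mathbf{A}^{-1}\mathbf{A} =
\begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}
\begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \mathbf{I}
\]

Note that the inverse of a diagonal matrix is a diagonal matrix consisting simply of the
reciprocals of the elements on the diagonal.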


Finding the Inverse 


Up to this point, the inverse of a matrix А has been given, and we have only checked to 
make sure it is the inverse by seeing whether or not A^! A = I. But how does one find the 
inverse, and when does it exist? 

An inverse of a square r x r matrix exists if the rank of the matrix is r. Such a matrix is 
said to be nonsingular or of full rank. An r x r matrix with rank less than r is said to be 
singular or not of full rank, and does not have an inverse. The inverse of an r x r matrix of 
full rank also has rank r. 

Finding the inverse of a matrix can often require a large amount of computing. We shall 
take the approach in this book that the inverse of a 2 x 2 matrix and a 3 x 3 matrix can 
be calculated by hand. For any larger matrix, one ordinarily uses a computer to find the 
inverse, unless the matrix is of a special form such as a diagonal matrix. It can be shown 
that the inverses for 2 x 2 and 3 x 3 matrices are as follows: 


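1. If:

\[
\underset{2 \times 2}{\mathbf{A}} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

then:

\[
\underset{2 \times 2}{\mathbf{A}^{-1}} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} =
\begin{bmatrix} \dfrac{d}{D} & \dfrac{-b}{D} \\[1.5ex] \dfrac{-c}{D} & \dfrac{a}{D} \end{bmatrix} \tag{5.22}
\]

where:

\[
D = ad - bc \tag{5.22a}
\]

D is called the determinant of the matrix A. If A were singular, its determinant would equal
zero and no inverse of A would exist.

2. If:

\[
\underset{3 \times 3}{\mathbf{B}} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}
\]

then:

\[
\underset{3 \times 3}{\mathbf{B}^{-1}} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}^{-1} =
\begin{bmatrix} A & B & C \\ D & E & F \\ G & H & K \end{bmatrix} \tag{5.23}
\]

where:

\[
\begin{aligned}
A &= (ek - fh)/Z & B &= -(bk - ch)/Z & C &= (bf - ce)/Z \\
D &= -(dk - fg)/Z & E &= (ak - cg)/Z & F &= -(af - cd)/Z \\
G &= (dh - eg)/Z & H &= -(ah - bg)/Z & K &= (ae - bd)/Z
\end{aligned} \tag{5.23a}
\]

and:

\[
Z = a(ek - fh) - b(dk - fg) + c(dh - eg) \tag{5.23b}
\]

Z is called the determinant of the matrix B.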


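Let us use (5.22) to find the inverse of:

\[
\mathbf{A} = \begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
\]

We have:

\[
a = 2 \qquad b = 4 \qquad c = 3 \qquad d = 1
\]

\[
D = ad - bc = 2(1) - 4(3) = -10
\]

Hence:

\[
\mathbf{A}^{-1} =
\begin{bmatrix} \dfrac{1}{-10} & \dfrac{-4}{-10} \\[1.5ex] \dfrac{-3}{-10} & \dfrac{2}{-10} \end{bmatrix} =
\begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix}
\]

as was given in an earlier example.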

When an inverse $\mathbf{A}^{-1}$ has been obtained by hand calculations or from a computer program
for which the accuracy of inverting a matrix is not known, it may be wise to compute
$\mathbf{A}^{-1}\mathbf{A}$ to check whether the product equals the identity matrix, allowing for minor rounding
departures from 0 and 1.
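
Formula (5.22) and the recommended check can be sketched in a few lines. The function name `inverse_2x2` is a hypothetical helper introduced for illustration, and NumPy is assumed; `np.allclose` plays the role of "allowing for minor rounding departures from 0 and 1."

```python
import numpy as np

def inverse_2x2(m):
    """Invert a 2 x 2 matrix with formula (5.22); raise if the determinant is zero."""
    (a, b), (c, d) = m
    det = a * d - b * c  # D = ad - bc, formula (5.22a)
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return np.array([[d, -b], [-c, a]], dtype=float) / det

A = np.array([[2.0, 4.0],
              [3.0, 1.0]])
A_inv = inverse_2x2(A)

print(A_inv)
# Check that the product equals the identity matrix up to rounding.
print(np.allclose(A_inv @ A, np.eye(2)))
```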


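Regression Example

The principal inverse matrix encountered in regression analysis is the inverse of the matrix
X'X in (5.14):

\[
\underset{2 \times 2}{\mathbf{X}'\mathbf{X}} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
\]

Using rule (5.22), we have:

\[
a = n \qquad b = \sum X_i \qquad c = \sum X_i \qquad d = \sum X_i^2
\]

so that:

\[
D = n\sum X_i^2 - \left(\sum X_i\right)\left(\sum X_i\right)
= n\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}\right]
= n\sum (X_i - \bar{X})^2
\]

Hence:

\[
\underset{2 \times 2}{(\mathbf{X}'\mathbf{X})^{-1}} =
\begin{bmatrix}
\dfrac{\sum X_i^2}{n\sum (X_i - \bar{X})^2} & \dfrac{-\sum X_i}{n\sum (X_i - \bar{X})^2} \\[2.5ex]
\dfrac{-\sum X_i}{n\sum (X_i - \bar{X})^2} & \dfrac{n}{n\sum (X_i - \bar{X})^2}
\end{bmatrix} \tag{5.24}
\]

Since $\sum X_i = n\bar{X}$ and $\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$, we can simplify (5.24):

\[
\underset{2 \times 2}{(\mathbf{X}'\mathbf{X})^{-1}} =
\begin{bmatrix}
\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum (X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum (X_i - \bar{X})^2} \\[2.5ex]
\dfrac{-\bar{X}}{\sum (X_i - \bar{X})^2} & \dfrac{1}{\sum (X_i - \bar{X})^2}
\end{bmatrix} \tag{5.24a}
\]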
Uses of Inverse Matrix 
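In ordinary algebra, we solve an equation of the type:

\[
5y = 20
\]

by multiplying both sides of the equation by the inverse of 5, namely:

\[
\frac{1}{5}(5y) = \frac{1}{5}(20)
\]

and we obtain the solution:

\[
y = \frac{1}{5}(20) = 4
\]

In matrix algebra, if we have an equation:

\[
\mathbf{AY} = \mathbf{C}
\]

we correspondingly premultiply both sides by $\mathbf{A}^{-1}$, assuming A has an inverse:

\[
\mathbf{A}^{-1}\mathbf{AY} = \mathbf{A}^{-1}\mathbf{C}
\]

Since $\mathbf{A}^{-1}\mathbf{AY} = \mathbf{IY} = \mathbf{Y}$, we obtain the solution:

\[
\mathbf{Y} = \mathbf{A}^{-1}\mathbf{C}
\]

To illustrate this use, suppose we have two simultaneous equations:

\[
\begin{aligned}
2y_1 + 4y_2 &= 20 \\
3y_1 + y_2 &= 10
\end{aligned}
\]

which can be written as follows in matrix notation:

\[
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} =
\begin{bmatrix} 20 \\ 10 \end{bmatrix}
\]

The solution of these equations then is:

\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} =
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 20 \\ 10 \end{bmatrix}
\]

Earlier we found the required inverse, so we obtain:

\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} =
\begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix}
\begin{bmatrix} 20 \\ 10 \end{bmatrix} =
\begin{bmatrix} 2 \\ 4 \end{bmatrix}
\]

Hence, $y_1 = 2$ and $y_2 = 4$ satisfy these two equations.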
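
The same two equations can be solved by computer. The sketch below assumes NumPy and shows both the literal route through the inverse and `np.linalg.solve`, which solves the system without forming the inverse explicitly; the latter is a software design choice, not something the text prescribes.

```python
import numpy as np

# Coefficient matrix and right-hand side of the two simultaneous equations.
A = np.array([[2.0, 4.0],
              [3.0, 1.0]])
C = np.array([20.0, 10.0])

y_via_inverse = np.linalg.inv(A) @ C  # Y = A^{-1} C, as in the text
y_via_solve = np.linalg.solve(A, C)   # same solution without an explicit inverse

print(y_via_inverse)
print(y_via_solve)
```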


5.7 Some Basic Results for Matrices 


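We list here, without proof, some basic results for matrices which we will utilize in later
work.

\[
\begin{aligned}
\mathbf{A} + \mathbf{B} &= \mathbf{B} + \mathbf{A} && (5.25) \\
(\mathbf{A} + \mathbf{B}) + \mathbf{C} &= \mathbf{A} + (\mathbf{B} + \mathbf{C}) && (5.26) \\
(\mathbf{AB})\mathbf{C} &= \mathbf{A}(\mathbf{BC}) && (5.27) \\
\mathbf{C}(\mathbf{A} + \mathbf{B}) &= \mathbf{CA} + \mathbf{CB} && (5.28) \\
k(\mathbf{A} + \mathbf{B}) &= k\mathbf{A} + k\mathbf{B} && (5.29) \\
(\mathbf{A}')' &= \mathbf{A} && (5.30) \\
(\mathbf{A} + \mathbf{B})' &= \mathbf{A}' + \mathbf{B}' && (5.31) \\
(\mathbf{AB})' &= \mathbf{B}'\mathbf{A}' && (5.32) \\
(\mathbf{ABC})' &= \mathbf{C}'\mathbf{B}'\mathbf{A}' && (5.33) \\
(\mathbf{AB})^{-1} &= \mathbf{B}^{-1}\mathbf{A}^{-1} && (5.34) \\
(\mathbf{ABC})^{-1} &= \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1} && (5.35) \\
(\mathbf{A}^{-1})^{-1} &= \mathbf{A} && (5.36) \\
(\mathbf{A}')^{-1} &= (\mathbf{A}^{-1})' && (5.37)
\end{aligned}
\]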
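
Although these results are stated without proof, several of them are easy to spot-check numerically. The sketch below assumes NumPy and uses two hypothetical nonsingular 3 x 3 matrices (not from the text); agreement on one example is a sanity check, not a proof.

```python
import numpy as np

# Hypothetical nonsingular matrices chosen for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
inv = np.linalg.inv

check_532 = np.allclose((A @ B).T, B.T @ A.T)         # (AB)' = B'A'
check_534 = np.allclose(inv(A @ B), inv(B) @ inv(A))  # (AB)^{-1} = B^{-1}A^{-1}
check_536 = np.allclose(inv(inv(A)), A)               # (A^{-1})^{-1} = A
check_537 = np.allclose(inv(A.T), inv(A).T)           # (A')^{-1} = (A^{-1})'

print(check_532, check_534, check_536, check_537)
```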


5.8 Random Vectors and Matrices 


A random vector or a random matrix contains elements that are random variables. Thus, 
the observations vector Y in (5.4) is a random vector since the Y; elements are random 
variables.


Expectation of Random Vector or Matrix 


Suppose we have n = 3 observations in the observations vector Y: 


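\[
\underset{3 \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
\]

The expected value of Y is a vector, denoted by E{Y}, that is defined as follows:

\[
\underset{3 \times 1}{E\{\mathbf{Y}\}} = \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ E\{Y_3\} \end{bmatrix}
\]

Thus, the expected value of a random vector is a vector whose elements are the expected
values of the random variables that are the elements of the random vector. Similarly, the
expectation of a random matrix is a matrix whose elements are the expected values of the
corresponding random variables in the original matrix. We encountered a vector of expected
values earlier in (5.9).
In general, for a random vector Y the expectation is:

\[
\underset{n \times 1}{E\{\mathbf{Y}\}} = [E\{Y_i\}] \qquad i = 1, \ldots, n \tag{5.38}
\]

and for a random matrix Y with dimension n x p, the expectation is:

\[
\underset{n \times p}{E\{\mathbf{Y}\}} = [E\{Y_{ij}\}] \qquad i = 1, \ldots, n;\ j = 1, \ldots, p \tag{5.39}
\]


Regression Example

Suppose the number of cases in a regression application is n = 3. The three error terms $\varepsilon_1$,
$\varepsilon_2$, $\varepsilon_3$ each have expectation zero. For the error terms vector:

\[
\underset{3 \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{bmatrix}
\]

we have:

\[
\underset{3 \times 1}{E\{\boldsymbol{\varepsilon}\}} = \underset{3 \times 1}{\mathbf{0}}
\]

since:

\[
\begin{bmatrix} E\{\varepsilon_1\} \\ E\{\varepsilon_2\} \\ E\{\varepsilon_3\} \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]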


Variance-Covariance Matrix of Random Vector 
Consider again the random vector Y consisting of three observations $Y_1$, $Y_2$, $Y_3$. The variances
of the three random variables, $\sigma^2\{Y_i\}$, and the covariances between any two of the random
variables, $\sigma\{Y_i, Y_j\}$, are assembled in the variance-covariance matrix of Y, denoted by
$\sigma^2\{\mathbf{Y}\}$, in the following form:

\[
\sigma^2\{\mathbf{Y}\} =
\begin{bmatrix}
\sigma^2\{Y_1\} & \sigma\{Y_1, Y_2\} & \sigma\{Y_1, Y_3\} \\
\sigma\{Y_2, Y_1\} & \sigma^2\{Y_2\} & \sigma\{Y_2, Y_3\} \\
\sigma\{Y_3, Y_1\} & \sigma\{Y_3, Y_2\} & \sigma^2\{Y_3\}
\end{bmatrix} \tag{5.40}
\]

Note that the variances are on the main diagonal, and the covariance $\sigma\{Y_i, Y_j\}$ is found
in the ith row and jth column of the matrix. Thus, $\sigma\{Y_2, Y_1\}$ is found in the second row,
first column, and $\sigma\{Y_1, Y_2\}$ is found in the first row, second column. Remember, of course,
that $\sigma\{Y_2, Y_1\} = \sigma\{Y_1, Y_2\}$. Since $\sigma\{Y_i, Y_j\} = \sigma\{Y_j, Y_i\}$ for all $i \neq j$, $\sigma^2\{\mathbf{Y}\}$ is a symmetric
matrix.

It follows readily that:

\[
\sigma^2\{\mathbf{Y}\} = E\{[\mathbf{Y} - E\{\mathbf{Y}\}][\mathbf{Y} - E\{\mathbf{Y}\}]'\} \tag{5.41}
\]


For our illustration, we have:

\[
\sigma^2\{\mathbf{Y}\} = E\left\{
\begin{bmatrix} Y_1 - E\{Y_1\} \\ Y_2 - E\{Y_2\} \\ Y_3 - E\{Y_3\} \end{bmatrix}
\begin{bmatrix} Y_1 - E\{Y_1\} & Y_2 - E\{Y_2\} & Y_3 - E\{Y_3\} \end{bmatrix}
\right\}
\]

Multiplying the two matrices and then taking expectations, we obtain:

Location in Product    Term                                      Expected Value
Row 1, column 1        $(Y_1 - E\{Y_1\})^2$                      $\sigma^2\{Y_1\}$
Row 1, column 2        $(Y_1 - E\{Y_1\})(Y_2 - E\{Y_2\})$        $\sigma\{Y_1, Y_2\}$
Row 1, column 3        $(Y_1 - E\{Y_1\})(Y_3 - E\{Y_3\})$        $\sigma\{Y_1, Y_3\}$
Row 2, column 1        $(Y_2 - E\{Y_2\})(Y_1 - E\{Y_1\})$        $\sigma\{Y_2, Y_1\}$
etc.                   etc.                                      etc.

This, of course, leads to the variance-covariance matrix in (5.40). Remember the definitions
of variance and covariance in (A.15) and (A.21), respectively, when taking expectations.
To generalize, the variance-covariance matrix for an n x 1 random vector Y is:

\[
\underset{n \times n}{\sigma^2\{\mathbf{Y}\}} =
\begin{bmatrix}
\sigma^2\{Y_1\} & \sigma\{Y_1, Y_2\} & \cdots & \sigma\{Y_1, Y_n\} \\
\sigma\{Y_2, Y_1\} & \sigma^2\{Y_2\} & \cdots & \sigma\{Y_2, Y_n\} \\
\vdots & \vdots & & \vdots \\
\sigma\{Y_n, Y_1\} & \sigma\{Y_n, Y_2\} & \cdots & \sigma^2\{Y_n\}
\end{bmatrix} \tag{5.42}
\]

Note again that $\sigma^2\{\mathbf{Y}\}$ is a symmetric matrix.


Regression Example


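Let us return to the example based on n = 3 cases. Suppose that the three error terms have
constant variance, $\sigma^2\{\varepsilon_i\} = \sigma^2$, and are uncorrelated so that $\sigma\{\varepsilon_i, \varepsilon_j\} = 0$ for $i \neq j$. The
variance-covariance matrix for the random vector $\boldsymbol{\varepsilon}$ of the previous example is therefore as
follows:

\[
\underset{3 \times 3}{\sigma^2\{\boldsymbol{\varepsilon}\}} =
\begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix}
\]

Note that all variances are $\sigma^2$ and all covariances are zero. Note also that this
variance-covariance matrix is a scalar matrix, with the common variance $\sigma^2$ the scalar. Hence, we
can express the variance-covariance matrix in the following simple fashion:

\[
\underset{3 \times 3}{\sigma^2\{\boldsymbol{\varepsilon}\}} = \sigma^2 \underset{3 \times 3}{\mathbf{I}}
\]

since:

\[
\sigma^2\mathbf{I} = \sigma^2\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} =
\begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix}
\]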


Some Basic Results 


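Frequently, we shall encounter a random vector W that is obtained by premultiplying the
random vector Y by a constant matrix A (a matrix whose elements are fixed):

\[
\mathbf{W} = \mathbf{AY} \tag{5.43}
\]

Some basic results for this case are:

\[
E\{\mathbf{A}\} = \mathbf{A} \tag{5.44}
\]

\[
E\{\mathbf{W}\} = E\{\mathbf{AY}\} = \mathbf{A}E\{\mathbf{Y}\} \tag{5.45}
\]

\[
\sigma^2\{\mathbf{W}\} = \sigma^2\{\mathbf{AY}\} = \mathbf{A}\sigma^2\{\mathbf{Y}\}\mathbf{A}' \tag{5.46}
\]

where $\sigma^2\{\mathbf{Y}\}$ is the variance-covariance matrix of Y.


Example

As a simple illustration of the use of these results, consider:

\[
\underset{2 \times 1}{\mathbf{W}} = \begin{bmatrix} W_1 \\ W_2 \end{bmatrix} =
\underset{2 \times 2}{\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}}
\underset{2 \times 1}{\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}} =
\begin{bmatrix} Y_1 - Y_2 \\ Y_1 + Y_2 \end{bmatrix}
\]

We then have by (5.45):

\[
\underset{2 \times 1}{E\{\mathbf{W}\}} =
\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \end{bmatrix} =
\begin{bmatrix} E\{Y_1\} - E\{Y_2\} \\ E\{Y_1\} + E\{Y_2\} \end{bmatrix}
\]

and by (5.46):

\[
\begin{aligned}
\underset{2 \times 2}{\sigma^2\{\mathbf{W}\}} &=
\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} \sigma^2\{Y_1\} & \sigma\{Y_1, Y_2\} \\ \sigma\{Y_2, Y_1\} & \sigma^2\{Y_2\} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \\
&= \begin{bmatrix}
\sigma^2\{Y_1\} + \sigma^2\{Y_2\} - 2\sigma\{Y_1, Y_2\} & \sigma^2\{Y_1\} - \sigma^2\{Y_2\} \\
\sigma^2\{Y_1\} - \sigma^2\{Y_2\} & \sigma^2\{Y_1\} + \sigma^2\{Y_2\} + 2\sigma\{Y_1, Y_2\}
\end{bmatrix}
\end{aligned}
\]

Thus:

\[
\begin{aligned}
\sigma^2\{W_1\} &= \sigma^2\{Y_1 - Y_2\} = \sigma^2\{Y_1\} + \sigma^2\{Y_2\} - 2\sigma\{Y_1, Y_2\} \\
\sigma^2\{W_2\} &= \sigma^2\{Y_1 + Y_2\} = \sigma^2\{Y_1\} + \sigma^2\{Y_2\} + 2\sigma\{Y_1, Y_2\} \\
\sigma\{W_1, W_2\} &= \sigma\{Y_1 - Y_2, Y_1 + Y_2\} = \sigma^2\{Y_1\} - \sigma^2\{Y_2\}
\end{aligned}
\]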
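
Result (5.46) can be illustrated with numbers. In the sketch below (assuming NumPy), the variance-covariance matrix of Y is hypothetical: variances 4 and 9 and covariance 1.5 are chosen only for illustration. The entries of the product agree with the three formulas just given.

```python
import numpy as np

# Constant matrix A from the illustration: W1 = Y1 - Y2, W2 = Y1 + Y2.
A = np.array([[1.0, -1.0],
              [1.0, 1.0]])

# Hypothetical variance-covariance matrix of Y (illustrative values only).
var_Y = np.array([[4.0, 1.5],
                  [1.5, 9.0]])

var_W = A @ var_Y @ A.T  # formula (5.46)

print(var_W)
# var_W[0, 0] = 4 + 9 - 2(1.5), var_W[1, 1] = 4 + 9 + 2(1.5), var_W[0, 1] = 4 - 9
```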
Multivariate Normal Distribution 


Density Function. The density function for the multivariate normal distribution is best
given in matrix form. We first need to define some vectors and matrices. The observations
vector Y containing an observation on each of the p Y variables is defined as usual:

\[
\underset{p \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_p \end{bmatrix} \tag{5.47}
\]

The mean vector E{Y}, denoted by $\boldsymbol{\mu}$, contains the expected values for each of the p Y
variables:

\[
\underset{p \times 1}{\boldsymbol{\mu}} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix} \tag{5.48}
\]

Finally, the variance-covariance matrix $\sigma^2\{\mathbf{Y}\}$ is denoted by $\boldsymbol{\Sigma}$ and contains as always the
variances and covariances of the p Y variables:

\[
\underset{p \times p}{\boldsymbol{\Sigma}} =
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\
\vdots & \vdots & & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2
\end{bmatrix} \tag{5.49}
\]

Here, $\sigma_1^2$ denotes the variance of $Y_1$, $\sigma_{12}$ denotes the covariance of $Y_1$ and $Y_2$, and the like.
The density function of the multivariate normal distribution can now be stated as follows:

\[
f(\mathbf{Y}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}
\exp\left[-\frac{1}{2}(\mathbf{Y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \boldsymbol{\mu})\right] \tag{5.50}
\]

Here, $|\boldsymbol{\Sigma}|$ is the determinant of the variance-covariance matrix $\boldsymbol{\Sigma}$. When there are p = 2
variables, the multivariate normal density function (5.50) simplifies to the bivariate normal
density function (2.74).

The multivariate normal density function has properties that correspond to the ones
described for the bivariate normal distribution. For instance, if $Y_1, \ldots, Y_p$ are jointly normally
distributed (i.e., they follow the multivariate normal distribution), the marginal probability
distribution of each variable $Y_k$ is normal, with mean $\mu_k$ and standard deviation $\sigma_k$.
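
A direct evaluation of the density (5.50) is sketched below, assuming NumPy. The function names `mvn_density` and `normal_density` and the numerical values are hypothetical, introduced only for illustration. As a check, when the variance-covariance matrix is diagonal the joint density should equal the product of the univariate normal densities.

```python
import numpy as np

def mvn_density(y, mu, sigma):
    """Evaluate the multivariate normal density (5.50) at the point y."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = y.shape[0]
    dev = y - mu
    # Quadratic form (Y - mu)' Sigma^{-1} (Y - mu), via a linear solve.
    quad = dev @ np.linalg.solve(sigma, dev)
    norm_const = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm_const

def normal_density(y, mu, sd):
    """Univariate normal density, used only for the comparison below."""
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Hypothetical values: two uncorrelated variables with variances 4 and 9.
mu = [1.0, 2.0]
sigma = [[4.0, 0.0],
         [0.0, 9.0]]
point = [0.0, 3.0]

joint = mvn_density(point, mu, sigma)
product = normal_density(0.0, 1.0, 2.0) * normal_density(3.0, 2.0, 3.0)
print(joint, product)
```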


5.9 Simple Linear Regression Model in Matrix Terms


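We are now ready to develop simple linear regression in matrix terms. Remember again that
we will not present any new results, but shall only state in matrix terms the results obtained
earlier. We begin with the normal error regression model (2.1):

\[
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad i = 1, \ldots, n \tag{5.51}
\]

This implies:

\[
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_1 + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_2 + \varepsilon_2 \\
&\ \ \vdots \\
Y_n &= \beta_0 + \beta_1 X_n + \varepsilon_n
\end{aligned} \tag{5.51a}
\]

We defined earlier the observations vector Y in (5.4), the X matrix in (5.6), and the $\boldsymbol{\varepsilon}$ vector
in (5.10). Let us repeat these definitions and also define the $\boldsymbol{\beta}$ vector of the regression
coefficients:

\[
\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
\qquad
\underset{n \times 2}{\mathbf{X}} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\qquad
\underset{2 \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
\qquad
\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \tag{5.52}
\]

Now we can write (5.51a) in matrix terms compactly as follows:

\[
\underset{n \times 1}{\mathbf{Y}} = \underset{n \times 2}{\mathbf{X}}\ \underset{2 \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}} \tag{5.53}
\]

since:

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
=
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
=
\begin{bmatrix} \beta_0 + \beta_1 X_1 + \varepsilon_1 \\ \beta_0 + \beta_1 X_2 + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 X_n + \varepsilon_n \end{bmatrix}
\]

Note that $\mathbf{X}\boldsymbol{\beta}$ is the vector of the expected values of the $Y_i$ observations since $E\{Y_i\} =
\beta_0 + \beta_1 X_i$; hence:

\[
\underset{n \times 1}{E\{\mathbf{Y}\}} = \underset{n \times 1}{\mathbf{X}\boldsymbol{\beta}} \tag{5.54}
\]

where E{Y} is defined in (5.9).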
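The column of 1s in the X matrix may be viewed as consisting of the constant $X_0 \equiv 1$
in the alternative regression model (1.5):

\[
Y_i = \beta_0 X_0 + \beta_1 X_i + \varepsilon_i \qquad \text{where } X_0 \equiv 1
\]

Thus, the X matrix may be considered to contain a column vector consisting of 1s and
another column vector consisting of the predictor variable observations $X_i$.

With respect to the error terms, regression model (2.1) assumes that $E\{\varepsilon_i\} = 0$, $\sigma^2\{\varepsilon_i\} =
\sigma^2$, and that the $\varepsilon_i$ are independent normal random variables. The condition $E\{\varepsilon_i\} = 0$ in
matrix terms is:

\[
\underset{n \times 1}{E\{\boldsymbol{\varepsilon}\}} = \underset{n \times 1}{\mathbf{0}} \tag{5.55}
\]

since (5.55) states:

\[
\begin{bmatrix} E\{\varepsilon_1\} \\ E\{\varepsilon_2\} \\ \vdots \\ E\{\varepsilon_n\} \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\]

The condition that the error terms have constant variance $\sigma^2$ and that all covariances
$\sigma\{\varepsilon_i, \varepsilon_j\}$ for $i \neq j$ are zero (since the $\varepsilon_i$ are independent) is expressed in matrix terms
through the variance-covariance matrix of the error terms:

\[
\underset{n \times n}{\sigma^2\{\boldsymbol{\varepsilon}\}} =
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix} \tag{5.56}
\]

Since this is a scalar matrix, we know from the earlier example that it can be expressed in
the following simple fashion:

\[
\underset{n \times n}{\sigma^2\{\boldsymbol{\varepsilon}\}} = \sigma^2 \underset{n \times n}{\mathbf{I}} \tag{5.56a}
\]

Thus, the normal error regression model (2.1) in matrix terms is:

\[
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{5.57}
\]

where:

$\boldsymbol{\varepsilon}$ is a vector of independent normal random variables with $E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$ and
$\sigma^2\{\boldsymbol{\varepsilon}\} = \sigma^2\mathbf{I}$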


5.10 Least Squares Estimation of Regression Parameters 


Normal Equations 


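The normal equations (1.9):

\[
\begin{aligned}
n b_0 + b_1 \sum X_i &= \sum Y_i \\
b_0 \sum X_i + b_1 \sum X_i^2 &= \sum X_i Y_i
\end{aligned} \tag{5.58}
\]

in matrix terms are:

\[
\underset{2 \times 2}{\mathbf{X}'\mathbf{X}}\ \underset{2 \times 1}{\mathbf{b}} = \underset{2 \times 1}{\mathbf{X}'\mathbf{Y}} \tag{5.59}
\]

where b is the vector of the least squares regression coefficients:

\[
\underset{2 \times 1}{\mathbf{b}} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} \tag{5.59a}
\]

To see this, recall that we obtained X'X in (5.14) and X'Y in (5.15). Equation (5.59) thus
states:

\[
\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \end{bmatrix} =
\begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}
\]

or:

\[
\begin{bmatrix} n b_0 + b_1 \sum X_i \\ b_0 \sum X_i + b_1 \sum X_i^2 \end{bmatrix} =
\begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}
\]

These are precisely the normal equations in (5.58).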


Estimated Regression Coefficients 
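To obtain the estimated regression coefficients from the normal equations (5.59) by matrix
methods, we premultiply both sides by the inverse of X'X (we assume this exists):

\[
(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
\]

We then find, since $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X} = \mathbf{I}$ and $\mathbf{Ib} = \mathbf{b}$:

\[
\underset{2 \times 1}{\mathbf{b}} = \underset{2 \times 2}{(\mathbf{X}'\mathbf{X})^{-1}}\ \underset{2 \times 1}{\mathbf{X}'\mathbf{Y}} \tag{5.60}
\]

The estimators $b_0$ and $b_1$ in b are the same as those given earlier in (1.10a) and (1.10b). We
shall demonstrate this by an example.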


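Example

We shall use matrix methods to obtain the estimated regression coefficients for the Toluca
Company example. The data on the Y and X variables were given in Table 1.1. Using these
data, we define the Y observations vector and the X matrix as follows:

\[
\text{(5.61a)}\quad \mathbf{Y} = \begin{bmatrix} 399 \\ 121 \\ \vdots \\ 323 \end{bmatrix}
\qquad
\text{(5.61b)}\quad \mathbf{X} = \begin{bmatrix} 1 & 80 \\ 1 & 30 \\ \vdots & \vdots \\ 1 & 70 \end{bmatrix} \tag{5.61}
\]

We now require the following matrix products:

\[
\mathbf{X}'\mathbf{X} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 80 & 30 & \cdots & 70 \end{bmatrix}
\begin{bmatrix} 1 & 80 \\ 1 & 30 \\ \vdots & \vdots \\ 1 & 70 \end{bmatrix} =
\begin{bmatrix} 25 & 1{,}750 \\ 1{,}750 & 142{,}300 \end{bmatrix} \tag{5.62}
\]

\[
\mathbf{X}'\mathbf{Y} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 80 & 30 & \cdots & 70 \end{bmatrix}
\begin{bmatrix} 399 \\ 121 \\ \vdots \\ 323 \end{bmatrix} =
\begin{bmatrix} 7{,}807 \\ 617{,}180 \end{bmatrix} \tag{5.63}
\]

Using (5.22), we find the inverse of X'X:

\[
(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} .287475 & -.003535 \\ -.003535 & .00005051 \end{bmatrix} \tag{5.64}
\]

In subsequent matrix calculations utilizing this inverse matrix and other matrix results, we
shall actually utilize more digits for the matrix elements than are shown.
Finally, we employ (5.60) to obtain:

\[
\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} =
\begin{bmatrix} .287475 & -.003535 \\ -.003535 & .00005051 \end{bmatrix}
\begin{bmatrix} 7{,}807 \\ 617{,}180 \end{bmatrix} =
\begin{bmatrix} 62.37 \\ 3.5702 \end{bmatrix} \tag{5.65}
\]

or $b_0 = 62.37$ and $b_1 = 3.5702$. These results agree with the ones in Chapter 1. Any
differences would have been due to rounding effects.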
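
The calculation in (5.60) can be reproduced from the summary sums alone. The sketch below assumes NumPy and takes as given the Toluca Company sums n = 25, ΣX = 1,750, ΣX² = 142,300, ΣY = 7,807, and ΣXY = 617,180; it does not use the raw data of Table 1.1. The call to `np.linalg.solve` is an alternative route that solves the normal equations (5.59) without forming the inverse explicitly.

```python
import numpy as np

# X'X and X'Y assembled from the Toluca Company summary sums.
XtX = np.array([[25.0, 1750.0],
                [1750.0, 142300.0]])
XtY = np.array([7807.0, 617180.0])

XtX_inv = np.linalg.inv(XtX)          # inverse of X'X, compare with (5.64)
b = XtX_inv @ XtY                     # b = (X'X)^{-1} X'Y, formula (5.60)
b_solve = np.linalg.solve(XtX, XtY)   # same b from the normal equations (5.59)

print(XtX_inv)
print(b)        # approximately [62.37, 3.5702]
print(b_solve)
```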


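Comments
1. To derive the normal equations by the method of least squares, we minimize the quantity:

\[
Q = \sum [Y_i - (\beta_0 + \beta_1 X_i)]^2
\]

In matrix notation:

\[
Q = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \tag{5.66}
\]

Expanding, we obtain:

\[
Q = \mathbf{Y}'\mathbf{Y} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}
\]

since $(\mathbf{X}\boldsymbol{\beta})' = \boldsymbol{\beta}'\mathbf{X}'$ by (5.32). Note now that $\mathbf{Y}'\mathbf{X}\boldsymbol{\beta}$ is 1 x 1, hence is equal to its transpose, which
according to (5.33) is $\boldsymbol{\beta}'\mathbf{X}'\mathbf{Y}$. Thus, we find:

\[
Q = \mathbf{Y}'\mathbf{Y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \tag{5.67}
\]

To find the value of $\boldsymbol{\beta}$ that minimizes Q, we differentiate with respect to $\beta_0$ and $\beta_1$. Let:

\[
\frac{\partial}{\partial \boldsymbol{\beta}}(Q) =
\begin{bmatrix} \dfrac{\partial Q}{\partial \beta_0} \\[1.5ex] \dfrac{\partial Q}{\partial \beta_1} \end{bmatrix} \tag{5.68}
\]

Then it follows that:

\[
\frac{\partial}{\partial \boldsymbol{\beta}}(Q) = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \tag{5.69}
\]

Equating to the zero vector, dividing by 2, and substituting b for $\boldsymbol{\beta}$ gives the matrix form of the least
squares normal equations in (5.59).

2. A comparison of the normal equations and X'X shows that whenever the columns of X'X are
linearly dependent, the normal equations will be linearly dependent also. No unique solutions can
then be obtained for $b_0$ and $b_1$. Fortunately, in most regression applications, the columns of X'X are
linearly independent, leading to unique solutions for $b_0$ and $b_1$.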



5.11 Fitted Values and Residuals 


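Fitted Values

Let the vector of the fitted values $\hat{Y}_i$ be denoted by $\hat{\mathbf{Y}}$:

\[
\underset{n \times 1}{\hat{\mathbf{Y}}} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} \tag{5.70}
\]

In matrix notation, we then have:

\[
\underset{n \times 1}{\hat{\mathbf{Y}}} = \underset{n \times 2}{\mathbf{X}}\ \underset{2 \times 1}{\mathbf{b}} \tag{5.71}
\]

because:

\[
\begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} =
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \end{bmatrix} =
\begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix}
\]


Example

For the Toluca Company example, we obtain the vector of fitted values using the matrices
in (5.61b) and (5.65):

\[
\hat{\mathbf{Y}} = \mathbf{Xb} =
\begin{bmatrix} 1 & 80 \\ 1 & 30 \\ \vdots & \vdots \\ 1 & 70 \end{bmatrix}
\begin{bmatrix} 62.37 \\ 3.5702 \end{bmatrix} =
\begin{bmatrix} 347.98 \\ 169.47 \\ \vdots \\ 312.28 \end{bmatrix} \tag{5.72}
\]

The fitted values are the same, of course, as in Table 1.2.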


Hat Matrix. We can express the matrix result for $\hat{\mathbf{Y}}$ in (5.71) as follows by using the
expression for b in (5.60):

\[
\hat{\mathbf{Y}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
\]

or, equivalently:

\[
\underset{n \times 1}{\hat{\mathbf{Y}}} = \underset{n \times n}{\mathbf{H}}\ \underset{n \times 1}{\mathbf{Y}} \tag{5.73}
\]

where:

\[
\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \tag{5.73a}
\]

We see from (5.73) that the fitted values $\hat{Y}_i$ can be expressed as linear combinations of
the response variable observations $Y_i$, with the coefficients being elements of the matrix
H. The H matrix involves only the observations on the predictor variable X, as is evident
from (5.73a).

The square n x n matrix H is called the hat matrix. It plays an important role in diagnostics
for regression analysis, as we shall see in Chapter 10 when we consider whether regression
results are unduly influenced by one or a few observations. The matrix H is symmetric and
has the special property (called idempotency):

\[
\mathbf{HH} = \mathbf{H} \tag{5.74}
\]

In general, a matrix M is said to be idempotent if MM = M.


Residuals


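Let the vector of the residuals $e_i = Y_i - \hat{Y}_i$ be denoted by $\mathbf{e}$:

$$\underset{n \times 1}{\mathbf{e}} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \tag{5.75}$$

In matrix notation, we then have:

$$\underset{n \times 1}{\mathbf{e}} = \underset{n \times 1}{\mathbf{Y}} - \underset{n \times 1}{\hat{\mathbf{Y}}} = \underset{n \times 1}{\mathbf{Y}} - \underset{n \times 1}{\mathbf{X}\mathbf{b}} \tag{5.76}$$


Example

For the Toluca Company example, we obtain the vector of the residuals by using the results
in (5.61a) and (5.72):

$$\mathbf{e} = \begin{bmatrix} 399 \\ 121 \\ \vdots \\ 323 \end{bmatrix} - \begin{bmatrix} 347.98 \\ 169.47 \\ \vdots \\ 312.28 \end{bmatrix} = \begin{bmatrix} 51.02 \\ -48.47 \\ \vdots \\ 10.72 \end{bmatrix} \tag{5.77}$$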


The residuals are the same as in Table 1.2. 
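
The matrix results (5.71), (5.73), and (5.76) can be checked numerically. The sketch below is only an illustration: it assumes NumPy is available and, because the full Toluca data set is not reproduced in this section, it uses the small flavor deterioration data set from Problem 5.4 instead. The variable names are our own.

```python
import numpy as np

# Flavor deterioration data from Problem 5.4 (used here only as a small example).
X_raw = np.array([8.0, 4.0, 0.0, -4.0, -8.0])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])

# Design matrix with a leading column of 1s.
X = np.column_stack([np.ones_like(X_raw), X_raw])

# Least squares estimates b = (X'X)^{-1} X'Y, as in (5.60).
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X (X'X)^{-1} X', as in (5.73a).
H = X @ np.linalg.inv(X.T @ X) @ X.T

Y_hat = H @ Y      # fitted values, (5.73)
e = Y - Y_hat      # residuals, (5.76)

print("b =", b)
print("fitted values =", Y_hat)
print("residuals =", e)
print("Xb equals HY:", np.allclose(X @ b, Y_hat))
print("H symmetric: ", np.allclose(H, H.T))
print("H idempotent:", np.allclose(H @ H, H))
```

Forming $\mathbf{H}$ explicitly is convenient for small $n$; for large data sets one would normally avoid building the full $n \times n$ matrix.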


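Variance-Covariance Matrix of Residuals. The residuals $e_i$, like the fitted values $\hat{Y}_i$,
can be expressed as linear combinations of the response variable observations $Y_i$, using the
result in (5.73) for $\hat{\mathbf{Y}}$:

$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{H}\mathbf{Y} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$$

We thus have the important result:

$$\underset{n \times 1}{\mathbf{e}} = (\underset{n \times n}{\mathbf{I}} - \underset{n \times n}{\mathbf{H}}) \; \underset{n \times 1}{\mathbf{Y}} \tag{5.78}$$

where $\mathbf{H}$ is the hat matrix defined in (5.73a). The matrix $\mathbf{I} - \mathbf{H}$, like the matrix $\mathbf{H}$, is
symmetric and idempotent.

The variance-covariance matrix of the vector of residuals $\mathbf{e}$ involves the matrix $\mathbf{I} - \mathbf{H}$:

$$\underset{n \times n}{\boldsymbol{\sigma}^2\{\mathbf{e}\}} = \sigma^2(\mathbf{I} - \mathbf{H}) \tag{5.79}$$

and is estimated by:

$$\underset{n \times n}{\mathbf{s}^2\{\mathbf{e}\}} = MSE(\mathbf{I} - \mathbf{H}) \tag{5.80}$$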


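Comment
The variance-covariance matrix of $\mathbf{e}$ in (5.79) can be derived by means of (5.46). Since
$\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$, we obtain:

$$\boldsymbol{\sigma}^2\{\mathbf{e}\} = (\mathbf{I} - \mathbf{H})\,\boldsymbol{\sigma}^2\{\mathbf{Y}\}\,(\mathbf{I} - \mathbf{H})'$$

Now $\boldsymbol{\sigma}^2\{\mathbf{Y}\} = \boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\} = \sigma^2\mathbf{I}$ for the normal error model according to (5.56a). Also,
$(\mathbf{I} - \mathbf{H})' = \mathbf{I} - \mathbf{H}$ because of the symmetry of the matrix. Hence:

$$\boldsymbol{\sigma}^2\{\mathbf{e}\} = \sigma^2(\mathbf{I} - \mathbf{H})\,\mathbf{I}\,(\mathbf{I} - \mathbf{H}) = \sigma^2(\mathbf{I} - \mathbf{H})(\mathbf{I} - \mathbf{H})$$

In view of the fact that the matrix $\mathbf{I} - \mathbf{H}$ is idempotent, we know that
$(\mathbf{I} - \mathbf{H})(\mathbf{I} - \mathbf{H}) = \mathbf{I} - \mathbf{H}$, and we obtain formula (5.79).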
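
As a numerical illustration of (5.78) through (5.80), the following sketch checks that $\mathbf{I} - \mathbf{H}$ is symmetric and idempotent and then forms the estimated variance-covariance matrix of the residuals. It assumes NumPy, reuses the Problem 5.4 data purely for convenience, and takes $MSE = SSE/(n-2)$ from the earlier chapters on simple linear regression.

```python
import numpy as np

# Flavor deterioration data from Problem 5.4 (illustrative choice of data).
X_raw = np.array([8.0, 4.0, 0.0, -4.0, -8.0])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])
n = len(Y)
X = np.column_stack([np.ones(n), X_raw])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix, (5.73a)
M = np.eye(n) - H                      # the matrix I - H

e = M @ Y                              # residuals via (5.78)
MSE = (e @ e) / (n - 2)                # assumes MSE = SSE / (n - 2)
s2_e = MSE * M                         # estimated var-cov matrix of e, (5.80)

print("I - H symmetric: ", np.allclose(M, M.T))
print("I - H idempotent:", np.allclose(M @ M, M))
print("estimated residual variances:", np.diag(s2_e))
```

The diagonal of `s2_e` shows that the residuals do not all have the same estimated variance, since the diagonal elements of $\mathbf{H}$ differ from case to case.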


5.12 Analysis of Variance Results 


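Sums of Squares
To see how the sums of squares are expressed in matrix notation, we begin with the total sum
of squares SSTO, defined in (2.43). It will be convenient to use an algebraically equivalent
expression:

$$SSTO = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n} \tag{5.81}$$

We know from (5.13) that:

$$\mathbf{Y}'\mathbf{Y} = \sum Y_i^2$$

The subtraction term $(\sum Y_i)^2/n$ in matrix form uses $\mathbf{J}$, the matrix of 1s defined in (5.18),
as follows:

$$\frac{\left(\sum Y_i\right)^2}{n} = \left(\frac{1}{n}\right)\mathbf{Y}'\mathbf{J}\mathbf{Y} \tag{5.82}$$

For instance, if $n = 2$, we have:

$$\left(\frac{1}{2}\right)\begin{bmatrix} Y_1 & Y_2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \frac{(Y_1 + Y_2)(Y_1 + Y_2)}{2}$$

Hence, it follows that:

$$SSTO = \mathbf{Y}'\mathbf{Y} - \left(\frac{1}{n}\right)\mathbf{Y}'\mathbf{J}\mathbf{Y} \tag{5.83}$$

Just as $\sum Y_i^2$ is represented by $\mathbf{Y}'\mathbf{Y}$ in matrix terms, so $SSE = \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$ can
be represented as follows:

$$SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b}) \tag{5.84}$$

which can be shown to equal:

$$SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} \tag{5.84a}$$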


Finally, it can be shown that:

$$SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \left(\frac{1}{n}\right)\mathbf{Y}'\mathbf{J}\mathbf{Y} \tag{5.85}$$


Example

Let us find SSE for the Toluca Company example by matrix methods, using (5.84a). Using
(5.61a), we obtain:

$$\mathbf{Y}'\mathbf{Y} = \begin{bmatrix} 399 & 121 & \cdots & 323 \end{bmatrix}\begin{bmatrix} 399 \\ 121 \\ \vdots \\ 323 \end{bmatrix} = 2{,}745{,}173$$

and using (5.65) and (5.63), we find:

$$\mathbf{b}'\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 62.37 & 3.5702 \end{bmatrix}\begin{bmatrix} 7{,}807 \\ 617{,}180 \end{bmatrix} = 2{,}690{,}348$$

Hence:

$$SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} = 2{,}745{,}173 - 2{,}690{,}348 = 54{,}825$$

which is the same result as that obtained in Chapter 1. Any difference would have been due
to rounding effects.


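Comment
To illustrate the derivation of the sums of squares expressions in matrix notation, consider SSE:

$$SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b}) = \mathbf{Y}'\mathbf{Y} - 2\mathbf{b}'\mathbf{X}'\mathbf{Y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}$$

In substituting for the rightmost $\mathbf{b}$ we obtain by (5.60):

$$SSE = \mathbf{Y}'\mathbf{Y} - 2\mathbf{b}'\mathbf{X}'\mathbf{Y} + \mathbf{b}'\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{Y} - 2\mathbf{b}'\mathbf{X}'\mathbf{Y} + \mathbf{b}'\mathbf{I}\mathbf{X}'\mathbf{Y}$$

In dropping $\mathbf{I}$ and subtracting, we obtain the result in (5.84a).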
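
The matrix formulas (5.83), (5.84a), and (5.85) translate almost line for line into code. The sketch below assumes NumPy and, as an illustrative choice, uses the Problem 5.4 data rather than the Toluca data; it also confirms that the two expressions for SSE in (5.84) and (5.84a) agree.

```python
import numpy as np

# Flavor deterioration data from Problem 5.4 (illustrative choice of data).
X_raw = np.array([8.0, 4.0, 0.0, -4.0, -8.0])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])
n = len(Y)
X = np.column_stack([np.ones(n), X_raw])
J = np.ones((n, n))                      # n x n matrix of 1s

b = np.linalg.solve(X.T @ X, X.T @ Y)    # b = (X'X)^{-1} X'Y

YtY = Y @ Y                              # Y'Y
correction = (Y @ J @ Y) / n             # (1/n) Y'JY, as in (5.82)

SSTO = YtY - correction                  # (5.83)
SSE = YtY - b @ X.T @ Y                  # (5.84a)
SSR = b @ X.T @ Y - correction           # (5.85)

e = Y - X @ b
print("SSTO =", SSTO)
print("SSE  =", SSE, "(e'e =", e @ e, ")")
print("SSR  =", SSR)
print("SSTO = SSR + SSE:", np.isclose(SSTO, SSR + SSE))
```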


Sums of Squares as Quadratic Forms
The ANOVA sums of squares can be shown to be quadratic forms. An example of a quadratic
form of the observations $Y_i$ when $n = 2$ is:

$$5Y_1^2 + 6Y_1Y_2 + 4Y_2^2 \tag{5.86}$$

Note that this expression is a second-degree polynomial containing terms involving the
squares of the observations and the cross product. We can express (5.86) in matrix terms as
follows:

$$\begin{bmatrix} Y_1 & Y_2 \end{bmatrix}\begin{bmatrix} 5 & 3 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \mathbf{Y}'\mathbf{A}\mathbf{Y} \tag{5.86a}$$

where $\mathbf{A}$ is a symmetric matrix.

In general, a quadratic form is defined as:

$$\underset{1 \times 1}{\mathbf{Y}'\mathbf{A}\mathbf{Y}} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}Y_iY_j \qquad \text{where } a_{ij} = a_{ji} \tag{5.87}$$

$\mathbf{A}$ is a symmetric $n \times n$ matrix and is called the matrix of the quadratic form.

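The ANOVA sums of squares SSTO, SSE, and SSR are all quadratic forms, as can be
seen by reexpressing $\mathbf{b}'\mathbf{X}'$. From (5.71), we know, using (5.32), that:

$$\mathbf{b}'\mathbf{X}' = (\mathbf{X}\mathbf{b})' = \hat{\mathbf{Y}}'$$

We now use the result in (5.73) to obtain:

$$\mathbf{b}'\mathbf{X}' = (\mathbf{H}\mathbf{Y})'$$

Since $\mathbf{H}$ is a symmetric matrix so that $\mathbf{H}' = \mathbf{H}$, we finally obtain, using (5.32):

$$\mathbf{b}'\mathbf{X}' = \mathbf{Y}'\mathbf{H} \tag{5.88}$$

This result enables us to express the ANOVA sums of squares as follows:

$$SSTO = \mathbf{Y}'\left[\mathbf{I} - \left(\frac{1}{n}\right)\mathbf{J}\right]\mathbf{Y} \tag{5.89a}$$

$$SSE = \mathbf{Y}'(\mathbf{I} - \mathbf{H})\mathbf{Y} \tag{5.89b}$$

$$SSR = \mathbf{Y}'\left[\mathbf{H} - \left(\frac{1}{n}\right)\mathbf{J}\right]\mathbf{Y} \tag{5.89c}$$

Each of these sums of squares can now be seen to be of the form $\mathbf{Y}'\mathbf{A}\mathbf{Y}$, where the three $\mathbf{A}$
matrices are:

$$\mathbf{I} - \left(\frac{1}{n}\right)\mathbf{J} \tag{5.90a}$$

$$\mathbf{I} - \mathbf{H} \tag{5.90b}$$

$$\mathbf{H} - \left(\frac{1}{n}\right)\mathbf{J} \tag{5.90c}$$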

Since each of these A matrices is symmetric, SSTO, SSE, and SSR are quadratic forms, 
with the matrices of the quadratic forms given in (5.90). Quadratic forms play an important 
role in statistics because all sums of squares in the analysis of variance for linear statistical 
models can be expressed as quadratic forms. 
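
A short numerical check may make the quadratic-form representation more concrete. The sketch below assumes NumPy; the helper `quad_form` and the test vector `y_small` are our own illustrative names and values, and the Problem 5.4 data are again used only as a convenient small data set.

```python
import numpy as np

def quad_form(A, y):
    """Return the scalar y'Ay for a square matrix A and a vector y."""
    return float(y @ A @ y)

# The n = 2 example (5.86): 5*Y1^2 + 6*Y1*Y2 + 4*Y2^2 with matrix (5.86a).
A_example = np.array([[5.0, 3.0],
                      [3.0, 4.0]])
y_small = np.array([1.0, 2.0])           # arbitrary test values for Y1, Y2
direct = 5 * y_small[0]**2 + 6 * y_small[0] * y_small[1] + 4 * y_small[1]**2
print(quad_form(A_example, y_small), direct)

# ANOVA sums of squares as quadratic forms, (5.89a)-(5.89c).
X_raw = np.array([8.0, 4.0, 0.0, -4.0, -8.0])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])
n = len(Y)
X = np.column_stack([np.ones(n), X_raw])
I = np.eye(n)
J = np.ones((n, n))
H = X @ np.linalg.inv(X.T @ X) @ X.T

SSTO = quad_form(I - J / n, Y)           # matrix (5.90a)
SSE = quad_form(I - H, Y)                # matrix (5.90b)
SSR = quad_form(H - J / n, Y)            # matrix (5.90c)
print(SSTO, SSE, SSR)
```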


5.13  Inferences in Regression Analysis 


As we saw in earlier chapters, all interval estimates are of the following form: point estimator
plus and minus a certain number of estimated standard deviations of the point estimator. 
Similarly, all tests require the point estimator and the estimated standard deviation of the 
point estimator or, in the case of analysis of variance tests, various sums of squares. Matrix 
algebra is of principal help in inference making when obtaining the estimated standard 
deviations and sums of squares. We have already given the matrix equivalents of the sums 
of squares for the analysis of variance. We focus here chiefly on the matrix expressions for 
the estimated variances of point estimators of interest. 


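Regression Coefficients
The variance-covariance matrix of $\mathbf{b}$:

$$\underset{2 \times 2}{\boldsymbol{\sigma}^2\{\mathbf{b}\}} = \begin{bmatrix} \sigma^2\{b_0\} & \sigma\{b_0, b_1\} \\ \sigma\{b_1, b_0\} & \sigma^2\{b_1\} \end{bmatrix} \tag{5.91}$$

is:

$$\underset{2 \times 2}{\boldsymbol{\sigma}^2\{\mathbf{b}\}} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \tag{5.92}$$

or, from (5.24a):

$$\underset{2 \times 2}{\boldsymbol{\sigma}^2\{\mathbf{b}\}} = \begin{bmatrix} \dfrac{\sigma^2}{n} + \dfrac{\sigma^2\bar{X}^2}{\sum (X_i - \bar{X})^2} & \dfrac{-\bar{X}\sigma^2}{\sum (X_i - \bar{X})^2} \\[2ex] \dfrac{-\bar{X}\sigma^2}{\sum (X_i - \bar{X})^2} & \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2} \end{bmatrix} \tag{5.92a}$$

When MSE is substituted for $\sigma^2$ in (5.92a), we obtain the estimated variance-covariance
matrix of $\mathbf{b}$, denoted by $\mathbf{s}^2\{\mathbf{b}\}$:

$$\underset{2 \times 2}{\mathbf{s}^2\{\mathbf{b}\}} = MSE(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \dfrac{MSE}{n} + \dfrac{\bar{X}^2 MSE}{\sum (X_i - \bar{X})^2} & \dfrac{-\bar{X}\,MSE}{\sum (X_i - \bar{X})^2} \\[2ex] \dfrac{-\bar{X}\,MSE}{\sum (X_i - \bar{X})^2} & \dfrac{MSE}{\sum (X_i - \bar{X})^2} \end{bmatrix} \tag{5.93}$$

In (5.92a), you will recognize the variances of $b_0$ in (2.22b) and of $b_1$ in (2.3b) and the
covariance of $b_0$ and $b_1$ in (4.5). Likewise, the estimated variances in (5.93) are familiar
from earlier chapters.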


Example

We wish to find $s^2\{b_0\}$ and $s^2\{b_1\}$ for the Toluca Company example by matrix methods.
Using the results in Figure 2.2 and in (5.64), we obtain:

$$\mathbf{s}^2\{\mathbf{b}\} = MSE(\mathbf{X}'\mathbf{X})^{-1} = 2{,}384\begin{bmatrix} .287475 & -.003535 \\ -.003535 & .00005051 \end{bmatrix} = \begin{bmatrix} 685.34 & -8.428 \\ -8.428 & .12040 \end{bmatrix} \tag{5.94}$$

Thus, $s^2\{b_0\} = 685.34$ and $s^2\{b_1\} = .12040$. These are the same as the results obtained in
Chapter 2.
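
The arithmetic in (5.94) is a scalar multiple of a matrix and is easy to reproduce. The sketch below assumes NumPy and simply copies the rounded values of MSE and $(\mathbf{X}'\mathbf{X})^{-1}$ quoted in the example, so the products agree with the text only up to rounding.

```python
import numpy as np

# Rounded values quoted in the Toluca Company example.
MSE = 2384.0
XtX_inv = np.array([[0.287475, -0.003535],
                    [-0.003535, 0.00005051]])

# Estimated variance-covariance matrix of b, as in (5.93).
s2_b = MSE * XtX_inv

s2_b0 = s2_b[0, 0]         # estimated variance of b0
s2_b1 = s2_b[1, 1]         # estimated variance of b1
s_b0_b1 = s2_b[0, 1]       # estimated covariance of b0 and b1

print(s2_b)
print("s{b0} =", np.sqrt(s2_b0), " s{b1} =", np.sqrt(s2_b1))
```

The estimated standard deviations needed for interval estimates and tests are the square roots of the diagonal elements.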


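Comment
To derive the variance-covariance matrix of $\mathbf{b}$, recall that:

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{A}\mathbf{Y}$$

where $\mathbf{A}$ is a constant matrix:

$$\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$

Hence, by (5.46) we have:

$$\boldsymbol{\sigma}^2\{\mathbf{b}\} = \mathbf{A}\,\boldsymbol{\sigma}^2\{\mathbf{Y}\}\,\mathbf{A}'$$

Now $\boldsymbol{\sigma}^2\{\mathbf{Y}\} = \sigma^2\mathbf{I}$. Further, it follows from (5.32) and the fact that $(\mathbf{X}'\mathbf{X})^{-1}$ is symmetric that:

$$\mathbf{A}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$$

We find therefore:

$$\begin{aligned} \boldsymbol{\sigma}^2\{\mathbf{b}\} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\sigma^2\mathbf{I}\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{I} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \end{aligned}$$


Mean Response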
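To estimate the mean response at $X_h$, let us define the vector:

$$\underset{2 \times 1}{\mathbf{X}_h} = \begin{bmatrix} 1 \\ X_h \end{bmatrix} \qquad \text{or} \qquad \underset{1 \times 2}{\mathbf{X}_h'} = \begin{bmatrix} 1 & X_h \end{bmatrix} \tag{5.95}$$

The fitted value in matrix notation then is:

$$\hat{Y}_h = \mathbf{X}_h'\mathbf{b} \tag{5.96}$$

since:

$$\mathbf{X}_h'\mathbf{b} = \begin{bmatrix} 1 & X_h \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = [b_0 + b_1X_h] = [\hat{Y}_h] = \hat{Y}_h$$

Note that $\mathbf{X}_h'\mathbf{b}$ is a $1 \times 1$ matrix; hence, we can write the final result as a scalar.

The variance of $\hat{Y}_h$, given earlier in (2.29b), in matrix notation is:

$$\sigma^2\{\hat{Y}_h\} = \sigma^2\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h \tag{5.97}$$

The variance of $\hat{Y}_h$ in (5.97) can be expressed as a function of $\boldsymbol{\sigma}^2\{\mathbf{b}\}$, the variance-covariance
matrix of the estimated regression coefficients, by making use of the result in (5.92):

$$\sigma^2\{\hat{Y}_h\} = \mathbf{X}_h'\,\boldsymbol{\sigma}^2\{\mathbf{b}\}\,\mathbf{X}_h \tag{5.97a}$$

The estimated variance of $\hat{Y}_h$, given earlier in (2.30), in matrix notation is:

$$s^2\{\hat{Y}_h\} = MSE(\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h) \tag{5.98}$$


Example

We wish to find $s^2\{\hat{Y}_h\}$ for the Toluca Company example when $X_h = 65$. We define:

$$\mathbf{X}_h' = \begin{bmatrix} 1 & 65 \end{bmatrix}$$

and use the result in (5.94) to obtain:

$$s^2\{\hat{Y}_h\} = \mathbf{X}_h'\,\mathbf{s}^2\{\mathbf{b}\}\,\mathbf{X}_h = \begin{bmatrix} 1 & 65 \end{bmatrix}\begin{bmatrix} 685.34 & -8.428 \\ -8.428 & .12040 \end{bmatrix}\begin{bmatrix} 1 \\ 65 \end{bmatrix} = 98.37$$

This is the same result as that obtained in Chapter 2.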
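
The quadratic form in (5.97a), with $\mathbf{s}^2\{\mathbf{b}\}$ in place of $\boldsymbol{\sigma}^2\{\mathbf{b}\}$, can be evaluated directly. The sketch below assumes NumPy and uses the rounded entries of (5.94), so its result can differ in the second decimal place from a value computed with unrounded quantities. It also evaluates the scalar expansion $s^2\{b_0\} + 2X_h s\{b_0, b_1\} + X_h^2 s^2\{b_1\}$ as a cross-check.

```python
import numpy as np

# Rounded estimated variance-covariance matrix of b from (5.94).
s2_b = np.array([[685.34, -8.428],
                 [-8.428, 0.12040]])

X_h = np.array([1.0, 65.0])              # the vector X_h for X_h = 65, as in (5.95)

# Matrix form: X_h' s^2{b} X_h.
s2_Yhat_h = X_h @ s2_b @ X_h

# Scalar expansion of the same quadratic form.
expanded = s2_b[0, 0] + 2 * X_h[1] * s2_b[0, 1] + X_h[1] ** 2 * s2_b[1, 1]

print("matrix form:     ", s2_Yhat_h)
print("scalar expansion:", expanded)
```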


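Comment
The result in (5.97a) can be derived directly by using (5.46), since $\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$:

$$\sigma^2\{\hat{Y}_h\} = \mathbf{X}_h'\,\boldsymbol{\sigma}^2\{\mathbf{b}\}\,\mathbf{X}_h$$

Hence:

$$\sigma^2\{\hat{Y}_h\} = \begin{bmatrix} 1 & X_h \end{bmatrix}\begin{bmatrix} \sigma^2\{b_0\} & \sigma\{b_0, b_1\} \\ \sigma\{b_1, b_0\} & \sigma^2\{b_1\} \end{bmatrix}\begin{bmatrix} 1 \\ X_h \end{bmatrix}$$

or:

$$\sigma^2\{\hat{Y}_h\} = \sigma^2\{b_0\} + 2X_h\sigma\{b_0, b_1\} + X_h^2\sigma^2\{b_1\} \tag{5.99}$$

Using the results from (5.92a), we obtain:

$$\sigma^2\{\hat{Y}_h\} = \frac{\sigma^2}{n} + \frac{\sigma^2\bar{X}^2}{\sum (X_i - \bar{X})^2} + \frac{2X_h(-\bar{X})\sigma^2}{\sum (X_i - \bar{X})^2} + \frac{X_h^2\sigma^2}{\sum (X_i - \bar{X})^2}$$

which reduces to the familiar expression:

$$\sigma^2\{\hat{Y}_h\} = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] \tag{5.99a}$$

Thus, we see explicitly that the variance expression in (5.99a) contains contributions from $\sigma^2\{b_0\}$,
$\sigma^2\{b_1\}$, and $\sigma\{b_0, b_1\}$, which it must according to (A.30b) since $\hat{Y}_h = b_0 + b_1X_h$ is a linear combination
of $b_0$ and $b_1$.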


Prediction of New Observation

The estimated variance $s^2\{\text{pred}\}$, given earlier in (2.38), in matrix notation is:

$$s^2\{\text{pred}\} = MSE(1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h) \tag{5.100}$$


Cited Reference

5.1. Graybill, F. A. Matrices with Applications in Statistics. 2nd ed. Belmont, Calif.: Wadsworth,
2002.


Problems

5.1. For the matrices below, obtain (1) A + B, (2) A − B, (3) AC, (4) AB', (5) B'A.


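$$\mathbf{A} = \begin{bmatrix} 1 & 4 \\ 2 & 6 \\ 3 & 8 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 1 & 3 \\ 1 & 4 \\ 2 & 5 \end{bmatrix} \qquad \mathbf{C} = \begin{bmatrix} 3 & 8 & 1 \\ 5 & 4 & 0 \end{bmatrix}$$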


State the dimension of each resulting matrix. 
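5.2. For the matrices below, obtain (1) A + C, (2) A − C, (3) B'A, (4) AC', (5) C'A.

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 3 & 5 \\ 5 & 7 \\ 4 & 8 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 6 \\ 9 \\ 3 \\ 1 \end{bmatrix} \qquad \mathbf{C} = \begin{bmatrix} 3 & 8 \\ 8 & 6 \\ 5 & 1 \\ 2 & 4 \end{bmatrix}$$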


State the dimension of each resulting matrix. 


5.3. Show how the following expressions are written in terms of matrices: (1) $Y_i - \hat{Y}_i = e_i$,
(2) $\sum X_ie_i = 0$. Assume $i = 1, \ldots, 4$.


*5.4. Flavor deterioration. The results shown below were obtained in a small-scale experiment to
study the relation between °F of storage temperature (X) and number of weeks before flavor
deterioration of a food product begins to occur (Y).

    i:      1      2      3      4      5
    X_i:    8      4      0     −4     −8
    Y_i:  7.8    9.0   10.2   11.0   11.7

Assume that first-order regression model (2.1) is applicable. Using matrix methods, find (1)
Y'Y, (2) X'X, (3) X'Y.

5.5. Consumer finance. The data below show, for a consumer finance company operating in six
cities, the number of competing loan companies operating in the city (X) and the number per
thousand of the company's loans made in that city that are currently delinquent (Y):

    i:      1      2      3      4      5      6
    X_i:    4      1      2      3      3      4
    Y_i:   16      5     10     15     13     22

Assume that first-order regression model (2.1) is applicable. Using matrix methods, find (1)
Y'Y, (2) X'X, (3) X'Y.

*5.6. Refer to Airfreight breakage Problem 1.21. Using matrix methods, find (1) Y'Y, (2) X'X,
(3) X'Y.

5.7. Refer to Plastic hardness Problem 1.22. Using matrix methods, find (1) Y'Y, (2) X'X, (3) X'Y.

5.8. Let B be defined as follows:

$$\mathbf{B} = \begin{bmatrix} 1 & 5 & 0 \\ 1 & 0 & 5 \\ 1 & 0 & 5 \end{bmatrix}$$

a. Are the column vectors of B linearly dependent?
b. What is the rank of B?
c. What must be the determinant of B?

5.9. Let A be defined as follows:

$$\mathbf{A} = \begin{bmatrix} 0 & 1 & 8 \\ 0 & 3 & 1 \\ 0 & 5 & 5 \end{bmatrix}$$

a. Are the column vectors of A linearly dependent?
b. Restate definition (5.20) in terms of row vectors. Are the row vectors of A linearly dependent?
c. What is the rank of A?
d. Calculate the determinant of A.

5.10.


Find the inverse of each of the following matrices: 


4-320 ^2 
Eg sss 7 
10 1 6 


Check in each case that the resulting matrix is indeed the inverse. 


5.11. Find the inverse of the following matrix:

$$\mathbf{A} = \begin{bmatrix} 5 & 1 & 3 \\ 4 & 0 & 5 \\ 1 & 9 & 6 \end{bmatrix}$$

Check that the resulting matrix is indeed the inverse.

*5.12. Refer to Flavor deterioration Problem 5.4. Find $(\mathbf{X}'\mathbf{X})^{-1}$.

5.13. Refer to Consumer finance Problem 5.5. Find $(\mathbf{X}'\mathbf{X})^{-1}$.

*5.14. Consider the simultaneous equations:

$$4y_1 + 7y_2 = 25$$
$$2y_1 + 3y_2 = 12$$

a. Write these equations in matrix notation.
b. Using matrix methods, find the solutions for $y_1$ and $y_2$.

5.15. Consider the simultaneous equations:

$$5y_1 + 2y_2 = 8$$
$$23y_1 + 7y_2 = 28$$

a. Write these equations in matrix notation.
b. Using matrix methods, find the solutions for $y_1$ and $y_2$.

5.16. Consider the estimated linear regression function in the form of (1.15). Write expressions in
this form for the fitted values $\hat{Y}_i$ in matrix terms for $i = 1, \ldots, 5$.

5.17. Consider the following functions of the random variables $Y_1$, $Y_2$, and $Y_3$:

$$W_1 = Y_1 + Y_2 + Y_3$$
$$W_2 = Y_1 - Y_2$$
$$W_3 = Y_1 - Y_2 - Y_3$$

a. State the above in matrix notation.
b. Find the expectation of the random vector W.
c. Find the variance-covariance matrix of W.

*5.18. Consider the following functions of the random variables $Y_1$, $Y_2$, $Y_3$, and $Y_4$:

$$W_1 = \frac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4)$$
$$W_2 = \frac{1}{2}(Y_1 + Y_2) - \frac{1}{2}(Y_3 + Y_4)$$

a. State the above in matrix notation.
b. Find the expectation of the random vector W.
c. Find the variance-covariance matrix of W.

*5.19. Find the matrix A of the quadratic form:

$$3Y_1^2 + 10Y_1Y_2 + 17Y_2^2$$

5.20. Find the matrix A of the quadratic form:

$$7Y_1^2 - 8Y_1Y_2 + 8Y_2^2$$


*5.21. For the matrix:

E

find the quadratic form of the observations $Y_1$ and $Y_2$.

5.22. For the matrix:

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & 4 \\ 0 & 3 & 0 \\ 4 & 0 & 9 \end{bmatrix}$$

find the quadratic form of the observations $Y_1$, $Y_2$, and $Y_3$.

*5.23. Refer to Flavor deterioration Problems 5.4 and 5.12.

a. Using matrix methods, obtain the following: (1) vector of estimated regression coefficients,
(2) vector of residuals, (3) SSR, (4) SSE, (5) estimated variance-covariance matrix of b,
(6) point estimate of $E\{Y_h\}$ when $X_h = -6$, (7) estimated variance of $\hat{Y}_h$ when $X_h = -6$.
b. What simplifications arose from the spacing of the X levels in the experiment?
c. Find the hat matrix H.
d. Find $\mathbf{s}^2\{\mathbf{e}\}$.

5.24. Refer to Consumer finance Problems 5.5 and 5.13.

a. Using matrix methods, obtain the following: (1) vector of estimated regression coefficients,
(2) vector of residuals, (3) SSR, (4) SSE, (5) estimated variance-covariance matrix of b,
(6) point estimate of $E\{Y_h\}$ when $X_h = 4$, (7) $s^2\{\text{pred}\}$ when $X_h = 4$.
b. From your estimated variance-covariance matrix in part (a5), obtain the following:
(1) $s\{b_0, b_1\}$; (2) $s^2\{b_0\}$; (3) $s\{b_1\}$.
c. Find the hat matrix H.
d. Find $\mathbf{s}^2\{\mathbf{e}\}$.

*5.25. Refer to Airfreight breakage Problems 1.21 and 5.6.

a. Using matrix methods, obtain the following: (1) $(\mathbf{X}'\mathbf{X})^{-1}$, (2) b, (3) e, (4) H, (5) SSE,
(6) $\mathbf{s}^2\{\mathbf{b}\}$, (7) $\hat{Y}_h$ when $X_h = 2$, (8) $s^2\{\hat{Y}_h\}$ when $X_h = 2$.
b. From part (a6), obtain the following: (1) $s^2\{b_1\}$; (2) $s\{b_0, b_1\}$; (3) $s\{b_0\}$.
c. Find the matrix of the quadratic form for SSR.

5.26. Refer to Plastic hardness Problems 1.22 and 5.7.

a. Using matrix methods, obtain the following: (1) $(\mathbf{X}'\mathbf{X})^{-1}$, (2) b, (3) $\hat{\mathbf{Y}}$, (4) H, (5) SSE,
(6) $\mathbf{s}^2\{\mathbf{b}\}$, (7) $s^2\{\text{pred}\}$ when $X_h = 30$.
b. From part (a6), obtain the following: (1) $s^2\{b_0\}$; (2) $s\{b_0, b_1\}$; (3) $s\{b_1\}$.
c. Obtain the matrix of the quadratic form for SSE.


Exercises

5.27. Refer to regression-through-the-origin model (4.10). Set up the expectation vector for $\boldsymbol{\varepsilon}$. Assume
that $i = 1, \ldots, 4$.

5.28. Consider model (4.10) for regression through the origin and the estimator $b_1$ given in (4.14).
Obtain (4.14) by utilizing (5.60) with X suitably defined.

5.29. Consider the least squares estimator b given in (5.60). Using matrix methods, show that b is an
unbiased estimator.

5.30. Show that $\hat{Y}_h$ in (5.96) can be expressed in matrix terms as $\mathbf{b}'\mathbf{X}_h$.

5.31. Obtain an expression for the variance-covariance matrix of the fitted values $\hat{Y}_i$, $i = 1, \ldots, n$,
in terms of the hat matrix.


Part Two

Multiple Linear Regression


Chapter 6

Multiple Regression I


Multiple regression analysis is one of the most widely used of all statistical methods. In
this chapter, we first discuss a variety of multiple regression models. Then we present the
basic statistical results for multiple regression in matrix form. Since the matrix expressions
for multiple regression are the same as for simple linear regression, we state the results
without much discussion. We conclude the chapter with an example, illustrating a variety
of inferences and residual analyses in multiple regression analysis.


6.1 Multiple Regression Models


Need for Several Predictor Variables


When we first introduced regression analysis in Chapter 1, we spoke of regression models 
containing a number of predictor variables. We mentioned a regression model where the 
response variable was direct operating cost for a branch office of a consumer finance chain, 
and four predictor variables were considered, including average number of loans outstanding 
at the branch and total number of new loan applications processed by the branch. We also 
mentioned a tractor purchase study where the response variable was volume of tractor 
purchases in a sales territory, and the nine predictor variables included number of farms in 
the territory and quantity of crop production in the territory. In addition, we mentioned a 
study of short children where the response variable was the peak plasma growth hormone 
level, and the 14 predictor variables included gender, age, and various body measurements. 
In all these examples, a single predictor variable in the model would have provided an 
inadequate description since a number of key variables affect the response variable in 
important and distinctive ways. Furthermore, in situations of this type, we frequently find 
that predictions of the response variable based on a model containing only a single predictor 
variable are too imprecise to be useful. We noted the imprecise predictions with a single 
predictor variable in the Toluca Company example in Chapter 2. A more complex model, 
containing additional predictor variables, typically is more helpful in providing sufficiently 
precise predictions of the response variable. 

In each of the examples just mentioned, the analysis was based on observational data be- 
cause the predictor variables were not controlled, usually because they were not susceptible 
to direct control. Multiple regression analysis is also highly useful in experimental situations 
where the experimenter can control the predictor variables. An experimenter typically will 
wish to investigate a number of predictor variables simultaneously because almost always
more than one key predictor variable influences the response. For example, in a study of
productivity of work crews, the experimenter may wish to control both the size of the crew 
and the level of bonus pay. Similarly, in a study of responsiveness to a drug, the experimenter 
may wish to control both the dose of the drug and the method of administration. 

The multiple regression models which we now describe can be utilized for either obser- 
vational data or for experimental data from a completely randomized design. 


First-Order Model with Two Predictor Variables 


FIGURE 6.1 Response Function Is a Plane: Sales Promotion Example.


When there are two predictor variables $X_1$ and $X_2$, the regression model:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i \tag{6.1}$$

is called a first-order model with two predictor variables. A first-order model, as we noted
in Chapter 1, is linear in the predictor variables. $Y_i$ denotes as usual the response in the
$i$th trial, and $X_{i1}$ and $X_{i2}$ are the values of the two predictor variables in the $i$th trial. The
parameters of the model are $\beta_0$, $\beta_1$, and $\beta_2$, and the error term is $\varepsilon_i$.

Assuming that $E\{\varepsilon_i\} = 0$, the regression function for model (6.1) is:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 \tag{6.2}$$

Analogous to simple linear regression, where the regression function $E\{Y\} = \beta_0 + \beta_1X$ is
a line, regression function (6.2) is a plane. Figure 6.1 contains a representation of a portion
of the response plane:

$$E\{Y\} = 10 + 2X_1 + 5X_2 \tag{6.3}$$

Note that any point on the response plane (6.3) corresponds to the mean response $E\{Y\}$ at
the given combination of levels of $X_1$ and $X_2$.

Figure 6.1 also shows an observation $Y_i$ corresponding to the levels $(X_{i1}, X_{i2})$ of the two
predictor variables. Note that the vertical rule in Figure 6.1 between $Y_i$ and the response plane
represents the difference between $Y_i$ and the mean $E\{Y_i\}$ of the probability distribution of
$Y$ for the given $(X_{i1}, X_{i2})$ combination. Hence, the vertical distance from $Y_i$ to the response
plane represents the error term $\varepsilon_i = Y_i - E\{Y_i\}$.


[Figure 6.1 plot: the response plane $E\{Y\} = 10 + 2X_1 + 5X_2$ drawn over the $X_1$ and $X_2$ axes, with an observation $Y_i$ shown at $(X_{i1}, X_{i2})$.]


Frequently the regression function in multiple regression is called a regression surface 
or a response surface. In Figure 6.1, the response surface is a plane, but in other cases the 
response surface may be more complex in nature. 


Meaning of Regression Coefficients. Let us now consider the meaning of the regression
coefficients in the multiple regression function (6.3). The parameter $\beta_0 = 10$ is the $Y$
intercept of the regression plane. If the scope of the model includes $X_1 = 0$, $X_2 = 0$, then
$\beta_0 = 10$ represents the mean response $E\{Y\}$ at $X_1 = 0$, $X_2 = 0$. Otherwise, $\beta_0$ does not
have any particular meaning as a separate term in the regression model.

The parameter $\beta_1$ indicates the change in the mean response $E\{Y\}$ per unit increase in
$X_1$ when $X_2$ is held constant. Likewise, $\beta_2$ indicates the change in the mean response per
unit increase in $X_2$ when $X_1$ is held constant. To see this for our example, suppose $X_2$ is
held at the level $X_2 = 2$. The regression function (6.3) now is:

$$E\{Y\} = 10 + 2X_1 + 5(2) = 20 + 2X_1 \qquad X_2 = 2 \tag{6.4}$$

Note that this response function is a straight line with slope $\beta_1 = 2$. The same is true for
any other value of $X_2$; only the intercept of the response function will differ. Hence, $\beta_1 = 2$
indicates that the mean response $E\{Y\}$ increases by 2 with a unit increase in $X_1$ when $X_2$ is
constant, no matter what the level of $X_2$. We confirm therefore that $\beta_1$ indicates the change
in $E\{Y\}$ with a unit increase in $X_1$ when $X_2$ is held constant.

Similarly, $\beta_2 = 5$ in regression function (6.3) indicates that the mean response $E\{Y\}$
increases by 5 with a unit increase in $X_2$ when $X_1$ is held constant.

When the effect of $X_1$ on the mean response does not depend on the level of $X_2$, and
correspondingly the effect of $X_2$ does not depend on the level of $X_1$, the two predictor
variables are said to have additive effects or not to interact. Thus, the first-order regression
model (6.1) is designed for predictor variables whose effects on the mean response are
additive or do not interact.

The parameters $\beta_1$ and $\beta_2$ are sometimes called partial regression coefficients because
they reflect the partial effect of one predictor variable when the other predictor variable is
included in the model and is held constant.


Example

The response plane (6.3) shown in Figure 6.1 is for a regression model relating test market
sales ($Y$, in 10 thousand dollars) to point-of-sale expenditures ($X_1$, in thousand dollars) and
TV expenditures ($X_2$, in thousand dollars). Since $\beta_1 = 2$, if point-of-sale expenditures in
a locality are increased by one unit (1 thousand dollars) while TV expenditures are held
constant, expected sales increase by 2 units (20 thousand dollars). Similarly, since $\beta_2 = 5$,
if TV expenditures in a locality are increased by 1 thousand dollars and point-of-sale
expenditures are held constant, expected sales increase by 50 thousand dollars.
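
The additive interpretation of the coefficients in response plane (6.3) can be verified with a few evaluations. The function name `mean_response` and the particular levels tried below are our own illustrative choices.

```python
def mean_response(x1, x2):
    """Response plane (6.3): E{Y} = 10 + 2*X1 + 5*X2."""
    return 10 + 2 * x1 + 5 * x2

# Effect of a unit increase in X1 at several fixed levels of X2.
for x2 in (0, 2, 7):
    effect_x1 = mean_response(4, x2) - mean_response(3, x2)
    print(f"X2 = {x2}: change in E{{Y}} per unit of X1 = {effect_x1}")

# Effect of a unit increase in X2 at several fixed levels of X1.
for x1 in (0, 3, 10):
    effect_x2 = mean_response(x1, 6) - mean_response(x1, 5)
    print(f"X1 = {x1}: change in E{{Y}} per unit of X2 = {effect_x2}")
```

The change is 2 for $X_1$ and 5 for $X_2$ regardless of the level at which the other variable is held, which is what it means for the effects to be additive.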


Comments

1. A regression model for which the response surface is a plane can be used either in its own right
when it is appropriate, or as an approximation to a more complex response surface. Many complex
response surfaces can be approximated well by a plane for limited ranges of $X_1$ and $X_2$.

2. We can readily establish the meaning of $\beta_1$ and $\beta_2$ by calculus, taking partial derivatives of the
response surface (6.2) with respect to $X_1$ and $X_2$ in turn:

$$\frac{\partial E\{Y\}}{\partial X_1} = \beta_1 \qquad \frac{\partial E\{Y\}}{\partial X_2} = \beta_2$$

The partial derivatives measure the rate of change in $E\{Y\}$ with respect to one predictor variable when
the other is held constant.


First-Order Model with More than Two Predictor Variables 


We consider now the case where there are $p - 1$ predictor variables $X_1, \ldots, X_{p-1}$. The
regression model:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \cdots + \beta_{p-1}X_{i,p-1} + \varepsilon_i \tag{6.5}$$

is called a first-order model with $p - 1$ predictor variables. It can also be written:

$$Y_i = \beta_0 + \sum_{k=1}^{p-1}\beta_kX_{ik} + \varepsilon_i \tag{6.5a}$$

or, if we let $X_{i0} \equiv 1$, it can be written as:

$$Y_i = \sum_{k=0}^{p-1}\beta_kX_{ik} + \varepsilon_i \qquad \text{where } X_{i0} \equiv 1 \tag{6.5b}$$

Assuming that $E\{\varepsilon_i\} = 0$, the response function for regression model (6.5) is:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_{p-1}X_{p-1} \tag{6.6}$$

This response function is a hyperplane, which is a plane in more than two dimensions. It
is no longer possible to picture this response surface, as we were able to do in Figure 6.1
for the case of two predictor variables. Nevertheless, the meaning of the parameters is
analogous to the case of two predictor variables. The parameter $\beta_k$ indicates the change in
the mean response $E\{Y\}$ with a unit increase in the predictor variable $X_k$, when all other
predictor variables in the regression model are held constant. Note again that the effect
of any predictor variable on the mean response is the same for regression model (6.5) no
matter what the levels are at which the other predictor variables are held. Hence, first-order
regression model (6.5) is designed for predictor variables whose effects on the mean
response are additive and therefore do not interact.


Comment
When $p - 1 = 1$, regression model (6.5) reduces to:

$$Y_i = \beta_0 + \beta_1X_{i1} + \varepsilon_i$$

which is the simple linear regression model considered in earlier chapters.


General Linear Regression Model


In general, the variables $X_1, \ldots, X_{p-1}$ in a regression model do not need to represent
different predictor variables, as we shall shortly see. We therefore define the general linear
regression model, with normal error terms, simply in terms of $X$ variables:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \cdots + \beta_{p-1}X_{i,p-1} + \varepsilon_i \tag{6.7}$$

where:

$\beta_0, \beta_1, \ldots, \beta_{p-1}$ are parameters
$X_{i1}, \ldots, X_{i,p-1}$ are known constants
$\varepsilon_i$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, n$

If we let $X_{i0} \equiv 1$, regression model (6.7) can be written as follows:

$$Y_i = \beta_0X_{i0} + \beta_1X_{i1} + \beta_2X_{i2} + \cdots + \beta_{p-1}X_{i,p-1} + \varepsilon_i \qquad \text{where } X_{i0} \equiv 1 \tag{6.7a}$$

or:

$$Y_i = \sum_{k=0}^{p-1}\beta_kX_{ik} + \varepsilon_i \qquad \text{where } X_{i0} \equiv 1 \tag{6.7b}$$

The response function for regression model (6.7) is, since $E\{\varepsilon_i\} = 0$:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_{p-1}X_{p-1} \tag{6.8}$$

Thus, the general linear regression model with normal error terms implies that the
observations $Y_i$ are independent normal variables, with mean $E\{Y_i\}$ as given by (6.8) and with
constant variance $\sigma^2$.

This general linear model encompasses a vast variety of situations. We consider a few
of these now.


$p - 1$ Predictor Variables. When $X_1, \ldots, X_{p-1}$ represent $p - 1$ different predictor
variables, general linear regression model (6.7) is, as we have seen, a first-order model in which
there are no interaction effects between the predictor variables. The example in Figure 6.1
involves a first-order model with two predictor variables.


Qualitative Predictor Variables. The general linear regression model (6.7) encompasses
not only quantitative predictor variables but also qualitative ones, such as gender (male,
female) or disability status (not disabled, partially disabled, fully disabled). We use indicator
variables that take on the values 0 and 1 to identify the classes of a qualitative variable.

Consider a regression analysis to predict the length of hospital stay ($Y$) based on the age
($X_1$) and gender ($X_2$) of the patient. We define $X_2$ as follows:

$$X_2 = \begin{cases} 1 & \text{if patient female} \\ 0 & \text{if patient male} \end{cases}$$

The first-order regression model then is as follows:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i \tag{6.9}$$

where:

$$X_{i1} = \text{patient's age} \qquad X_{i2} = \begin{cases} 1 & \text{if patient female} \\ 0 & \text{if patient male} \end{cases}$$

The response function for regression model (6.9) is:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 \tag{6.10}$$

For male patients, $X_2 = 0$ and response function (6.10) becomes:

$$E\{Y\} = \beta_0 + \beta_1X_1 \qquad \text{Male patients} \tag{6.10a}$$

For female patients, $X_2 = 1$ and response function (6.10) becomes:

$$E\{Y\} = (\beta_0 + \beta_2) + \beta_1X_1 \qquad \text{Female patients} \tag{6.10b}$$

These two response functions represent parallel straight lines with different intercepts.

In general, we represent a qualitative variable with $c$ classes by means of $c - 1$ indicator
variables. For instance, if in the hospital stay example the qualitative variable disability
status is to be added as another predictor variable, it can be represented as follows by the
two indicator variables $X_3$ and $X_4$:

$$X_3 = \begin{cases} 1 & \text{if patient not disabled} \\ 0 & \text{otherwise} \end{cases} \qquad X_4 = \begin{cases} 1 & \text{if patient partially disabled} \\ 0 & \text{otherwise} \end{cases}$$

The first-order model with age, gender, and disability status as predictor variables then is:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \beta_4X_{i4} + \varepsilon_i \tag{6.11}$$

where:

$$X_{i1} = \text{patient's age}$$

$$X_{i2} = \begin{cases} 1 & \text{if patient female} \\ 0 & \text{if patient male} \end{cases}$$

$$X_{i3} = \begin{cases} 1 & \text{if patient not disabled} \\ 0 & \text{otherwise} \end{cases}$$

$$X_{i4} = \begin{cases} 1 & \text{if patient partially disabled} \\ 0 & \text{otherwise} \end{cases}$$

In Chapter 8 we present a comprehensive discussion of how to model qualitative predictor
variables and how to interpret regression models containing qualitative predictor variables.
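
The coding of indicator variables for model (6.11) can be sketched in a few lines. The function `design_row`, the string labels, and the three patients below are hypothetical and serve only to show how each row of the X matrix would be assembled under the coding described above.

```python
import numpy as np

def design_row(age, gender, disability):
    """Build the row [1, X1, X2, X3, X4] of the X matrix for model (6.11)."""
    x2 = 1.0 if gender == "female" else 0.0                   # gender indicator
    x3 = 1.0 if disability == "not disabled" else 0.0         # first disability indicator
    x4 = 1.0 if disability == "partially disabled" else 0.0   # second disability indicator
    return [1.0, float(age), x2, x3, x4]

# Hypothetical patients (not data from the text).
patients = [
    (34, "female", "not disabled"),
    (61, "male", "partially disabled"),
    (47, "male", "fully disabled"),
]

X = np.array([design_row(*p) for p in patients])
print(X)
```

A fully disabled patient has $X_3 = 0$ and $X_4 = 0$, so the three classes of disability status need only two indicator variables.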


Polynomial Regression. Polynomial regression models are special cases of the general
linear regression model. They contain squared and higher-order terms of the predictor
variable(s), making the response function curvilinear. The following is a polynomial regression
model with one predictor variable:

$$Y_i = \beta_0 + \beta_1X_i + \beta_2X_i^2 + \varepsilon_i \tag{6.12}$$

Figure 1.3 on page 5 shows an example of a polynomial regression function with one
predictor variable.

Despite the curvilinear nature of the response function for regression model (6.12), it is
a special case of general linear regression model (6.7). If we let $X_{i1} = X_i$ and $X_{i2} = X_i^2$,
we can write (6.12) as follows:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i$$

which is in the form of general linear regression model (6.7). While (6.12) illustrates a
curvilinear regression model where the response function is quadratic, models with higher-degree
polynomial response functions are also particular cases of the general linear regression
model. We shall discuss polynomial regression models in more detail in Chapter 8.


Transformed Variables. Models with transformed variables involve complex, curvilinear
response functions, yet still are special cases of the general linear regression model. Consider
the following model with a transformed $Y$ variable:

$$\log Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \varepsilon_i \tag{6.13}$$

Here, the response surface is complex, yet model (6.13) can still be treated as a general
linear regression model. If we let $Y_i' = \log Y_i$, we can write regression model (6.13) as
follows:

$$Y_i' = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \varepsilon_i$$

which is in the form of general linear regression model (6.7). The response variable just
happens to be the logarithm of $Y$.

Many models can be transformed into the general linear regression model. For instance,
the model:

$$Y_i = \frac{1}{\beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i} \tag{6.14}$$

can be transformed to the general linear regression model by letting $Y_i' = 1/Y_i$. We then
have:

$$Y_i' = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i$$


Interaction Effects. When the effects of the predictor variables on the response variable
are not additive, the effect of one predictor variable depends on the levels of the other
predictor variables. The general linear regression model (6.7) encompasses regression models
with nonadditive or interacting effects. An example of a nonadditive regression model with
two predictor variables $X_1$ and $X_2$ is the following:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i1}X_{i2} + \varepsilon_i \tag{6.15}$$

Here, the response function is complex because of the interaction term $\beta_3X_{i1}X_{i2}$. Yet
regression model (6.15) is a special case of the general linear regression model. Let
$X_{i3} = X_{i1}X_{i2}$ and then write (6.15) as follows:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \varepsilon_i$$

We see that this model is in the form of general linear regression model (6.7). We shall
discuss regression models with interaction effects in more detail in Chapter 8.


Combination of Cases. A regression model may combine several of the elements we have 
just noted and still be treated as a general linear regression model. Consider the following 
regression model containing linear and quadratic terms for each of two predictor variables 
and an interaction term represented by the cross-product term: 


$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}^2 + \beta_3 X_{i2} + \beta_4 X_{i2}^2 + \beta_5 X_{i1} X_{i2} + \varepsilon_i \tag{6.16}$$




FIGURE 6.2 Additional Examples of Response Functions, panels (a) and (b). 


Let us define: 

$$Z_{i1} = X_{i1} \qquad Z_{i2} = X_{i1}^2 \qquad Z_{i3} = X_{i2} \qquad Z_{i4} = X_{i2}^2 \qquad Z_{i5} = X_{i1} X_{i2}$$

We can then write regression model (6.16) as follows: 

$$Y_i = \beta_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \beta_3 Z_{i3} + \beta_4 Z_{i4} + \beta_5 Z_{i5} + \varepsilon_i$$

which is in the form of general linear regression model (6.7). 

The general linear regression model (6.7) includes many complex models, some of which 
may be highly complex. Figure 6.2 illustrates two complex response surfaces, for the case of 
two predictor variables, that can be represented by general linear regression model (6.7). 
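
The substitution used for model (6.16) can be sketched as a small helper that builds the columns $1, Z_{i1}, \ldots, Z_{i5}$ from two predictor variables. The function name `second_order_design` and the input values are hypothetical, chosen only for illustration.

```python
import numpy as np

def second_order_design(x1, x2):
    """Return the n x 6 matrix with columns 1, Z1, ..., Z5 for model (6.16)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([
        np.ones_like(x1),  # intercept column of 1s
        x1,                # Z1 = X1
        x1 ** 2,           # Z2 = X1^2
        x2,                # Z3 = X2
        x2 ** 2,           # Z4 = X2^2
        x1 * x2,           # Z5 = X1 * X2
    ])

# Hypothetical predictor values for three cases.
Z = second_order_design([1, 2, 3], [4, 5, 6])
print(Z)
```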


Meaning of Linear in General Linear Regression Model. It should be clear from the 
various examples that general linear regression model (6.7) is not restricted to linear response 
surfaces. The term linear model refers to the fact that model (6.7) is linear in the parameters; 
it does not refer to the shape of the response surface. 

We say that a regression model is linear in the parameters when it can be written in the 
form: 


$$Y_i = c_{i0}\beta_0 + c_{i1}\beta_1 + c_{i2}\beta_2 + \cdots + c_{i,p-1}\beta_{p-1} + \varepsilon_i \tag{6.17}$$


where the terms $c_{i0}$, $c_{i1}$, etc., are coefficients involving the predictor variables. For example, 
first-order model (6.1) in two predictor variables: 


$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$


is linear in the parameters, with $c_{i0} = 1$, $c_{i1} = X_{i1}$, and $c_{i2} = X_{i2}$. 
An example of a nonlinear regression model is the following: 


$$Y_i = \beta_0 \exp(\beta_1 X_i) + \varepsilon_i$$


This is a nonlinear regression model because it cannot be expressed in the form of (6.17). 
We shall discuss nonlinear regression models in Part III. 




6.2 General Linear Regression Model in Matrix Terms 


We now present the principal results for the general linear regression model (6.7) in matrix 
terms. This model, as noted, encompasses a wide variety of particular cases. The results to 
be presented are applicable to all of these. 

It is a remarkable property of matrix algebra that the results for the general linear regres- 
sion model (6.7) in matrix notation appear exactly as those for the simple linear regression 
model (5.57). Only the degrees of freedom and other constants related to the number of X 
variables and the dimensions of some matrices are different. Hence, we are able to present 
the results very concisely. 

The matrix notation, to be sure, may hide enormous computational complexities. To find 
the inverse of a 10 × 10 matrix $\mathbf{A}$ requires a tremendous amount of computation, yet it is 
simply represented as $\mathbf{A}^{-1}$. Our reason for emphasizing matrix algebra is that it indicates 
the essential conceptual steps in the solution. The actual computations will, in all but the 
very simplest cases, be done by computer. Hence, it does not matter to us whether $(\mathbf{X}'\mathbf{X})^{-1}$ 
represents finding the inverse of a 2 × 2 or a 10 × 10 matrix. The important point is to know 
what the inverse of the matrix represents. 

To express general linear regression model (6.7): 


$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$


in matrix terms, we need to define the following matrices: 


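$$\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \tag{6.18a}$$

$$\underset{n \times p}{\mathbf{X}} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix} \tag{6.18b}$$

$$\underset{p \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix} \tag{6.18c}$$

$$\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \tag{6.18d}$$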

Note that the $\mathbf{Y}$ and $\boldsymbol{\varepsilon}$ vectors are the same as for simple linear regression. The $\boldsymbol{\beta}$ vector 
contains additional regression parameters, and the $\mathbf{X}$ matrix contains a column of 1s as well 
as a column of the n observations for each of the $p - 1$ X variables in the regression model. 
The row subscript for each element $X_{ik}$ in the $\mathbf{X}$ matrix identifies the trial or case, and the 
column subscript identifies the X variable. 

In matrix terms, the general linear regression model (6.7) is: 


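$$\underset{n \times 1}{\mathbf{Y}} = \underset{n \times p}{\mathbf{X}} \; \underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}} \tag{6.19}$$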




where: 


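$\mathbf{Y}$ is a vector of responses 

$\boldsymbol{\beta}$ is a vector of parameters 

$\mathbf{X}$ is a matrix of constants 

$\boldsymbol{\varepsilon}$ is a vector of independent normal random variables with expectation 
$E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$ and variance-covariance matrix: 

$$\underset{n \times n}{\sigma^2\{\boldsymbol{\varepsilon}\}} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 \mathbf{I}$$

Consequently, the random vector $\mathbf{Y}$ has expectation: 

$$\underset{n \times 1}{E\{\mathbf{Y}\}} = \mathbf{X}\boldsymbol{\beta} \tag{6.20}$$

and the variance-covariance matrix of $\mathbf{Y}$ is the same as that of $\boldsymbol{\varepsilon}$: 

$$\underset{n \times n}{\sigma^2\{\mathbf{Y}\}} = \sigma^2 \mathbf{I} \tag{6.21}$$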


6.3 Estimation of Regression Coefficients 


The least squares criterion (1.8) is generalized as follows for general linear regression 
model (6.7): 


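$$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1})^2 \tag{6.22}$$


The least squares estimators are those values of $\beta_0, \beta_1, \ldots, \beta_{p-1}$ that minimize Q. Let us 
denote the vector of the least squares estimated regression coefficients $b_0, b_1, \ldots, b_{p-1}$ as $\mathbf{b}$: 

$$\underset{p \times 1}{\mathbf{b}} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{bmatrix} \tag{6.23}$$

The least squares normal equations for the general linear regression model (6.19) are: 

$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y} \tag{6.24}$$

and the least squares estimators are: 

$$\underset{p \times 1}{\mathbf{b}} = \underset{p \times p}{(\mathbf{X}'\mathbf{X})^{-1}} \; \underset{p \times 1}{(\mathbf{X}'\mathbf{Y})} \tag{6.25}$$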




The method of maximum likelihood leads to the same estimators for normal error regres- 
sion model (6.19) as those obtained by the method of least squares in (6.25). The likelihood 
function in (1.26) generalizes directly for multiple regression as follows: 


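$$L(\boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1})^2 \right] \tag{6.26}$$


Maximizing this likelihood function with respect to $\beta_0, \beta_1, \ldots, \beta_{p-1}$ leads to the estimators 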
in (6.25). These estimators are least squares and maximum likelihood estimators and have 
all the properties mentioned in Chapter 1: they are minimum variance unbiased, consistent, 
and sufficient. 
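
The normal equations (6.24) and the estimator (6.25) can be sketched numerically as follows. The data are hypothetical (n = 6 cases, two predictor variables) and serve only to show the mechanics; solving the linear system is used here in place of forming the inverse explicitly, which is a common numerical choice rather than anything prescribed by the text.

```python
import numpy as np

# Hypothetical data: n = 6 cases, p - 1 = 2 predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Design matrix X (n x p): a column of 1s followed by the X variables.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations (6.24): (X'X) b = X'Y, solved for b.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares routine.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```

At the solution the residual vector is orthogonal to every column of X, which is just another way of writing (6.24).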


6.4 Fitted Values and Residuals 


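Let the vector of the fitted values $\hat{Y}_i$ be denoted by $\hat{\mathbf{Y}}$ and the vector of the residual terms 
$e_i = Y_i - \hat{Y}_i$ be denoted by $\mathbf{e}$: 

$$\underset{n \times 1}{\hat{\mathbf{Y}}} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} \quad (6.27\text{a}) \qquad \underset{n \times 1}{\mathbf{e}} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \quad (6.27\text{b}) \tag{6.27}$$

The fitted values are represented by: 

$$\underset{n \times 1}{\hat{\mathbf{Y}}} = \mathbf{X}\mathbf{b} \tag{6.28}$$

and the residual terms by: 

$$\underset{n \times 1}{\mathbf{e}} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\mathbf{b} \tag{6.29}$$

The vector of the fitted values $\hat{\mathbf{Y}}$ can be expressed in terms of the hat matrix $\mathbf{H}$ as follows: 

$$\underset{n \times 1}{\hat{\mathbf{Y}}} = \mathbf{H}\mathbf{Y} \tag{6.30}$$

where: 

$$\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \tag{6.30a}$$

Similarly, the vector of residuals can be expressed as follows: 

$$\underset{n \times 1}{\mathbf{e}} = (\mathbf{I} - \mathbf{H})\mathbf{Y} \tag{6.31}$$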


The variance-covariance matrix of the residuals is: 

$$\underset{n \times n}{\sigma^2\{\mathbf{e}\}} = \sigma^2(\mathbf{I} - \mathbf{H}) \tag{6.32}$$

which is estimated by: 

$$\underset{n \times n}{s^2\{\mathbf{e}\}} = MSE(\mathbf{I} - \mathbf{H}) \tag{6.33}$$
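
A brief numerical sketch of (6.30) through (6.33), using hypothetical data, may help make the role of the hat matrix concrete. Forming the full n × n matrix H is done here only for illustration; it is not meant as an efficient approach for large n.

```python
import numpy as np

# Hypothetical data: n = 6 cases, two predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix (6.30a), n x n
y_hat = H @ y                          # fitted values (6.30)
e = (np.eye(n) - H) @ y                # residuals (6.31)

mse = (e @ e) / (n - p)                # MSE = SSE / (n - p)
s2_e = mse * (np.eye(n) - H)           # estimated var-cov matrix of residuals (6.33)

print(np.round(y_hat, 4))
print(np.round(e, 4))
```

The matrix H is symmetric and idempotent, and its trace equals p when X has full column rank; these properties are convenient checks on a computed hat matrix.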


6.5 Analysis of Variance Results 


Sums of Squares and Mean Squares 


TABLE 6.1 ANOVA Table for General Linear Regression Model (6.19). 


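The sums of squares for the analysis of variance in matrix terms are, from (5.89): 

$$SSTO = \mathbf{Y}'\mathbf{Y} - \left(\frac{1}{n}\right)\mathbf{Y}'\mathbf{J}\mathbf{Y} = \mathbf{Y}'\left[\mathbf{I} - \left(\frac{1}{n}\right)\mathbf{J}\right]\mathbf{Y} \tag{6.34}$$

$$SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b}) = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} = \mathbf{Y}'(\mathbf{I} - \mathbf{H})\mathbf{Y} \tag{6.35}$$

$$SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \left(\frac{1}{n}\right)\mathbf{Y}'\mathbf{J}\mathbf{Y} = \mathbf{Y}'\left[\mathbf{H} - \left(\frac{1}{n}\right)\mathbf{J}\right]\mathbf{Y} \tag{6.36}$$

where $\mathbf{J}$ is an n × n matrix of 1s defined in (5.18) and $\mathbf{H}$ is the hat matrix defined in (6.30a). 
SSTO, as usual, has $n - 1$ degrees of freedom associated with it. SSE has $n - p$ degrees 
of freedom associated with it since p parameters need to be estimated in the regression 
function for model (6.19). Finally, SSR has $p - 1$ degrees of freedom associated with it, 
representing the number of X variables $X_1, \ldots, X_{p-1}$. 
Table 6.1 shows these analysis of variance results, as well as the mean squares MSR and 
MSE: 

$$MSR = \frac{SSR}{p - 1} \tag{6.37}$$

$$MSE = \frac{SSE}{n - p} \tag{6.38}$$

The expectation of MSE is $\sigma^2$, as for simple linear regression. The expectation of MSR 
is $\sigma^2$ plus a quantity that is nonnegative. For instance, when $p - 1 = 2$, we have: 

$$E\{MSR\} = \sigma^2 + \frac{1}{2}\left[\beta_1^2 \sum (X_{i1} - \bar{X}_1)^2 + \beta_2^2 \sum (X_{i2} - \bar{X}_2)^2 + 2\beta_1\beta_2 \sum (X_{i1} - \bar{X}_1)(X_{i2} - \bar{X}_2)\right]$$

Note that if both $\beta_1$ and $\beta_2$ equal zero, $E\{MSR\} = \sigma^2$. Otherwise $E\{MSR\} > \sigma^2$. 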


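Source of 
Variation     SS                                                                               df         MS 
Regression    $SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - (1/n)\mathbf{Y}'\mathbf{J}\mathbf{Y}$       $p - 1$    $MSR = SSR/(p - 1)$ 
Error         $SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$                      $n - p$    $MSE = SSE/(n - p)$ 
Total         $SSTO = \mathbf{Y}'\mathbf{Y} - (1/n)\mathbf{Y}'\mathbf{J}\mathbf{Y}$                 $n - 1$ 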
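
The quadratic forms in (6.34) through (6.36) can be evaluated directly, as in the following sketch with hypothetical data. Building the n × n matrices J and H is done only to mirror the formulas; it is an illustration, not a recommended computing strategy.

```python
import numpy as np

# Hypothetical data: n = 6 cases, two predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape

I = np.eye(n)
J = np.ones((n, n))                      # n x n matrix of 1s
H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix (6.30a)

ssto = y @ (I - J / n) @ y               # (6.34), df = n - 1
sse = y @ (I - H) @ y                    # (6.35), df = n - p
ssr = y @ (H - J / n) @ y                # (6.36), df = p - 1

msr = ssr / (p - 1)                      # (6.37)
mse = sse / (n - p)                      # (6.38)
print(ssto, ssr, sse, msr, mse)
```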




F Test for Regression Relation 
To test whether there is a regression relation between the response variable Y and the set of 
X variables $X_1, \ldots, X_{p-1}$, i.e., to choose between the alternatives: 

$$\begin{aligned} H_0&: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0 \\ H_a&: \text{not all } \beta_k \ (k = 1, \ldots, p - 1) \text{ equal zero} \end{aligned} \tag{6.39a}$$

we use the test statistic: 

$$F^* = \frac{MSR}{MSE} \tag{6.39b}$$

The decision rule to control the Type I error at $\alpha$ is: 

$$\begin{aligned} &\text{If } F^* \le F(1 - \alpha;\, p - 1,\, n - p), \text{ conclude } H_0 \\ &\text{If } F^* > F(1 - \alpha;\, p - 1,\, n - p), \text{ conclude } H_a \end{aligned} \tag{6.39c}$$

The existence of a regression relation by itself does not, of course, ensure that useful 
predictions can be made by using it. 

Note that when $p - 1 = 1$, this test reduces to the F test in (2.60) for testing in simple 
linear regression whether or not $\beta_1 = 0$. 
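
A minimal sketch of the test in (6.39) follows. The function name `overall_f_test` and the data are hypothetical, and the critical value is passed in by the caller (for example, read from an F table) rather than computed; the value 9.55 used below is the tabled F(.95; 2, 3), rounded.

```python
import numpy as np

def overall_f_test(X, y, f_crit):
    """Return (F*, conclusion) for H0: beta_1 = ... = beta_{p-1} = 0, per (6.39).

    X must include the column of 1s; f_crit is F(1 - alpha; p - 1, n - p).
    """
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    sse = np.sum((y - y_hat) ** 2)
    ssr = np.sum((y_hat - y.mean()) ** 2)   # valid because X has an intercept
    f_star = (ssr / (p - 1)) / (sse / (n - p))
    return f_star, ("H0" if f_star <= f_crit else "Ha")

# Hypothetical data: n = 6, p = 3, so the F distribution has (2, 3) df.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

f_star, conclusion = overall_f_test(X, y, f_crit=9.55)
print(f_star, conclusion)
```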


Coefficient of Multiple Determination 
The coefficient of multiple determination, denoted by $R^2$, is defined as follows: 

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} \tag{6.40}$$

It measures the proportionate reduction of total variation in Y associated with the use of the 
set of X variables $X_1, \ldots, X_{p-1}$. The coefficient of multiple determination $R^2$ reduces to the 
coefficient of simple determination in (2.72) for simple linear regression when $p - 1 = 1$, 
i.e., when one X variable is in regression model (6.19). Just as before, we have: 

$$0 \le R^2 \le 1 \tag{6.41}$$

where $R^2$ assumes the value 0 when all $b_k = 0$ $(k = 1, \ldots, p - 1)$, and the value 1 when 
all Y observations fall directly on the fitted regression surface, i.e., when $Y_i = \hat{Y}_i$ for all i. 

Adding more X variables to the regression model can only increase $R^2$ and never reduce 
it, because SSE can never become larger with more X variables and SSTO is always the 
same for a given set of responses. Since $R^2$ usually can be made larger by including a larger 
number of predictor variables, it is sometimes suggested that a modified measure be used 
that adjusts for the number of X variables in the model. The adjusted coefficient of multiple 
determination, denoted by $R_a^2$, adjusts $R^2$ by dividing each sum of squares by its associated 
degrees of freedom: 

$$R_a^2 = 1 - \frac{SSE/(n - p)}{SSTO/(n - 1)} = 1 - \left(\frac{n - 1}{n - p}\right)\frac{SSE}{SSTO} \tag{6.42}$$

This adjusted coefficient of multiple determination may actually become smaller when 
another X variable is introduced into the model, because any decrease in SSE may be more 
than offset by the loss of a degree of freedom in the denominator $n - p$. 
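
The two measures in (6.40) and (6.42) can be compared for nested models with a short sketch. The helper name `r2_and_adjusted`, the data, and the extra predictor `x3` are all hypothetical; whether the adjusted coefficient rises or falls for these particular numbers is not asserted here, since that depends on the data.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) per (6.40) and (6.42); X includes the 1s column."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ b) ** 2)
    ssto = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / ssto
    r2_adj = 1.0 - (n - 1) / (n - p) * sse / ssto
    return r2, r2_adj

# Hypothetical data; x3 is an arbitrary extra predictor added to the model.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
x3 = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
ones = np.ones_like(x1)

r2_small, adj_small = r2_and_adjusted(np.column_stack([ones, x1, x2]), y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([ones, x1, x2, x3]), y)
print(r2_small, r2_big)     # R^2 cannot decrease when x3 is added
print(adj_small, adj_big)   # the adjusted value may move either way
```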




Comments 
1. To distinguish between the coefficients of determination for simple and multiple regression, 
we shall from now on refer to the former as the coefficient of simple determination. 


2. It can be shown that the coefficient of multiple determination $R^2$ can be viewed as a coefficient 
of simple determination between the responses $Y_i$ and the fitted values $\hat{Y}_i$. 

3. A large value of $R^2$ does not necessarily imply that the fitted model is a useful one. For instance, 
observations may have been taken at only a few levels of the predictor variables. Despite a high $R^2$ 
in this case, the fitted model may not be useful if most predictions require extrapolations outside the 
region of observations. Again, even though $R^2$ is large, MSE may still be too large for inferences to 
be useful when high precision is required. 


Coefficient of Multiple Correlation 
The coefficient of multiple correlation R is the positive square root of $R^2$: 

$$R = \sqrt{R^2} \tag{6.43}$$

When there is one X variable in regression model (6.19), i.e., when $p - 1 = 1$, the coefficient 
of multiple correlation R equals in absolute value the correlation coefficient r in (2.73) for 
simple correlation. 


6.6 Inferences about Regression Parameters 


The least squares and maximum likelihood estimators in b are unbiased: 


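$$E\{\mathbf{b}\} = \boldsymbol{\beta} \tag{6.44}$$

The variance-covariance matrix $\sigma^2\{\mathbf{b}\}$: 

$$\underset{p \times p}{\sigma^2\{\mathbf{b}\}} = \begin{bmatrix} \sigma^2\{b_0\} & \sigma\{b_0, b_1\} & \cdots & \sigma\{b_0, b_{p-1}\} \\ \sigma\{b_1, b_0\} & \sigma^2\{b_1\} & \cdots & \sigma\{b_1, b_{p-1}\} \\ \vdots & \vdots & & \vdots \\ \sigma\{b_{p-1}, b_0\} & \sigma\{b_{p-1}, b_1\} & \cdots & \sigma^2\{b_{p-1}\} \end{bmatrix} \tag{6.45}$$

is given by: 

$$\underset{p \times p}{\sigma^2\{\mathbf{b}\}} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \tag{6.46}$$

The estimated variance-covariance matrix $s^2\{\mathbf{b}\}$: 

$$\underset{p \times p}{s^2\{\mathbf{b}\}} = \begin{bmatrix} s^2\{b_0\} & s\{b_0, b_1\} & \cdots & s\{b_0, b_{p-1}\} \\ s\{b_1, b_0\} & s^2\{b_1\} & \cdots & s\{b_1, b_{p-1}\} \\ \vdots & \vdots & & \vdots \\ s\{b_{p-1}, b_0\} & s\{b_{p-1}, b_1\} & \cdots & s^2\{b_{p-1}\} \end{bmatrix} \tag{6.47}$$

is given by: 

$$\underset{p \times p}{s^2\{\mathbf{b}\}} = MSE(\mathbf{X}'\mathbf{X})^{-1} \tag{6.48}$$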




From $s^2\{\mathbf{b}\}$, one can obtain $s^2\{b_0\}$, $s^2\{b_1\}$, or whatever other variance is needed, or any 
needed covariances. 
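
As an illustrative sketch with hypothetical data, the matrix in (6.48) can be computed and its diagonal and off-diagonal elements read off as the estimated variances and covariances:

```python
import numpy as np

# Hypothetical data: n = 6 cases, two predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape

xtx_inv = np.linalg.inv(X.T @ X)
b = xtx_inv @ X.T @ y                 # least squares estimates (6.25)
e = y - X @ b
mse = (e @ e) / (n - p)

s2_b = mse * xtx_inv                  # estimated var-cov matrix (6.48), p x p
s_b = np.sqrt(np.diag(s2_b))          # s{b_0}, s{b_1}, s{b_2}
cov_b1_b2 = s2_b[1, 2]                # s{b_1, b_2}
print(s_b, cov_b1_b2)
```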


Interval Estimation of $\beta_k$ 
For the normal error regression model (6.19), we have: 

$$\frac{b_k - \beta_k}{s\{b_k\}} \sim t(n - p) \qquad k = 0, 1, \ldots, p - 1 \tag{6.49}$$

Hence, the confidence limits for $\beta_k$ with $1 - \alpha$ confidence coefficient are: 

$$b_k \pm t(1 - \alpha/2;\, n - p)\, s\{b_k\} \tag{6.50}$$


Tests for $\beta_k$ 


Tests for $\beta_k$ are set up in the usual fashion. To test: 

$$\begin{aligned} H_0&: \beta_k = 0 \\ H_a&: \beta_k \ne 0 \end{aligned} \tag{6.51a}$$

we may use the test statistic: 

$$t^* = \frac{b_k}{s\{b_k\}} \tag{6.51b}$$

and the decision rule: 

$$\begin{aligned} &\text{If } |t^*| \le t(1 - \alpha/2;\, n - p), \text{ conclude } H_0 \\ &\text{Otherwise conclude } H_a \end{aligned} \tag{6.51c}$$

The power of the t test can be obtained as explained in Chapter 2, with the degrees of 
freedom modified to $n - p$. 

As with simple linear regression, an F test can also be conducted to determine whether 
or not $\beta_k = 0$ in multiple regression models. We discuss this test in Chapter 7. 


Joint Inferences 


The Bonferroni joint confidence intervals can be used to estimate several regression 
coefficients simultaneously. If g parameters are to be estimated jointly (where $g \le p$), the 
confidence limits with family confidence coefficient $1 - \alpha$ are: 

$$b_k \pm B\, s\{b_k\} \tag{6.52}$$

where: 

$$B = t(1 - \alpha/2g;\, n - p) \tag{6.52a}$$


In Chapter 7, we discuss tests concerning subsets of the regression parameters. 
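
The limits in (6.50) and (6.52) and the statistic in (6.51b) differ only in the t multiple used, which the following sketch takes as an input. The function name `coef_limits` and the data are hypothetical; the multiple 3.182 is the tabled t(.975; 3), rounded, and is supplied by hand rather than computed.

```python
import numpy as np

def coef_limits(X, y, t_mult):
    """Return b, s{b_k}, t* = b_k / s{b_k}, and the limits b_k -/+ t_mult * s{b_k}."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    mse = np.sum((y - X @ b) ** 2) / (n - p)
    se = np.sqrt(mse * np.diag(xtx_inv))      # square roots of diagonal of (6.48)
    return b, se, b / se, b - t_mult * se, b + t_mult * se

# Hypothetical data: n = 6, p = 3, so n - p = 3 degrees of freedom.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

# t(.975; 3) serves as the multiple for a single 95 percent interval (6.50).
# It is also the Bonferroni multiple B = t(1 - .10/(2*2); 3) for g = 2
# intervals with a 90 percent family confidence coefficient (6.52a).
T_975_3 = 3.182
b, se, t_star, lower, upper = coef_limits(X, y, T_975_3)
print(np.round(np.column_stack([b, se, t_star, lower, upper]), 4))
```

Under decision rule (6.51c), a coefficient whose |t*| exceeds the multiple leads to conclusion $H_a$, which is equivalent to the corresponding interval excluding zero.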




6.7 Estimation of Mean Response and Prediction 
of New Observation 


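Interval Estimation of $E\{Y_h\}$ 


For given values of $X_1, \ldots, X_{p-1}$, denoted by $X_{h1}, \ldots, X_{h,p-1}$, the mean response is 
denoted by $E\{Y_h\}$. We define the vector $\mathbf{X}_h$: 

$$\underset{p \times 1}{\mathbf{X}_h} = \begin{bmatrix} 1 \\ X_{h1} \\ \vdots \\ X_{h,p-1} \end{bmatrix} \tag{6.53}$$

so that the mean response to be estimated is: 

$$E\{Y_h\} = \mathbf{X}_h'\boldsymbol{\beta} \tag{6.54}$$

The estimated mean response corresponding to $\mathbf{X}_h$, denoted by $\hat{Y}_h$, is: 

$$\hat{Y}_h = \mathbf{X}_h'\mathbf{b} \tag{6.55}$$

This estimator is unbiased: 

$$E\{\hat{Y}_h\} = \mathbf{X}_h'\boldsymbol{\beta} = E\{Y_h\} \tag{6.56}$$

and its variance is: 

$$\sigma^2\{\hat{Y}_h\} = \sigma^2 \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h \tag{6.57}$$

This variance can be expressed as a function of the variance-covariance matrix of the 
estimated regression coefficients: 

$$\sigma^2\{\hat{Y}_h\} = \mathbf{X}_h'\, \sigma^2\{\mathbf{b}\}\, \mathbf{X}_h \tag{6.57a}$$

Note from (6.57a) that the variance $\sigma^2\{\hat{Y}_h\}$ is a function of the variances $\sigma^2\{b_k\}$ of the 
regression coefficients and of the covariances $\sigma\{b_k, b_{k'}\}$ between pairs of regression coefficients, 
just as in simple linear regression. The estimated variance $s^2\{\hat{Y}_h\}$ is given by: 

$$s^2\{\hat{Y}_h\} = MSE(\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h) = \mathbf{X}_h'\, s^2\{\mathbf{b}\}\, \mathbf{X}_h \tag{6.58}$$

The $1 - \alpha$ confidence limits for $E\{Y_h\}$ are: 

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - p)\, s\{\hat{Y}_h\} \tag{6.59}$$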

Confidence Region for Regression Surface 

The $1 - \alpha$ confidence region for the entire regression surface is an extension of the 
Working-Hotelling confidence band (2.40) for the regression line when there is one predictor variable. 
Boundary points of the confidence region at $\mathbf{X}_h$ are obtained from: 

$$\hat{Y}_h \pm W s\{\hat{Y}_h\} \tag{6.60}$$

where: 

$$W^2 = pF(1 - \alpha;\, p,\, n - p) \tag{6.60a}$$

The confidence coefficient $1 - \alpha$ provides assurance that the region contains the entire 
regression surface over all combinations of values of the X variables. 


Simultaneous Confidence Intervals for Several Mean Responses 
To estimate a number of mean responses $E\{Y_h\}$ corresponding to different $\mathbf{X}_h$ vectors with 
family confidence coefficient $1 - \alpha$, we can employ two basic approaches: 


1. Use the Working-Hotelling confidence region bounds (6.60) for the several $\mathbf{X}_h$ vectors 
of interest: 

$$\hat{Y}_h \pm W s\{\hat{Y}_h\} \tag{6.61}$$

where $\hat{Y}_h$, W, and $s\{\hat{Y}_h\}$ are defined in (6.55), (6.60a), and (6.58), respectively. Since the 
Working-Hotelling confidence region covers the mean responses for all possible $\mathbf{X}_h$ 
vectors with confidence coefficient $1 - \alpha$, the selected boundary values will cover the mean 
responses for the $\mathbf{X}_h$ vectors of interest with family confidence coefficient greater than $1 - \alpha$. 

2. Use Bonferroni simultaneous confidence intervals. When g interval estimates are to 
be made, the Bonferroni confidence limits are: 

$$\hat{Y}_h \pm B s\{\hat{Y}_h\} \tag{6.62}$$

where: 

$$B = t(1 - \alpha/2g;\, n - p) \tag{6.62a}$$

For any particular application, we can compare the W and B multiples to see which 
procedure will lead to narrower confidence intervals. If the $\mathbf{X}_h$ levels are not specified in 
advance but are determined as the analysis proceeds, it is better to use the Working-Hotelling 
limits (6.61) since the family for this procedure includes all possible $\mathbf{X}_h$ levels. 


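Prediction of New Observation $Y_{h(\text{new})}$ 
The $1 - \alpha$ prediction limits for a new observation $Y_{h(\text{new})}$ corresponding to $\mathbf{X}_h$, the specified 
values of the X variables, are: 

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - p)\, s\{\text{pred}\} \tag{6.63}$$

where: 

$$s^2\{\text{pred}\} = MSE + s^2\{\hat{Y}_h\} = MSE(1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h) \tag{6.63a}$$

and $s^2\{\hat{Y}_h\}$ is given by (6.58). 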
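
The estimated variances in (6.58) and (6.63a) can be sketched as follows. The function name, the data, and the chosen vector `x_h` are hypothetical; the t multiple needed for the limits (6.59) and (6.63) would be taken from a table and is left out here.

```python
import numpy as np

def mean_response_and_prediction(X, y, x_h):
    """Return (Y_hat_h, s^2{Y_hat_h}, s^2{pred}, MSE) for the vector x_h."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    mse = np.sum((y - X @ b) ** 2) / (n - p)
    s2_b = mse * xtx_inv               # (6.48)
    y_hat_h = x_h @ b                  # (6.55)
    s2_mean = x_h @ s2_b @ x_h         # (6.58)
    s2_pred = mse + s2_mean            # (6.63a)
    return y_hat_h, s2_mean, s2_pred, mse

# Hypothetical data: n = 6 cases, two predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

# x_h starts with 1 for the intercept, followed by X_h1 and X_h2.
x_h = np.array([1.0, 3.5, 3.5])
y_hat_h, s2_mean, s2_pred, mse = mean_response_and_prediction(X, y, x_h)
print(y_hat_h, s2_mean, s2_pred)
```

The prediction variance always exceeds the variance of the estimated mean response by MSE, which is why prediction limits are wider than confidence limits at the same $\mathbf{X}_h$.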


Prediction of Mean of m New Observations at $\mathbf{X}_h$ 
When m new observations are to be selected at the same levels $\mathbf{X}_h$ and their mean $\bar{Y}_{h(\text{new})}$ is 
to be predicted, the $1 - \alpha$ prediction limits are: 

$$\hat{Y}_h \pm t(1 - \alpha/2;\, n - p)\, s\{\text{predmean}\} \tag{6.64}$$

where: 

$$s^2\{\text{predmean}\} = \frac{MSE}{m} + s^2\{\hat{Y}_h\} = MSE\left(\frac{1}{m} + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h\right) \tag{6.64a}$$


Predictions of g New Observations 


Simultaneous Scheffé prediction limits for g new observations at g different levels $\mathbf{X}_h$ with 
family confidence coefficient $1 - \alpha$ are given by: 

$$\hat{Y}_h \pm S s\{\text{pred}\} \tag{6.65}$$

where: 

$$S^2 = gF(1 - \alpha;\, g,\, n - p) \tag{6.65a}$$

and $s^2\{\text{pred}\}$ is given by (6.63a). 
Alternatively, Bonferroni simultaneous prediction limits can be used. For g predictions 
with family confidence coefficient $1 - \alpha$, they are: 

$$\hat{Y}_h \pm B s\{\text{pred}\} \tag{6.66}$$

where: 

$$B = t(1 - \alpha/2g;\, n - p) \tag{6.66a}$$

A comparison of S and B in advance of any particular use will indicate which procedure 
will lead to narrower prediction intervals. 


Caution about Hidden Extrapolations 


When estimating a mean response or predicting a new observation in multiple regression, 
one needs to be particularly careful that the estimate or prediction does not fall outside the 
scope of the model. The danger, of course, is that the model may not be appropriate when it 
is extended outside the region of the observations. In multiple regression, it is particularly 
easy to lose track of this region since the levels of $X_1, \ldots, X_{p-1}$ jointly define the region. 
Thus, one cannot merely look at the ranges of each predictor variable. Consider Figure 6.3, 
where the shaded region is the region of observations for a multiple regression application 
with two predictor variables and the circled dot represents the values $(X_{h1}, X_{h2})$ for which 
a prediction is to be made. The circled dot is within the ranges of the predictor variables 
$X_1$ and $X_2$ individually, yet is well outside the joint region of observations. It is easy to 
spot this extrapolation when there are only two predictor variables, but it becomes much 
more difficult when the number of predictor variables is large. We discuss in Chapter 10 
a procedure for identifying hidden extrapolations when there are more than two predictor 
variables. 


FIGURE 6.3 Region of Observations on $X_1$ and $X_2$ Jointly, Compared with Ranges of $X_1$ and $X_2$ Individually. 


6.8 Diagnostics and Remedial Measures 


Diagnostics play an important role in the development and evaluation of multiple regression 
models. Most of the diagnostic procedures for simple linear regression that we described in 
Chapter 3 carry over directly to multiple regression. We review these diagnostic procedures 
now, as well as the remedial measures for simple linear regression that carry over directly 
to multiple regression. 

Many specialized diagnostics and remedial procedures for multiple regression have also 
been developed. Some important ones will be discussed in Chapters 10 and 11. 


Scatter Plot Matrix 


FIGURE 6.4 SYGRAPH Scatter Plot Matrix and Correlation Matrix: Dwaine Studios Example. 


Box plots, sequence plots, stem-and-leaf plots, and dot plots for each of the predictor vari- 
ables and for the response variable can provide helpful, preliminary univariate information 
about these variables. Scatter plots of the response variable against each predictor variable 
can aid in determining the nature and strength of the bivariate relationships between each of 
the predictor variables and the response variable and in identifying gaps in the data points as 
well as outlying data points. Scatter plots of each predictor variable against each of the other 
predictor variables are helpful for studying the bivariate relationships among the predictor 
variables and for finding gaps and detecting outliers. 

Analysis is facilitated if these scatter plots are assembled in a scatter plot matrix, such 
as in Figure 6.4. In this figure, the Y variable for any one scatter plot is the name found in 
its row, and the X variable is the name found in its column. Thus, the scatter plot matrix in 
Figure 6.4 shows in the first row the plots of Y (SALES) against $X_1$ (TARGTPOP) and 
$X_2$ (DISPOINC), of $X_1$ against Y and $X_2$ in the second row, and of $X_2$ against Y and $X_1$ 
in the third row. These variables are described on page 236. Alternatively, by viewing the 
first column, one can compare the plots of $X_1$ and $X_2$ each against Y, and similarly for the 
other two columns. A scatter plot matrix facilitates the study of the relationships among 
the variables by comparing the scatter plots within a row or a column. Examples in this and 
subsequent chapters will illustrate the usefulness of scatter plot matrices. 

Figure 6.4, panel (a) Scatter Plot Matrix: plots of SALES, TARGTPOP, and DISPOINC against one another. 
Figure 6.4, panel (b) Correlation Matrix: 

            SALES    TARGTPOP   DISPOINC 
SALES       1.000    .945       .836 
TARGTPOP             1.000      .781 
DISPOINC                        1.000 

A complement to the scatter plot matrix that may be useful at times is the correlation 
matrix. This matrix contains the coefficients of simple correlation $r_{Y1}, r_{Y2}, \ldots, r_{Y,p-1}$ between 
Y and each of the predictor variables, as well as all of the coefficients of simple correlation 
among the predictor variables: $r_{12}$ between $X_1$ and $X_2$, $r_{13}$ between $X_1$ and $X_3$, etc. The 
format of the correlation matrix follows that of the scatter plot matrix: 

$$\begin{bmatrix} 1 & r_{Y1} & r_{Y2} & \cdots & r_{Y,p-1} \\ r_{Y1} & 1 & r_{12} & \cdots & r_{1,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{Y,p-1} & r_{1,p-1} & r_{2,p-1} & \cdots & 1 \end{bmatrix} \tag{6.67}$$


Note that the correlation matrix is symmetric and that its main diagonal contains 1s because 
the coefficient of correlation between a variable and itself is 1. Many statistics packages 
provide the correlation matrix as an option. Since this matrix is symmetric, the lower (or 
upper) triangular block of elements is frequently omitted in the output. 
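
A correlation matrix laid out as in (6.67), with Y first and the predictor variables after it, can be obtained with NumPy's `corrcoef`. The data below are hypothetical and are not the Dwaine Studios values.

```python
import numpy as np

# Hypothetical data: response Y and two predictor variables.
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Columns ordered as Y, X1, X2 so that row/column 0 holds r_Y1 and r_Y2.
data = np.column_stack([y, x1, x2])
R = np.corrcoef(data, rowvar=False)     # rowvar=False: variables are columns

# Only the upper triangle is needed because the matrix is symmetric.
upper = np.triu(R)
print(np.round(upper, 3))
```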

Some interactive statistics packages enable the user to employ brushing with scatter plot 
matrices. When a point in a scatter plot is brushed, it is given a distinctive appearance on the 
computer screen in each scatter plot in the matrix. The case corresponding to the brushed 
point may also be identified. Brushing is helpful to see whether a case that is outlying in 
one scatter plot is also outlying in some or all of the other plots. Brushing may also be 
applied to a group of points to see, for instance, whether a group of cases that does not fit 
the relationship for the remaining cases in one scatter plot also follows a distinct pattern in 
any of the other scatter plots. 


Three-Dimensional Scatter Plots 


Some interactive statistics packages provide three-dimensional scatter plots or point clouds, 
and permit spinning of these plots to enable the viewer to see the point cloud from different 
perspectives. This can be very helpful for identifying patterns that are only apparent from 
certain perspectives. Figure 6.6 on page 238 illustrates a three-dimensional scatter plot and 
the use of spinning. 


Residual Plots 
A plot of the residuals against the fitted values is useful for assessing the appropriateness of 
the multiple regression function and the constancy of the variance of the error terms, as well 
as for providing information about outliers, just as for simple linear regression. Similarly, 
a plot of the residuals against time or against some other sequence can provide diagnostic 
information about possible correlations between the error terms in multiple regression. Box 
plots and normal probability plots of the residuals are useful for examining whether the 
error terms are reasonably normally distributed. 

In addition, residuals should be plotted against each of the predictor variables. Each of 
these plots can provide further information about the adequacy of the regression function 
with respect to that predictor variable (e.g., whether a curvature effect is required for that 
variable) and about possible variation in the magnitude of the error variance in relation to 
that predictor variable. 

Residuals should also be plotted against important predictor variables that were omitted 
from the model, to see if the omitted variables have substantial additional effects on the 
response variable that have not yet been recognized in the regression model. Also, residuals 
should be plotted against interaction terms for potential interaction effects not included in 
the regression model, such as against $X_1X_2$, $X_1X_3$, and $X_2X_3$, to see whether some or all 
of these interaction terms are required in the model. 

A plot of the absolute residuals or the squared residuals against the fitted values is useful 
for examining the constancy of the variance of the error terms. If nonconstancy is detected, a 
plot of the absolute residuals or the squared residuals against each of the predictor variables 
may identify one or several of the predictor variables to which the magnitude of the error 
variability is related. 


Correlation Test for Normality 
The correlation test for normality described in Chapter 3 carries forward directly to multiple 
regression. The expected values of the ordered residuals under normality are calculated 
according to (3.6), and the coefficient of correlation between the residuals and the expected 
values under normality is then obtained. Table B.6 is employed to assess whether or not 
the magnitude of the correlation coefficient supports the reasonableness of the normality 
assumption. 


Brown-Forsythe Test for Constancy of Error Variance 

The Brown-Forsythe test statistic (3.9) for assessing the constancy of the error variance can 
be used readily in multiple regression when the error variance increases or decreases with 
one of the predictor variables. To conduct the Brown-Forsythe test, we divide the data set 
into two groups, as for simple linear regression, where one group consists of cases where 
the level of the predictor variable is relatively low and the other group consists of cases 
where the level of the predictor variable is relatively high. The Brown-Forsythe test then 
proceeds as for simple linear regression. 
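
A minimal sketch of the two-group computation follows, assuming the standard two-sample form of the Brown-Forsythe statistic: absolute deviations of the residuals from their group medians are compared with a pooled-variance t statistic. Splitting the cases at the median of the predictor variable is an assumption made here for illustration, and the residual and predictor values are hypothetical.

```python
import numpy as np

def brown_forsythe_t(e, x):
    """Brown-Forsythe t statistic for residuals e, grouped by predictor x.

    Group 1 holds cases with x at or below the median of x (an assumed
    split rule); group 2 holds the remaining cases.
    """
    e = np.asarray(e, dtype=float)
    x = np.asarray(x, dtype=float)
    low = x <= np.median(x)
    e1, e2 = e[low], e[~low]
    d1 = np.abs(e1 - np.median(e1))      # absolute deviations, group 1
    d2 = np.abs(e2 - np.median(e2))      # absolute deviations, group 2
    n1, n2 = len(d1), len(d2)
    # Pooled variance of the absolute deviations, with n - 2 degrees of freedom.
    s2 = (np.sum((d1 - d1.mean()) ** 2) + np.sum((d2 - d2.mean()) ** 2)) / (n1 + n2 - 2)
    return (d1.mean() - d2.mean()) / np.sqrt(s2 * (1.0 / n1 + 1.0 / n2))

# Hypothetical residuals whose spread grows with the predictor variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
e = np.array([1.0, -1.0, 2.0, -2.0, 4.0, -4.0, 8.0, -8.0])
t_bf = brown_forsythe_t(e, x)
print(t_bf)
```

The statistic is then compared with a percentile of the t distribution with n − 2 degrees of freedom, as for simple linear regression.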


Breusch-Pagan Test for Constancy of Error Variance 
The Breusch-Pagan test (3.11) for constancy of the error variance in multiple regression is 
carried out exactly the same as for simple linear regression when the error variance increases 
or decreases with one of the predictor variables. The squared residuals are simply regressed 
against the predictor variable to obtain the regression sum of squares SSR*, and the test 
proceeds as before, using the error sum of squares SSE for the full multiple regression 
model. 




When the error variance is a function of more than one predictor variable, a multiple 
regression of the squared residuals against these predictor variables is conducted and the 
regression sum of squares SSR* is obtained. The test statistic again uses SSE for the full 
multiple regression model, but now the chi-square distribution involves q degrees of 
freedom, where q is the number of predictor variables against which the squared residuals are 
regressed. 
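
The computation described above can be sketched as follows, assuming the usual form of the Breusch-Pagan statistic, $X^2_{BP} = (SSR^*/2) \div (SSE/n)^2$. The function name and data are hypothetical, and the chi-square critical value with q degrees of freedom would be taken from a table.

```python
import numpy as np

def breusch_pagan(X, y, Z):
    """Return (test statistic, q) for the Breusch-Pagan test.

    X: design matrix of the full model, including the column of 1s.
    Z: n x q array of predictor variables thought to drive the error variance.
    """
    n = len(y)
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    q = Z.shape[1]

    # Residuals and SSE from the full multiple regression model.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    sse = e @ e

    # Regress the squared residuals on Z (with intercept) to get SSR*.
    e2 = e ** 2
    Zd = np.column_stack([np.ones(n), Z])
    g, *_ = np.linalg.lstsq(Zd, e2, rcond=None)
    ssr_star = np.sum((Zd @ g - e2.mean()) ** 2)

    return (ssr_star / 2.0) / (sse / n) ** 2, q

# Hypothetical data: n = 6 cases, two predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

stat_one, q_one = breusch_pagan(X, y, x1)                          # q = 1
stat_two, q_two = breusch_pagan(X, y, np.column_stack([x1, x2]))   # q = 2
print(stat_one, q_one, stat_two, q_two)
```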


F Test for Lack of Fit 


The lack of fit F test described in Chapter 3 for simple linear regression can be carried over 
to test whether the multiple regression response function: 

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1}$$

is an appropriate response surface. Repeat observations in multiple regression are replicate 
observations on Y corresponding to levels of each of the X variables that are constant from 
trial to trial. Thus, with two predictor variables, repeat observations require that $X_1$ and $X_2$ 
each remain at given levels from trial to trial. 

Once the ANOVA table, shown in Table 6.1, has been obtained, SSE is decomposed into 
pure error and lack of fit components. The pure error sum of squares SSPE is obtained by first 
calculating for each replicate group the sum of squared deviations of the Y observations 
around the group mean, where a replicate group has the same values for each of the X 
variables. Let c denote the number of groups with distinct sets of levels for the X variables, 
and let the mean of the Y observations for the jth group be denoted by $\bar{Y}_j$. Then the sum 
of squares for the jth group is given by (3.17), and the pure error sum of squares is the sum 
of these sums of squares, as given by (3.16). The lack of fit sum of squares SSLF equals the 
difference SSE — SSPE, as indicated by (3.24). 

The number of degrees of freedom associated with SSPE is $n - c$, and the number of 
degrees of freedom associated with SSLF is $(n - p) - (n - c) = c - p$. Thus, for testing 
the alternatives: 

$$\begin{aligned} H_0&: E\{Y\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} \\ H_a&: E\{Y\} \ne \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} \end{aligned} \tag{6.68a}$$

the appropriate test statistic is: 

$$F^* = \frac{SSLF}{c - p} \div \frac{SSPE}{n - c} = \frac{MSLF}{MSPE} \tag{6.68b}$$

where SSLF and SSPE are given by (3.24) and (3.16), respectively, and the appropriate 
decision rule is: 

$$\begin{aligned} &\text{If } F^* \le F(1 - \alpha;\, c - p,\, n - c), \text{ conclude } H_0 \\ &\text{If } F^* > F(1 - \alpha;\, c - p,\, n - c), \text{ conclude } H_a \end{aligned} \tag{6.68c}$$

Comment 


When replicate observations are not available, an approximate lack of fit test can be conducted 
if there are cases that have similar $\mathbf{X}_h$ vectors. These cases are grouped together and treated as 
pseudoreplicates, and the test for lack of fit is then carried out using these groupings of similar 
cases. 




Remedial Measures 


The remedial measures described in Chapter 3 are also applicable to multiple regression. 
When a more complex model is required to recognize curvature or interaction effects, the 
multiple regression model can be expanded to include these effects. For example, $X_2^2$ might 
be added as a variable to take into account a curvature effect of $X_2$, or $X_1X_3$ might be 
added as a variable to recognize an interaction effect between $X_1$ and $X_3$ on the response 
variable. Alternatively, transformations on the response and/or the predictor variables can 
be made, following the principles discussed in Chapter 3, to remedy model deficiencies. 
Transformations on the response variable Y may be helpful when the distributions of the error 
terms are quite skewed and the variance of the error terms is not constant. Transformations 
of some of the predictor variables may be helpful when the effects of these variables are 
curvilinear. In addition, transformations on Y and/or the predictor variables may be helpful 
in eliminating or substantially reducing interaction effects. 

As with simple linear regression, the usefulness of potential transformations needs to be 
examined by means of residual plots and other diagnostic tools to determine whether the 
multiple regression model for the transformed data is appropriate. 


Box-Cox Transformations. The Box-Cox procedure for determining an appropriate 
power transformation on Y for simple linear regression models described in Chapter 3 
is also applicable to multiple regression models. The standardized variable W in (3.36) is
again obtained for different values of the parameter $\lambda$ and is now regressed against the set
of X variables in the multiple regression model to find that value of $\lambda$ that minimizes the
error sum of squares SSE.
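The grid search just described can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the procedure of any particular package: it borrows the Dwaine Studios data of Figure 6.5b (Section 6.9), it assumes the standardized variable of (3.36) has the form $W = K_1(Y^\lambda - 1)$ for $\lambda \neq 0$ and $W = K_2 \log_e Y$ for $\lambda = 0$, with $K_2$ the geometric mean of the $Y_i$ and $K_1 = 1/(\lambda K_2^{\lambda-1})$, and the grid of $\lambda$ values and all function names are hypothetical.

```python
import math

# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]
Y = [y for _, _, y in DATA]


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def sse(X, y):
    """Error sum of squares from the least squares regression of y on X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))


def box_cox_w(y, lam):
    """Standardized variable W (assumed form of (3.36)); K2 is the geometric mean."""
    k2 = math.exp(sum(math.log(v) for v in y) / len(y))
    if lam == 0:
        return [k2 * math.log(v) for v in y]
    k1 = 1.0 / (lam * k2 ** (lam - 1.0))
    return [k1 * (v ** lam - 1.0) for v in y]


GRID = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]  # illustrative grid, not from the text
sse_by_lambda = {lam: sse(X, box_cox_w(Y, lam)) for lam in GRID}
best_lambda = min(sse_by_lambda, key=sse_by_lambda.get)
for lam in GRID:
    print(f"lambda = {lam:5.2f}   SSE = {sse_by_lambda[lam]:10.2f}")
print("lambda minimizing SSE on this grid:", best_lambda)
```

Because W is standardized, the SSE values are comparable across $\lambda$; at $\lambda = 1$ the variable W is just $Y - 1$, so the SSE there equals the SSE of the untransformed regression.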

Box and Tidwell (Ref. 6.1) have also developed an iterative approach for ascertaining 
appropriate power transformations for each predictor variable in a multiple regression model 
when transformations on the predictor variables may be required. 


6.9 An Example—Multiple Regression with Two Predictor Variables


Setting 


In this section, we shall develop a multiple regression application with two predictor vari- 
ables. We shall illustrate several diagnostic procedures and several types of inferences that 
might be made for this application. We shall set up the necessary calculations in matrix 
format but, for ease of viewing, show fewer significant digits for the elements of the matrices 
than are used in the actual calculations. 


Dwaine Studios, Inc., operates portrait studios in 21 cities of medium size. These studios
specialize in portraits of children. The company is considering an expansion into other
cities of medium size and wishes to investigate whether sales (Y) in a community can be
predicted from the number of persons aged 16 or younger in the community ($X_1$) and the
per capita disposable personal income in the community ($X_2$). Data on these variables for
the most recent year for the 21 cities in which Dwaine Studios is now operating are shown
in Figure 6.5b. Sales are expressed in thousands of dollars and are labeled Y or SALES;
the number of persons aged 16 or younger is expressed in thousands of persons and is
labeled $X_1$ or TARGTPOP for target population; and per capita disposable personal income
is expressed in thousands of dollars and labeled $X_2$ or DISPOINC for disposable income.


FIGURE 6.5 SYSTAT Multiple Regression Output and Basic Data—Dwaine Studios Example.

(a) Multiple Regression Output

DEP VAR: SALES    N: 21    MULTIPLE R: 0.957    SQUARED MULTIPLE R: 0.917
ADJUSTED SQUARED MULTIPLE R: .907    STANDARD ERROR OF ESTIMATE: 11.0074

VARIABLE   COEFFICIENT   STD ERROR   STD COEF   TOLERANCE         T   P(2 TAIL)
CONSTANT      -68.8571     60.0170     0.0000           .   -1.1473      0.2663
TARGTPOP        1.4546      0.2118     0.7484      0.3896    6.8682      0.0000
DISPOINC        9.3655      4.0640     0.2511      0.3896    2.3045      0.0333

ANALYSIS OF VARIANCE

SOURCE       SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO        P
REGRESSION       24015.2821    2    12007.6411   99.1035   0.0000
RESIDUAL          2180.9274   18      121.1626

INVERSE (X'X)
             1          2         3
   1   29.7289
   2    0.0722    0.00037
   3   -1.9926    -0.0056    0.1363

(b) Basic Data

CASE     X1     X2       Y    FITTED   RESIDUAL
   1   68.5   16.7   174.4   187.184   -12.7841
   2   45.2   16.8   164.4   154.229    10.1706
   3   91.3   18.2   244.2   234.396     9.8037
   4   47.8   16.3   154.6   153.329     1.2715
   5   46.9   17.3   181.6   161.385    20.2151
   6   66.1   18.2   207.5   197.741     9.7586
   7   49.5   15.9   152.8   152.055     0.7449
   8   52.0   17.2   163.2   167.867    -4.6666
   9   48.9   16.6   145.4   157.738   -12.3382
  10   38.4   16.0   137.2   136.846     0.3540
  11   87.9   18.3   241.9   230.387    11.5126
  12   72.8   17.1   191.1   197.185    -6.0849
  13   88.4   17.4   232.0   222.686     9.3143
  14   42.9   15.8   145.3   141.518     3.7816
  15   52.5   17.8   161.1   174.213   -13.1132
  16   85.7   18.4   209.7   228.124   -18.4239
  17   41.3   16.5   146.4   145.747     0.6530
  18   51.7   16.3   144.0   159.001   -15.0013
  19   89.6   18.1   232.6   230.987     1.6130
  20   82.7   19.1   224.1   230.316    -6.2160
  21   52.3   16.0   166.5   157.064     9.4356

The first-order regression model:

$$
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \tag{6.69}
$$

with normal error terms is expected to be appropriate, on the basis of the SYGRAPH
scatter plot matrix in Figure 6.4a. Note the linear relation between target population and
sales and between disposable income and sales. Also note that there is more scatter in the
latter relationship. Finally note that there is also some linear relationship between the two
predictor variables. The correlation matrix in Figure 6.4b bears out these visual impressions
from the scatter plot matrix.

A SYGRAPH plot of the point cloud is shown in Figure 6.6a. By spinning the axes, we
obtain the perspective in Figure 6.6b, which supports the tentative conclusion that a response
plane may be a reasonable regression function to utilize here.

Basic Calculations


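The X and Y matrices for the Dwaine Studios example are as follows:

$$
\mathbf{X} = \begin{bmatrix} 1 & 68.5 & 16.7 \\ 1 & 45.2 & 16.8 \\ \vdots & \vdots & \vdots \\ 1 & 52.3 & 16.0 \end{bmatrix}
\qquad
\mathbf{Y} = \begin{bmatrix} 174.4 \\ 164.4 \\ \vdots \\ 166.5 \end{bmatrix}
\tag{6.70}
$$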




FIGURE 6.6 SYGRAPH Plot of Point Cloud before and after Spinning—Dwaine Studios Example.
(a) Before Spinning (b) After Spinning. Vertical axis: Sales.


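We require:

1.
$$
\mathbf{X}'\mathbf{X} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 68.5 & 45.2 & \cdots & 52.3 \\ 16.7 & 16.8 & \cdots & 16.0 \end{bmatrix}
\begin{bmatrix} 1 & 68.5 & 16.7 \\ 1 & 45.2 & 16.8 \\ \vdots & \vdots & \vdots \\ 1 & 52.3 & 16.0 \end{bmatrix}
$$
which yields:
$$
\mathbf{X}'\mathbf{X} =
\begin{bmatrix} 21.0 & 1{,}302.4 & 360.0 \\ 1{,}302.4 & 87{,}707.9 & 22{,}609.2 \\ 360.0 & 22{,}609.2 & 6{,}190.3 \end{bmatrix}
\tag{6.71}
$$

2.
$$
\mathbf{X}'\mathbf{Y} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 68.5 & 45.2 & \cdots & 52.3 \\ 16.7 & 16.8 & \cdots & 16.0 \end{bmatrix}
\begin{bmatrix} 174.4 \\ 164.4 \\ \vdots \\ 166.5 \end{bmatrix}
$$
which yields:
$$
\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 3{,}820 \\ 249{,}643 \\ 66{,}073 \end{bmatrix}
\tag{6.72}
$$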




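3.
$$
(\mathbf{X}'\mathbf{X})^{-1} =
\begin{bmatrix} 21.0 & 1{,}302.4 & 360.0 \\ 1{,}302.4 & 87{,}707.9 & 22{,}609.2 \\ 360.0 & 22{,}609.2 & 6{,}190.3 \end{bmatrix}^{-1}
$$
Using (5.23), we obtain:
$$
(\mathbf{X}'\mathbf{X})^{-1} =
\begin{bmatrix} 29.7289 & .0722 & -1.9926 \\ .0722 & .00037 & -.0056 \\ -1.9926 & -.0056 & .1363 \end{bmatrix}
\tag{6.73}
$$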
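The three basic calculations can be checked numerically. The sketch below uses only the Python standard library and the data of Figure 6.5b; the helper names `solve` and `inverse` are hypothetical, and the Gauss-Jordan routine is a generic substitute for the inversion formula (5.23), not a transcription of it.

```python
# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]   # design matrix with a column of 1s
Y = [y for _, _, y in DATA]


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def inverse(A):
    """Invert A column by column: column j of the inverse solves A x = e_j."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]   # (6.71)
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(3)]               # (6.72)
XtX_inv = inverse(XtX)                                                       # (6.73)

for name, M in (("X'X", XtX), ("(X'X)^-1", XtX_inv)):
    print(name)
    for row in M:
        print("  ", "  ".join(f"{v:12.5f}" for v in row))
print("X'Y", [round(v, 1) for v in XtY])
```

The printed matrices should agree with (6.71) through (6.73) up to the rounding used in the text.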


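Algebraic Equivalents. Note that X'X for the first-order regression model (6.69) with
two predictor variables is:

$$
\mathbf{X}'\mathbf{X} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_{11} & X_{21} & \cdots & X_{n1} \\ X_{12} & X_{22} & \cdots & X_{n2} \end{bmatrix}
\begin{bmatrix} 1 & X_{11} & X_{12} \\ 1 & X_{21} & X_{22} \\ \vdots & \vdots & \vdots \\ 1 & X_{n1} & X_{n2} \end{bmatrix}
$$

or:

$$
\mathbf{X}'\mathbf{X} =
\begin{bmatrix} n & \sum X_{i1} & \sum X_{i2} \\ \sum X_{i1} & \sum X_{i1}^2 & \sum X_{i1}X_{i2} \\ \sum X_{i2} & \sum X_{i2}X_{i1} & \sum X_{i2}^2 \end{bmatrix}
\tag{6.74}
$$

For the Dwaine Studios example, we have:

$$
\begin{aligned}
n &= 21 \\
\sum X_{i1} &= 68.5 + 45.2 + \cdots = 1{,}302.4 \\
\sum X_{i1}X_{i2} &= 68.5(16.7) + 45.2(16.8) + \cdots = 22{,}609.2 \\
&\text{etc.}
\end{aligned}
$$

These elements are found in (6.71).
Also note that X'Y for the first-order regression model (6.69) with two predictor
variables is:

$$
\mathbf{X}'\mathbf{Y} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_{11} & X_{21} & \cdots & X_{n1} \\ X_{12} & X_{22} & \cdots & X_{n2} \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} \sum Y_i \\ \sum X_{i1}Y_i \\ \sum X_{i2}Y_i \end{bmatrix}
\tag{6.75}
$$

For the Dwaine Studios example, we have:

$$
\begin{aligned}
\sum Y_i &= 174.4 + 164.4 + \cdots = 3{,}820 \\
\sum X_{i1}Y_i &= 68.5(174.4) + 45.2(164.4) + \cdots = 249{,}643 \\
\sum X_{i2}Y_i &= 16.7(174.4) + 16.8(164.4) + \cdots = 66{,}073
\end{aligned}
$$

These are the elements found in (6.72).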
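As a quick check of the algebraic equivalents, the sums in (6.74) and (6.75) can be accumulated directly from the data of Figure 6.5b without forming any matrices. This is a minimal sketch; the variable names are hypothetical.

```python
# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]

n = len(DATA)
sum_x1 = sum(x1 for x1, _, _ in DATA)          # element (1,2) of X'X
sum_x2 = sum(x2 for _, x2, _ in DATA)          # element (1,3) of X'X
sum_x1x1 = sum(x1 * x1 for x1, _, _ in DATA)   # element (2,2) of X'X
sum_x1x2 = sum(x1 * x2 for x1, x2, _ in DATA)  # element (2,3) of X'X
sum_x2x2 = sum(x2 * x2 for _, x2, _ in DATA)   # element (3,3) of X'X
sum_y = sum(y for _, _, y in DATA)             # element 1 of X'Y
sum_x1y = sum(x1 * y for x1, _, y in DATA)     # element 2 of X'Y
sum_x2y = sum(x2 * y for _, x2, y in DATA)     # element 3 of X'Y

print(n, round(sum_x1, 1), round(sum_x2, 1))
print(round(sum_x1x1, 1), round(sum_x1x2, 1), round(sum_x2x2, 1))
print(round(sum_y, 1), round(sum_x1y), round(sum_x2y))
```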




Estimated Regression Function 


FIGURE 6.7 S-Plus Plot of Estimated Regression Surface—Dwaine Studios Example.


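The least squares estimates b are readily obtained by (6.25), using our basic calculations
in (6.72) and (6.73):

$$
\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} =
\begin{bmatrix} 29.7289 & .0722 & -1.9926 \\ .0722 & .00037 & -.0056 \\ -1.9926 & -.0056 & .1363 \end{bmatrix}
\begin{bmatrix} 3{,}820 \\ 249{,}643 \\ 66{,}073 \end{bmatrix}
$$

which yields:

$$
\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
= \begin{bmatrix} -68.857 \\ 1.455 \\ 9.366 \end{bmatrix}
\tag{6.76}
$$

and the estimated regression function is:

$$
\hat{Y} = -68.857 + 1.455X_1 + 9.366X_2
$$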


A three-dimensional plot of the estimated regression function, with the responses super- 
imposed, is shown in Figure 6.7. The residuals are represented by the small vertical lines 
connecting the responses to the estimated regression surface. 

This estimated regression function indicates that mean sales are expected to increase by 
1.455 thousand dollars when the target population increases by 1 thousand persons aged 
16 years or younger, holding per capita disposable personal income constant, and that mean 
sales are expected to increase by 9.366 thousand dollars when per capita income increases 
by 1 thousand dollars, holding the target population constant. 

Figure 6.5a contains SYSTAT multiple regression output for the Dwaine Studios exam- 
ple. The estimated regression coefficients are shown in the column labeled COEFFICIENT; 
the output shows one more decimal place than we have given in the text. 

The SYSTAT output also contains the inverse of the X'X matrix that we calculated 
earlier; only the lower portion of the symmetric matrix is shown. The results are the same 
as in (6.73). 
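The estimates in the COEFFICIENT column can be reproduced by solving the normal equations $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{Y}$ directly. The sketch below is one hedged way to do so with the standard library alone; solving the linear system rather than multiplying by the rounded inverse in (6.73) is a choice made here to avoid rounding error, and the helper `solve` is hypothetical.

```python
# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]
Y = [y for _, _, y in DATA]


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(3)]
b = solve(XtX, XtY)   # least squares estimates b0, b1, b2

print("b0 = %.4f, b1 = %.4f, b2 = %.4f" % tuple(b))
```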




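Algebraic Version of Normal Equations. The normal equations in algebraic form for
the case of two predictor variables can be obtained readily from (6.74) and (6.75). We have:

$$
(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{Y}
$$

$$
\begin{bmatrix} n & \sum X_{i1} & \sum X_{i2} \\ \sum X_{i1} & \sum X_{i1}^2 & \sum X_{i1}X_{i2} \\ \sum X_{i2} & \sum X_{i2}X_{i1} & \sum X_{i2}^2 \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
=
\begin{bmatrix} \sum Y_i \\ \sum X_{i1}Y_i \\ \sum X_{i2}Y_i \end{bmatrix}
$$

from which we obtain the normal equations:

$$
\begin{aligned}
\sum Y_i &= nb_0 + b_1 \sum X_{i1} + b_2 \sum X_{i2} \\
\sum X_{i1}Y_i &= b_0 \sum X_{i1} + b_1 \sum X_{i1}^2 + b_2 \sum X_{i1}X_{i2} \\
\sum X_{i2}Y_i &= b_0 \sum X_{i2} + b_1 \sum X_{i1}X_{i2} + b_2 \sum X_{i2}^2
\end{aligned}
\tag{6.77}
$$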


Fitted Values and Residuals 


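To examine the appropriateness of regression model (6.69) for the data at hand, we require
the fitted values $\hat{Y}_i$ and the residuals $e_i = Y_i - \hat{Y}_i$. We obtain by (6.28):

$$
\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}
$$

$$
\begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_{21} \end{bmatrix}
=
\begin{bmatrix} 1 & 68.5 & 16.7 \\ 1 & 45.2 & 16.8 \\ \vdots & \vdots & \vdots \\ 1 & 52.3 & 16.0 \end{bmatrix}
\begin{bmatrix} -68.857 \\ 1.455 \\ 9.366 \end{bmatrix}
=
\begin{bmatrix} 187.2 \\ 154.2 \\ \vdots \\ 157.1 \end{bmatrix}
$$

Further, by (6.29) we find:

$$
\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}}
$$

$$
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{21} \end{bmatrix}
=
\begin{bmatrix} 174.4 \\ 164.4 \\ \vdots \\ 166.5 \end{bmatrix}
-
\begin{bmatrix} 187.2 \\ 154.2 \\ \vdots \\ 157.1 \end{bmatrix}
=
\begin{bmatrix} -12.8 \\ 10.2 \\ \vdots \\ 9.4 \end{bmatrix}
$$

Figure 6.5b shows the computer output for the fitted values and residuals to more decimal
places than we have presented.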
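The FITTED and RESIDUAL columns of Figure 6.5b can be regenerated from $\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}$ and $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}}$. The following standard-library sketch refits the model and lists the first few cases; the names `fitted` and `resid` are hypothetical.

```python
# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]
Y = [y for _, _, y in DATA]


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(3)]
b = solve(XtX, XtY)

fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]   # Yhat = Xb, (6.28)
resid = [y - f for y, f in zip(Y, fitted)]                        # e = Y - Yhat, (6.29)

for case, (f, e) in enumerate(zip(fitted[:3], resid[:3]), start=1):
    print(f"case {case}: fitted = {f:8.3f}  residual = {e:9.4f}")
print("sum of residuals:", round(sum(resid), 8))
```

Since the model contains an intercept, the residuals sum to zero apart from floating-point error.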


Analysis of Appropriateness of Model

We begin our analysis of the appropriateness of regression model (6.69) for the Dwaine
Studios example by considering the plot of the residuals e against the fitted values $\hat{Y}$ in
Figure 6.8a. This plot does not suggest any systematic deviations from the response plane,
nor that the variance of the error terms varies with the level of $\hat{Y}$. Plots of the residuals e
against $X_1$ and $X_2$ in Figures 6.8b and 6.8c, respectively, are entirely consistent with the
conclusions of good fit by the response function and constant variance of the error terms.

FIGURE 6.8 SYGRAPH Diagnostic Plots—Dwaine Studios Example.
(a) Residual Plot against $\hat{Y}$ (b) Residual Plot against $X_1$
(c) Residual Plot against $X_2$ (d) Residual Plot against $X_1X_2$

In multiple regression applications, there is frequently the possibility of interaction
effects being present. To examine this for the Dwaine Studios example, we plotted the
residuals e against the interaction term $X_1X_2$ in Figure 6.8d. A systematic pattern in this plot
would suggest that an interaction effect may be present, so that a response function of the
type:

$$
E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_1X_2
$$

might be more appropriate. Figure 6.8d does not exhibit any systematic pattern; hence, no
interaction effects reflected by the model term $\beta_3X_1X_2$ appear to be present.


FIGURE 6.9 Additional Diagnostic Plots—Dwaine Studios Example.
(a) Plot of Absolute Residuals against $\hat{Y}$ (b) Normal Probability Plot


Figure 6.9 contains two additional diagnostic plots. Figure 6.9a presents a plot of the 
absolute residuals against the fitted values. There is no indication of nonconstancy of the 
error variance. Figure 6.9b contains a normal probability plot of the residuals. The pattern 
is moderately linear. The coefficient of correlation between the ordered residuals and their 
expected values under normality is .980. This high value (the interpolated critical value in
Table B.6 for n = 21 and $\alpha = .05$ is .9525) helps to confirm the reasonableness of the
conclusion that the error terms are fairly normally distributed.
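The correlation between the ordered residuals and their expected values under normality can be computed as sketched below. The sketch assumes the Chapter 3 approximation for the expected value of the k-th smallest residual, $\sqrt{MSE}\, z[(k - .375)/(n + .25)]$; the helper names are hypothetical, and `statistics.NormalDist` (Python 3.8 or later) supplies the standard normal percentile.

```python
import math
from statistics import NormalDist

# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]
Y = [y for _, _, y in DATA]


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def correlation(u, v):
    """Pearson product-moment correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((c - mv) ** 2 for c in v)
    return suv / math.sqrt(suu * svv)


XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(3)]
b = solve(XtX, XtY)
resid = [y - sum(bj * xj for bj, xj in zip(b, row)) for row, y in zip(X, Y)]

n = len(Y)
mse = sum(e * e for e in resid) / (n - 3)
ordered = sorted(resid)
expected = [math.sqrt(mse) * NormalDist().inv_cdf((k - 0.375) / (n + 0.25))
            for k in range(1, n + 1)]
r = correlation(ordered, expected)

CRITICAL = 0.9525   # interpolated Table B.6 value quoted in the text
print(f"correlation = {r:.3f}; exceeds critical value: {r > CRITICAL}")
```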

Since the Dwaine Studios data are cross-sectional and do not involve a time sequence, 
a time sequence plot is not relevant here. Thus, all of the diagnostics support the use of 
regression model (6.69) for the Dwaine Studios example. 


Analysis of Variance 


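To test whether sales are related to target population and per capita disposable income, we
require the ANOVA table. The basic quantities needed are:

$$
\mathbf{Y}'\mathbf{Y} =
\begin{bmatrix} 174.4 & 164.4 & \cdots & 166.5 \end{bmatrix}
\begin{bmatrix} 174.4 \\ 164.4 \\ \vdots \\ 166.5 \end{bmatrix}
= 721{,}072.40
$$

$$
\left(\frac{1}{n}\right)\mathbf{Y}'\mathbf{J}\mathbf{Y} =
\frac{1}{21}
\begin{bmatrix} 174.4 & 164.4 & \cdots & 166.5 \end{bmatrix}
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} 174.4 \\ 164.4 \\ \vdots \\ 166.5 \end{bmatrix}
= \frac{(3{,}820.0)^2}{21} = 694{,}876.19
$$

Thus:

$$
SSTO = \mathbf{Y}'\mathbf{Y} - \left(\frac{1}{n}\right)\mathbf{Y}'\mathbf{J}\mathbf{Y} = 721{,}072.40 - 694{,}876.19 = 26{,}196.21
$$

and, from our results in (6.72) and (6.76):

$$
\begin{aligned}
SSE &= \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} \\
&= 721{,}072.40 - \begin{bmatrix} -68.857 & 1.455 & 9.366 \end{bmatrix}
\begin{bmatrix} 3{,}820 \\ 249{,}643 \\ 66{,}073 \end{bmatrix} \\
&= 721{,}072.40 - 718{,}891.47 = 2{,}180.93
\end{aligned}
$$

Finally, we obtain by subtraction:

$$
SSR = SSTO - SSE = 26{,}196.21 - 2{,}180.93 = 24{,}015.28
$$

These sums of squares are shown in the SYSTAT ANOVA table in Figure 6.5a. Also
shown in the ANOVA table are degrees of freedom and mean squares. Note that three
regression parameters had to be estimated; hence, 21 − 3 = 18 degrees of freedom are
associated with SSE. Also, the number of degrees of freedom associated with SSR is
2, the number of X variables in the model.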
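The sums of squares can be verified with the same matrix identities, $SSTO = \mathbf{Y}'\mathbf{Y} - (1/n)\mathbf{Y}'\mathbf{J}\mathbf{Y}$ and $SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$, noting that $\mathbf{Y}'\mathbf{J}\mathbf{Y}$ is simply $(\sum Y_i)^2$. A minimal standard-library sketch follows; the variable names are hypothetical, and unrounded estimates are used for b, so small differences from hand calculation with rounded values are to be expected.

```python
# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]
Y = [y for _, _, y in DATA]


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


n, p = len(Y), 3
XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(p)]
b = solve(XtX, XtY)

YtY = sum(y * y for y in Y)                              # Y'Y
correction = sum(Y) ** 2 / n                             # (1/n) Y'JY
ssto = YtY - correction
sse = YtY - sum(bj * tj for bj, tj in zip(b, XtY))       # Y'Y - b'X'Y
ssr = ssto - sse
msr, mse = ssr / (p - 1), sse / (n - p)

print(f"SSTO = {ssto:.2f}  SSE = {sse:.2f}  SSR = {ssr:.2f}")
print(f"MSR  = {msr:.4f}  MSE = {mse:.4f}")
```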


Test of Regression Relation. To test whether sales are related to target population and
per capita disposable income:

$$
\begin{aligned}
H_0&: \beta_1 = 0 \text{ and } \beta_2 = 0 \\
H_a&: \text{not both } \beta_1 \text{ and } \beta_2 \text{ equal zero}
\end{aligned}
$$

we use test statistic (6.39b):

$$
F^* = \frac{MSR}{MSE} = \frac{12{,}007.64}{121.1626} = 99.1
$$

This test statistic is labeled F-RATIO in the SYSTAT output. For $\alpha = .05$, we require
F(.95; 2, 18) = 3.55. Since $F^* = 99.1 > 3.55$, we conclude $H_a$, that sales are related to
target population and per capita disposable income. The P-value for this test is 0.0000 to
four decimal places, as shown in the SYSTAT output labeled P.

Whether the regression relation is useful for making predictions of sales or estimates of
mean sales still remains to be seen.
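The decision rule for the F test reduces to a single comparison once the ANOVA quantities are available. In the sketch below the sums of squares are taken from the SYSTAT output in Figure 6.5a, and the critical value F(.95; 2, 18) = 3.55 is the table value quoted in the text rather than something computed here; the variable names are hypothetical.

```python
# ANOVA quantities from the SYSTAT output in Figure 6.5a
SSR, DF_REGRESSION = 24015.2821, 2
SSE, DF_ERROR = 2180.9274, 18

# Table value F(.95; 2, 18) quoted in the text (not computed here)
F_CRITICAL = 3.55

msr = SSR / DF_REGRESSION          # mean square regression
mse = SSE / DF_ERROR               # mean square error
f_star = msr / mse                 # test statistic (6.39b)

# Decision rule: conclude H_a (a regression relation exists) if F* > F(.95; 2, 18)
conclude_ha = f_star > F_CRITICAL

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F* = {f_star:.4f}")
print("conclude H_a" if conclude_ha else "conclude H_0")
```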


Coefficient of Multiple Determination. For our example, we have by (6.40):

$$
R^2 = \frac{SSR}{SSTO} = \frac{24{,}015.28}{26{,}196.21} = .917
$$

Thus, when the two predictor variables, target population and per capita disposable income,
are considered, the variation in sales is reduced by 91.7 percent. The coefficient of multiple
determination is shown in the SYSTAT output labeled SQUARED MULTIPLE R. Also
shown in the output is the coefficient of multiple correlation R = .957 and the adjusted
coefficient of multiple determination (6.42), $R_a^2 = .907$, which is labeled in the output
ADJUSTED SQUARED MULTIPLE R. Note that adjusting for the number of predictor
variables in the model had only a small effect here on $R^2$.
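The three summary measures follow from the sums of squares alone. The sketch below starts from the values in Figure 6.5a and assumes the adjusted coefficient (6.42) has the usual form $R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO}$; the variable names are hypothetical.

```python
import math

# Sums of squares from the SYSTAT output in Figure 6.5a
SSR = 24015.2821
SSE = 2180.9274
SSTO = SSR + SSE
n, p = 21, 3            # 21 cities; three regression parameters

r_squared = SSR / SSTO                                   # (6.40)
r_multiple = math.sqrt(r_squared)                        # positive square root of R^2
r_squared_adj = 1 - ((n - 1) / (n - p)) * (SSE / SSTO)   # assumed form of (6.42)

print(f"R^2 = {r_squared:.3f}, R = {r_multiple:.3f}, adjusted R^2 = {r_squared_adj:.3f}")
```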


Estimation of Regression Parameters 


Dwaine Studios is not interested in the parameter $\beta_0$ since it falls far outside the scope of
the model. It is desired to estimate $\beta_1$ and $\beta_2$ jointly with family confidence coefficient .90.
We shall use the simultaneous Bonferroni confidence limits (6.52).

First, we need the estimated variance-covariance matrix $\mathbf{s}^2\{\mathbf{b}\}$:

$$
\mathbf{s}^2\{\mathbf{b}\} = MSE(\mathbf{X}'\mathbf{X})^{-1}
$$

MSE is given in Figure 6.5a, and $(\mathbf{X}'\mathbf{X})^{-1}$ was obtained in (6.73). Hence:

$$
\mathbf{s}^2\{\mathbf{b}\} = 121.1626
\begin{bmatrix} 29.7289 & .0722 & -1.9926 \\ .0722 & .00037 & -.0056 \\ -1.9926 & -.0056 & .1363 \end{bmatrix}
=
\begin{bmatrix} 3{,}602.0 & 8.748 & -241.43 \\ 8.748 & .0448 & -.679 \\ -241.43 & -.679 & 16.514 \end{bmatrix}
\tag{6.78}
$$

The two estimated variances we require are:

$$
\begin{aligned}
s^2\{b_1\} &= .0448 \quad \text{or} \quad s\{b_1\} = .212 \\
s^2\{b_2\} &= 16.514 \quad \text{or} \quad s\{b_2\} = 4.06
\end{aligned}
$$

These estimated standard deviations are shown in the SYSTAT output in Figure 6.5a, labeled
STD ERROR, to four decimal places.

Next, we require for g = 2 simultaneous estimates:

$$
B = t[1 - .10/2(2); 18] = t(.975; 18) = 2.101
$$

The two pairs of simultaneous confidence limits therefore are $1.455 \pm 2.101(.212)$ and
$9.366 \pm 2.101(4.06)$, which yield the confidence intervals:

$$
\begin{aligned}
1.01 &\le \beta_1 \le 1.90 \\
.84 &\le \beta_2 \le 17.9
\end{aligned}
$$

With family confidence coefficient .90, we conclude that $\beta_1$ falls between 1.01 and 1.90
and that $\beta_2$ falls between .84 and 17.9.

Note that the simultaneous confidence intervals suggest that both $\beta_1$ and $\beta_2$ are positive,
which is in accord with theoretical expectations that sales should increase with higher target
population and higher per capita disposable income, the other variable being held constant.
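The Bonferroni limits can be recomputed from the data as sketched below. Two assumptions are made explicit: the multiple B = t(.975; 18) = 2.101 is taken from the text rather than computed, and the diagonal elements of $(\mathbf{X}'\mathbf{X})^{-1}$ are obtained by solving $\mathbf{X}'\mathbf{X}\mathbf{v} = \mathbf{e}_k$ with a hypothetical helper `solve`. Because unrounded standard errors are used, the limits for $\beta_2$ differ slightly in the second decimal from those obtained by hand with s{b2} = 4.06.

```python
import math

# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]
Y = [y for _, _, y in DATA]
B = 2.101   # t(.975; 18) from the text, for g = 2 and family coefficient .90


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


n, p = len(Y), 3
XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(p)]
b = solve(XtX, XtY)
mse = sum((y - sum(bj * xj for bj, xj in zip(b, row))) ** 2
          for row, y in zip(X, Y)) / (n - p)

# s{b_k} = sqrt(MSE * [(X'X)^{-1}]_{kk}); the k-th diagonal element comes from X'X v = e_k
se = []
for k in range(p):
    unit = [1.0 if i == k else 0.0 for i in range(p)]
    se.append(math.sqrt(mse * solve(XtX, unit)[k]))

intervals = {k: (b[k] - B * se[k], b[k] + B * se[k]) for k in (1, 2)}
for k, (lo, hi) in intervals.items():
    print(f"beta_{k}: s = {se[k]:.4f}, {lo:.2f} <= beta_{k} <= {hi:.2f}")
```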


Estimation of Mean Response

Dwaine Studios would like to estimate expected (mean) sales in cities with target population
$X_{h1} = 65.4$ thousand persons aged 16 years or younger and per capita disposable income
$X_{h2} = 17.6$ thousand dollars with a 95 percent confidence interval. We define:

$$
\mathbf{X}_h = \begin{bmatrix} 1 \\ 65.4 \\ 17.6 \end{bmatrix}
$$

The point estimate of mean sales is by (6.55):

$$
\hat{Y}_h = \mathbf{X}_h'\mathbf{b} =
\begin{bmatrix} 1 & 65.4 & 17.6 \end{bmatrix}
\begin{bmatrix} -68.857 \\ 1.455 \\ 9.366 \end{bmatrix}
= 191.10
$$

The estimated variance by (6.58), using the results in (6.78), is:

$$
\begin{aligned}
s^2\{\hat{Y}_h\} &= \mathbf{X}_h'\mathbf{s}^2\{\mathbf{b}\}\mathbf{X}_h \\
&= \begin{bmatrix} 1 & 65.4 & 17.6 \end{bmatrix}
\begin{bmatrix} 3{,}602.0 & 8.748 & -241.43 \\ 8.748 & .0448 & -.679 \\ -241.43 & -.679 & 16.514 \end{bmatrix}
\begin{bmatrix} 1 \\ 65.4 \\ 17.6 \end{bmatrix} \\
&= 7.656
\end{aligned}
$$

or:

$$
s\{\hat{Y}_h\} = 2.77
$$

For confidence coefficient .95, we need t(.975; 18) = 2.101, and we obtain by (6.59)
the confidence limits $191.10 \pm 2.101(2.77)$. The confidence interval for $E\{Y_h\}$ therefore
is:

$$
185.3 \le E\{Y_h\} \le 196.9
$$


Thus, with confidence coefficient .95, we estimate that mean sales in cities with target 
population of 65.4 thousand persons aged 16 years or younger and per capita disposable 
income of 17.6 thousand dollars are somewhere between 185.3 and 196.9 thousand dollars. 
Dwaine Studios considers this confidence interval to provide information about expected 
(average) sales in communities of this size and income level that is precise enough for 
planning purposes. 
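The interval for the mean response can be reproduced as follows. The sketch evaluates $s^2\{\hat{Y}_h\} = MSE\,\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h$ by solving $\mathbf{X}'\mathbf{X}\mathbf{v} = \mathbf{X}_h$ instead of using the rounded matrix in (6.78), which avoids the cancellation error that rounded entries would introduce; t(.975; 18) = 2.101 is taken from the text, and the helper names are hypothetical.

```python
import math

# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]
Y = [y for _, _, y in DATA]
T_MULTIPLE = 2.101            # t(.975; 18) from the text
x_h = [1.0, 65.4, 17.6]       # levels of interest, with leading 1 for the intercept


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


n, p = len(Y), 3
XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(p)]
b = solve(XtX, XtY)
mse = sum((y - sum(bj * xj for bj, xj in zip(b, row))) ** 2
          for row, y in zip(X, Y)) / (n - p)

y_hat_h = sum(bj * xj for bj, xj in zip(b, x_h))          # point estimate, (6.55)
v = solve(XtX, x_h)                                        # (X'X)^{-1} X_h
var_h = mse * sum(a * c for a, c in zip(x_h, v))           # s^2{Yhat_h}, (6.58)
se_h = math.sqrt(var_h)
lower, upper = y_hat_h - T_MULTIPLE * se_h, y_hat_h + T_MULTIPLE * se_h

print(f"Yhat_h = {y_hat_h:.2f}, s^2 = {var_h:.3f}, s = {se_h:.2f}")
print(f"{lower:.1f} <= E{{Y_h}} <= {upper:.1f}")
```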


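Algebraic Version of Estimated Variance $s^2\{\hat{Y}_h\}$. Since by (6.58):

$$
s^2\{\hat{Y}_h\} = \mathbf{X}_h'\mathbf{s}^2\{\mathbf{b}\}\mathbf{X}_h
$$

it follows for the case of two predictor variables in a first-order model:

$$
\begin{aligned}
s^2\{\hat{Y}_h\} = {} & s^2\{b_0\} + X_{h1}^2 s^2\{b_1\} + X_{h2}^2 s^2\{b_2\} + 2X_{h1}s\{b_0, b_1\} \\
& + 2X_{h2}s\{b_0, b_2\} + 2X_{h1}X_{h2}s\{b_1, b_2\}
\end{aligned}
\tag{6.79}
$$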




Prediction Limits for New Observations 


Dwaine Studios as part of a possible expansion program would like to predict sales for two
new cities, with the following characteristics:

              City A   City B
   $X_{h1}$     65.4     53.1
   $X_{h2}$     17.6     17.7

Prediction intervals with a 90 percent family confidence coefficient are desired. Note that
the two new cities have characteristics that fall well within the pattern of the 21 cities on
which the regression analysis is based.

To determine which simultaneous prediction intervals are best here, we find S as given
in (6.65a) and B as given in (6.66a) for g = 2 and $1 - \alpha = .90$:

$$
S^2 = 2F(.90; 2, 18) = 2(2.62) = 5.24 \qquad S = 2.29
$$

and:

$$
B = t[1 - .10/2(2); 18] = t(.975; 18) = 2.101
$$

Hence, the Bonferroni limits are more efficient here.
For city A, we use the results obtained when estimating mean sales, since the levels of
the predictor variables are the same here. We have from before:

$$
\hat{Y}_h = 191.10 \qquad s^2\{\hat{Y}_h\} = 7.656 \qquad MSE = 121.1626
$$

Hence, by (6.63a):

$$
s^2\{\text{pred}\} = MSE + s^2\{\hat{Y}_h\} = 121.1626 + 7.656 = 128.82
$$

or:

$$
s\{\text{pred}\} = 11.35
$$

In similar fashion, we obtain for city B (calculations not shown):

$$
\hat{Y}_h = 174.15 \qquad s\{\text{pred}\} = 11.93
$$

We previously found that the Bonferroni multiple is B = 2.101. Hence, by (6.66) the
simultaneous Bonferroni prediction limits with family confidence coefficient .90 are
$191.10 \pm 2.101(11.35)$ and $174.15 \pm 2.101(11.93)$, leading to the simultaneous prediction intervals:

$$
\begin{aligned}
\text{City A:} \quad 167.3 &\le Y_{h(\text{new})} \le 214.9 \\
\text{City B:} \quad 149.1 &\le Y_{h(\text{new})} \le 199.2
\end{aligned}
$$

With family confidence coefficient .90, we predict that sales in the two cities will be within
the indicated limits. Dwaine Studios considers these prediction limits to be somewhat useful
for planning purposes, but would prefer tighter intervals for predicting sales for a particular
city. A consulting firm has been engaged to see if additional or alternative predictor variables
can be found that will lead to tighter prediction intervals.
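The choice between the Scheffé and Bonferroni multiples, and the resulting prediction limits for both cities, can be sketched as follows. The table values F(.90; 2, 18) = 2.62 and t(.975; 18) = 2.101 are taken from the text rather than computed, the function and variable names are hypothetical, and unrounded intermediate values are used, so the limits may differ from the hand calculation in the last reported digit.

```python
import math

# Dwaine Studios data from Figure 6.5b: (X1 target population, X2 income, Y sales)
DATA = [
    (68.5, 16.7, 174.4), (45.2, 16.8, 164.4), (91.3, 18.2, 244.2),
    (47.8, 16.3, 154.6), (46.9, 17.3, 181.6), (66.1, 18.2, 207.5),
    (49.5, 15.9, 152.8), (52.0, 17.2, 163.2), (48.9, 16.6, 145.4),
    (38.4, 16.0, 137.2), (87.9, 18.3, 241.9), (72.8, 17.1, 191.1),
    (88.4, 17.4, 232.0), (42.9, 15.8, 145.3), (52.5, 17.8, 161.1),
    (85.7, 18.4, 209.7), (41.3, 16.5, 146.4), (51.7, 16.3, 144.0),
    (89.6, 18.1, 232.6), (82.7, 19.1, 224.1), (52.3, 16.0, 166.5),
]
X = [[1.0, x1, x2] for x1, x2, _ in DATA]
Y = [y for _, _, y in DATA]
NEW_CITIES = {"A": [1.0, 65.4, 17.6], "B": [1.0, 53.1, 17.7]}
G = 2                                 # number of simultaneous predictions
S = math.sqrt(G * 2.62)               # Scheffe multiple, F(.90; 2, 18) = 2.62 from the text
B = 2.101                             # Bonferroni multiple, t(.975; 18) from the text
multiple = min(S, B)                  # the smaller multiple gives the tighter limits


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


n, p = len(Y), 3
XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(p)]
b = solve(XtX, XtY)
mse = sum((y - sum(bj * xj for bj, xj in zip(b, row))) ** 2
          for row, y in zip(X, Y)) / (n - p)

results = {}
for city, x_h in NEW_CITIES.items():
    y_hat = sum(bj * xj for bj, xj in zip(b, x_h))
    var_mean = mse * sum(a * c for a, c in zip(x_h, solve(XtX, x_h)))
    s_pred = math.sqrt(mse + var_mean)              # s{pred}, by (6.63a)
    results[city] = (y_hat, s_pred, y_hat - multiple * s_pred, y_hat + multiple * s_pred)
    print(f"City {city}: Yhat = {y_hat:.2f}, s(pred) = {s_pred:.2f}, "
          f"{results[city][2]:.1f} <= Y_h(new) <= {results[city][3]:.1f}")
```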




Note incidentally that even though the coefficient of multiple determination, $R^2 = .917$,
is high, the prediction limits here are not fully satisfactory. This serves as another reminder
that a high value of $R^2$ does not necessarily indicate that precise predictions can be made.


Cited Reference


6.1. Box, G. E. P., and P. W. Tidwell. "Transformations of the Independent Variables," Technometrics
4 (1962), pp. 531-50.


Problems


6.1. Set up the X matrix and $\boldsymbol{\beta}$ vector for each of the following regression models (assume i = ...):

   a. $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}X_{i2} + \varepsilon_i$
   b. $\log Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$

6.2. Set up the X matrix and $\boldsymbol{\beta}$ vector for each of the following regression models (assume
$i = 1, \ldots, 5$):

   a. $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1}^2 + \varepsilon_i$
   b. $\sqrt{Y_i} = \beta_0 + \beta_1 X_{i1} + \beta_2 \log_{10} X_{i2} + \varepsilon_i$

6.3. A student stated: "Adding predictor variables to a regression model can never reduce $R^2$, so we
should include all available predictor variables in the model." Comment.

6.4. Why is it not meaningful to attach a sign to the coefficient of multiple correlation R, although
we do so for the coefficient of simple correlation $r_{12}$?

6.5. Brand preference. In a small-scale experimental study of the relation between degree of brand
liking (Y) and moisture content ($X_1$) and sweetness ($X_2$) of the product, the following results
were obtained from the experiment based on a completely randomized design (data are coded):

   i:          1    2    3   ...   14   15   16
   $X_{i1}$:   4    4    4   ...   10   10   10
   $X_{i2}$:   2    4    2   ...    4    2    4
   $Y_i$:     64   73   61   ...   95   94  100

   a. Obtain the scatter plot matrix and the correlation matrix. What information do these
      diagnostic aids provide here?
   b. Fit regression model (6.1) to the data. State the estimated regression function. How is $b_1$
      interpreted here?
   c. Obtain the residuals and prepare a box plot of the residuals. What information does this plot
      provide?
   d. Plot the residuals against $\hat{Y}$, $X_1$, $X_2$, and $X_1X_2$ on separate graphs. Also prepare a normal
      probability plot. Interpret the plots and summarize your findings.
   e. Conduct the Breusch-Pagan test for constancy of the error variance, assuming
      $\log \sigma_i^2 = \gamma_0 + \gamma_1 X_{i1} + \gamma_2 X_{i2}$; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.
   f. Conduct a formal test for lack of fit of the first-order regression function; use $\alpha = .01$. State
      the alternatives, decision rule, and conclusion.

6.6. Refer to Brand preference Problem 6.5. Assume that regression model (6.1) with independent
normal error terms is appropriate.

   a. Test whether there is a regression relation, using $\alpha = .01$. State the alternatives, decision
      rule, and conclusion. What does your test imply about $\beta_1$ and $\beta_2$?
   b. What is the P-value of the test in part (a)?
   c. Estimate $\beta_1$ and $\beta_2$ jointly by the Bonferroni procedure, using a 99 percent family confidence
      coefficient. Interpret your results.

6.7. Refer to Brand preference Problem 6.5.

   a. Calculate the coefficient of multiple determination $R^2$. How is it interpreted here?
   b. Calculate the coefficient of simple determination $R^2$ between $Y_i$ and $\hat{Y}_i$. Does it equal the
      coefficient of multiple determination in part (a)?

6.8. Refer to Brand preference Problem 6.5. Assume that regression model (6.1) with independent
normal error terms is appropriate.

   a. Obtain an interval estimate of $E\{Y_h\}$ when $X_{h1} = 5$ and $X_{h2} = 4$. Use a 99 percent confidence
      coefficient. Interpret your interval estimate.
   b. Obtain a prediction interval for a new observation $Y_{h(\text{new})}$ when $X_{h1} = 5$ and $X_{h2} = 4$. Use a
      99 percent confidence coefficient.

*6.9. Grocery retailer. A large, national grocery retailer tracks productivity and costs of its facilities
closely. Data below were obtained from a single distribution center for a one-year period. Each
data point for each variable represents one week of activity. The variables included are the
number of cases shipped ($X_1$), the indirect costs of the total labor hours as a percentage ($X_2$),
a qualitative predictor called holiday that is coded 1 if the week has a holiday and 0 otherwise
($X_3$), and the total labor hours (Y).

   i:                1         2         3   ...        50        51        52
   $X_{i1}$:   305,657   328,476   317,164   ...   290,455   411,750   292,087
   $X_{i2}$:      7.17      6.20      4.61   ...      7.99      7.83      7.77
   $X_{i3}$:         0         0         0   ...         0         0         0
   $Y_i$:         4264      4496      4317   ...      4499      4186      4342

   a. Prepare separate stem-and-leaf plots for the number of cases shipped $X_{i1}$ and the indirect
      cost of the total hours $X_{i2}$. Are there any outlying cases present? Are there any gaps in the
      data?
   b. The cases are given in consecutive weeks. Prepare a time plot for each predictor variable.
      What do the plots show?
   c. Obtain the scatter plot matrix and the correlation matrix. What information do these
      diagnostic aids provide here?

*6.10. Refer to Grocery retailer Problem 6.9.

   a. Fit regression model (6.5) to the data for three predictor variables. State the estimated
      regression function. How are $b_1$, $b_2$, and $b_3$ interpreted here?
   b. Obtain the residuals and prepare a box plot of the residuals. What information does this plot
      provide?
   c. Plot the residuals against $\hat{Y}$, $X_1$, $X_2$, $X_3$, and $X_1X_2$ on separate graphs. Also prepare a normal
      probability plot. Interpret the plots and summarize your findings.
   d. Prepare a time plot of the residuals. Is there any indication that the error terms are correlated?
      Discuss.
   e. Divide the 52 cases into two groups, placing the 26 cases with the smallest fitted values
      $\hat{Y}_i$ into group 1 and the other 26 cases into group 2. Conduct the Brown-Forsythe test for
      constancy of the error variance, using $\alpha = .01$. State the decision rule and conclusion.


250 PartTwo Multiple Linear Regression 


*6.11. 


*6.12. 


*6.13. 


*6.14. 


*6.15. 


Refer to Grocery retailer Problem 6.9. Assume that regression model (6.5) for three predictor 

variables with independent normal error terms is appropriate. 

а. Test whether there is a regression relation, using level of significance .05. State the alterna- 
tives, decision rule, and conclusion. What does your test result imply about 1, 8, and 8? 
What is the P-value of the test? 

b. Estimate £j, and £; jointly by the Bonferroni procedure, using a 95 percent family confidence 
coefficient. Interpret your results. 

c. Calculate the coefficient of multiple determination R?. How is this measure interpreted here? 

Refer to Grocery retailer Problem 6.9. Assume that regression model (6.5) for three predictor 

variables with independent normal error terms is appropriate. 


a. Managementdesires simultaneous interval estimates of the total labor hours forthe following 


n 


five typical weekly shipments: © 
1 2 3 4 5 
X 302,000 245,000 280,000 350,000 295,000 
X2: 7.20 7.40 є 6.90 7.00 6.70 
Хз: 0 0 0 0 1 


Obtain the family of estimates using a 95 percent family confidence coefficient. Employ the 
Working-Hotelling or the Bonferroni procedure, whichever is more efficient. 

b. For the data in Problem 6.9 on which the regression fit is based, would you consider а 
shipment of 400,000 cases with an indirect percentage of 7.20 on a nonholiday week to be 
within the scope of the model? What about a shipment of 400,000 cases with an indirect 
percentage of 9.9 on a nonholiday week? Support your answers by preparing a relevant plot, 


Refer to Grocery retailer Problem 6.9. Assume that regression model (6.5) for three predictor 
variables with independent normal error terms is appropriate. Four separate shipments with the 
following characteristics must be processed next month: 


1 2 3 4 
Xy 230,000 250,000 280,000 340,000 
X2: 7.50 7.30 7.10 6.90 
X3: 0 0 0 0 


Management desires predictions of the handling times for these shipments so that the actual 
handling times can be compared with the predicted times to determine whether any are out of 
line. Develop the needed predictions, using the most efficient approach and a family confidence 
coefficient of 95 percent. 

Refer to Grocery retailer Problem 6.9. Assume that regression model (6.5) for three predictor 
variables with independent normal error terms is appropriate. Three new shipments are to be 
received, each with Х = 282,000, Х„› = 7.10, and X44 = 0. 


a. Obtain a 95 percent prediction interval for the mean handling time for these shipments. 


b. Convert the interval obtained in part (a) into a 95 percent prediction interval for the total 
labor hours for the three shipments. 


Patient satisfaction. A hospital administrator wished to study the relation between patient 
satisfaction (Y) and patient's age (Х|, in years), severity of illness (Хә, an index), and anxiety 


*6.16. 


*6.17. 


6.18. 


Chapter 6 Multiple Regression] 251 


level (Xs, an index). The administrator randomly selected 46 patients and collected the data 
presented below, where larger values of Y, X2, and Хз are, respectively, associated with more 
satisfaction, increased severity of illness, and more anxiety. 


i 1 2 3 225 44 45 46 
Ха: 50 36 40 ts 45 37 28 
Xiz: 51 46 48 “з 51 53 46 
Хез: 2.3 2.3 2.2 А 2.2 2.1 1.8 
Y; 48 57 66 25 68 59 92 


а. Prepare a stem-and-leaf plot for each of the predictor variables. Are any noteworthy features 
revealed by these plots? 

b. Obtain the scatter plot matrix and the correlation matrix. Interpret these andbstate your 
principal findings. 

c. Fit regression model (6.5) for three predictor variables to the data and state the estimated 
regression function. How is b2 interpreted here? 

d. Obtain the residuals and prepare a box plot of the residuals. Do there appear to be any 
outliers? 

e. Plot the residuals against Ӯ, each of the predictor variables, and each two-factor interaction 
term on separate graphs. Also prepare a normal probability plot. Interpret your plots and 
summarize your findings. 

f. Can you conduct a formal test for lack of fit here? 

g. Conduct the Breusch-Pagan test for constancy of the error variance, assuming log o? = 
yod-JAXn d р Хо + yX; use a=.01. State the alternatives, decision rule, and 
conclusion. 

Refer to Patient satisfaction Problem 6.15. Assume that regression model (6.5) for three 

predictor variables with independent normal error terms is appropriate. 


a. Test whether there is a regression relation; use œ = .10, State the alternatives, decision rule, 
and conclusion. What does your test imply about £1, 62, and £5? Whatis the P-value of the 
test? 

b. Obtain joint interval estimates of £1, £5», and 83, using a 90 percent family confidence 
coefficient. Interpret your results. 


c. Calculate the coefficient of multiple determination. What does it indicate here? 


Refer to Patient satisfaction Problem 6.15. Assume that regression model (6.5) for Mres 
predictor variables with independent normal error terms is appropriate. 


a. Obtain an interval estimate of the mean satisfaction when Xp, = 35, Хро = 45, and X43 —2.2. 
Use a 90 percent confidence coefficient. Interpret your confidence interval. 


b. Obtain a prediction interval for a new patient's satisfaction when Хр = 35, X55 = 45, and 
Xr = 2.2. Use a 90 percent confidence coefficient. Interpret your prediction interval. 


Commercial properties. A commercial real estate company evaluates vacancy rates, square 
footage, rental rates, and operating expenses for commercial properties in a large metropolitan 
area in order to provide clients with quantitative information upon which to make rental deci- 
sions. The data below are taken from 81 suburban commercial properties that are the newest, 
best located, most attractive, and expensive for five specific geographic areas. Shown here are 


252 PartTwo Multiple Linear Regression 


tH 


ie age (X1), operating expenses and taxes ( X5), vacancy rates (X3), total square footage (X4), 


and rental rates (У). 


1 2 3 e$ 79 80 81 


6.19. 


К 6.20. 


1 14 16 Jas 15 11 14 
5.02 8.19 3.00  ... 11.97 11.27 12.68 
0.14 0.27 0 uot 0.14 0.03 0.03 
123,000 104,079 39,998 ... 254,700 434,746 201,930 
13.50 12.00 10.50  ... 15.00 15.25 14.50 


Prepare a stem-and-leaf plot for each predictor variable. What infoffmation do these plots 
provide? 

Obtain the scatter plot matrix and the correlation matrix. Interpret these and state your 
principal findings. 

Fit regression model (6.5) for four predictor«variables to the data. State the estimated 
regression function. 

Obtain the residuals and prepare a box plot of the residuals. Does the distribution appear to 
be fairly symmetrical? 

Plot the residuals against Ў, each predictor variable, and each two-factor interaction term on 
separate graphs. Also prepare a normal probability plot. Analyze your plots and summarize 
your findings. 

. Can you conduct a formal test for lack of fit here? 

Divide the 81 cases into two groups, placing the 40 cases with the smallest fitted values Ӯ; 
into group | and the remaining cases into group 2. Conduct the Brown-Forsythe test for 
constancy of the error variance, using & = .05. State the decision rule and conclusion. 


6.19. Refer to Commercial properties Problem 6.18. Assume that regression model (6.5) for four
predictor variables with independent normal error terms is appropriate.

a. Test whether there is a regression relation; use α = .05. State the alternatives, decision rule,
and conclusion. What does your test imply about β1, β2, β3, and β4? What is the P-value
of the test?

b. Estimate β1, β2, β3, and β4 jointly by the Bonferroni procedure, using a 95 percent family
confidence coefficient. Interpret your results.

c. Calculate R² and interpret this measure.


6.20. Refer to Commercial properties Problem 6.18. Assume that regression model (6.5) for four
predictor variables with independent normal error terms is appropriate. The researcher wishes
to obtain simultaneous interval estimates of the mean rental rates for four typical properties
specified as follows:

            1         2         3         4
 X1:      5.0       6.0      14.0      12.0
 X2:     8.25      8.50     11.50     10.25
 X3:        0      0.23      0.11         0
 X4:  250,000   270,000   300,000   310,000

Obtain the family of estimates using a 95 percent family confidence coefficient. Employ the
most efficient procedure.




6.21. Refer to Commercial properties Problem 6.18. Assume that regression model (6.5) for four
predictor variables with independent normal error terms is appropriate. Three properties with
the following characteristics did not have any rental information available.

           1         2         3
 X1:     4.0       6.0      12.0
 X2:    10.0      11.5      12.5
 X3:    0.10         0      0.32
 X4:  80,000   120,000   340,000

Develop separate prediction intervals for the rental rates of these properties, using a 95 percent
statement confidence coefficient in each case. Can the rental rates of these three properties
be predicted fairly precisely? What is the family confidence level for the set of three
predictions?

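Exercises

6.22. For each of the following regression models, indicate whether it is a general linear regres-
sion model. If it is not, state whether it can be expressed in the form of (6.7) by a suitable
transformation:

a. $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 \log_{10} X_{i2} + \beta_3 X_{i1}^2 + \varepsilon_i$

b. $Y_i = \varepsilon_i \exp(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2}^2)$

c. $Y_i = \log_{10}(\beta_1 X_{i1}) + \beta_2 X_{i2} + \varepsilon_i$

d. $Y_i = \beta_0 \exp(\beta_1 X_{i1}) + \varepsilon_i$

e. $Y_i = [1 + \exp(\beta_0 + \beta_1 X_{i1} + \varepsilon_i)]^{-1}$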

6.23. (Calculus needed.) Consider the multiple regression model:

$$Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \qquad i = 1, \ldots, n$$

where the εi are uncorrelated, with E{εi} = 0 and σ²{εi} = σ².

a. State the least squares criterion and derive the least squares estimators of β1 and β2.

b. Assuming that the εi are independent normal random variables, state the likelihood function
and obtain the maximum likelihood estimators of β1 and β2. Are these the same as the least
squares estimators?

6.24. (Calculus needed.) Consider the multiple regression model:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}^2 + \beta_3 X_{i2} + \varepsilon_i \qquad i = 1, \ldots, n$$

where the εi are independent N(0, σ²).

a. State the least squares criterion and derive the least squares normal equations.

b. State the likelihood function and explain why the maximum likelihood estimators will be
the same as the least squares estimators.

6.25. An analyst wanted to fit the regression model
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$,
i = 1, ..., n, by the method of least squares when it is known that β2 = 4. How can the analyst
obtain the desired fit by using a multiple regression computer program?

6.26. For regression model (6.1), show that the coefficient of simple determination between Yi and
Ŷi equals the coefficient of multiple determination R².


6.27. In a small-scale regression study, the following data were obtained:

   i:    1    2    3    4    5    6
 Xi1:    7    4   16    3   21    8
 Xi2:   33   41    7   49    5   31
  Yi:   42   33   75   28   91   55

Assume that regression model (6.1) with independent normal error terms is appropriate. Using
matrix methods, obtain (a) b; (b) e; (c) H; (d) SSR; (e) s²{b}; (f) Ŷh when Xh1 = 10, Xh2 = 30;
(g) s²{Ŷh} when Xh1 = 10, Xh2 = 30.


Projects

6.28. Refer to the CDI data set in Appendix C.2. You have been asked to evaluate two alternative
models for predicting the number of active physicians (Y) in a CDI. Proposed model I includes
as predictor variables total population (X1), land area (X2), and total personal income (X3).
Proposed model II includes as predictor variables population density (X1, total population
divided by land area), percent of population greater than 64 years old (X2), and total personal
income (X3).

a. Prepare a stem-and-leaf plot for each of the predictor variables. What noteworthy information
is provided by your plots?

b. Obtain the scatter plot matrix and the correlation matrix for each proposed model. Summarize
the information provided.

c. For each proposed model, fit the first-order regression model (6.5) with three predictor
variables.

d. Calculate R² for each model. Is one model clearly preferable in terms of this measure?

e. For each model, obtain the residuals and plot them against Ŷ, each of the three predictor
variables, and each of the two-factor interaction terms. Also prepare a normal probability
plot for each of the two fitted models. Interpret your plots and state your findings. Is one
model clearly preferable in terms of appropriateness?


6.29. Refer to the CDI data set in Appendix C.2.

a. For each geographic region, regress the number of serious crimes in a CDI (Y) against
population density (X1, total population divided by land area), per capita personal income
(X2), and percent high school graduates (X3). Use first-order regression model (6.5) with
three predictor variables. State the estimated regression functions.

b. Are the estimated regression functions similar for the four regions? Discuss.

c. Calculate MSE and R² for each region. Are these measures similar for the four regions?
Discuss.

d. Obtain the residuals for each fitted model and prepare a box plot of the residuals for each
fitted model. Interpret your plots and state your findings.


6.30. Refer to the SENIC data set in Appendix C.1. Two models have been proposed for predicting the
average length of patient stay in a hospital (Y). Model I utilizes as predictor variables age (X1),
infection risk (X2), and available facilities and services (X3). Model II uses as predictor variables
number of beds (X1), infection risk (X2), and available facilities and services (X3).

a. Prepare a stem-and-leaf plot for each of the predictor variables. What information do these
plots provide?

b. Obtain the scatter plot matrix and the correlation matrix for each proposed model. Interpret
these and state your principal findings.

c. For each of the two proposed models, fit first-order regression model (6.5) with three
predictor variables.

d. Calculate R² for each model. Is one model clearly preferable in terms of this measure?

e. For each model, obtain the residuals and plot them against Ŷ, each of the three predictor
variables, and each of the two-factor interaction terms. Also prepare a normal probability
plot of the residuals for each of the two fitted models. Interpret your plots and state your
findings. Is one model clearly more appropriate than the other?

6.31. Refer to the SENIC data set in Appendix C.1.

a. For each geographic region, regress infection risk (Y) against the predictor variables age
(X1), routine culturing ratio (X2), average daily census (X3), and available facilities and
services (X4). Use first-order regression model (6.5) with four predictor variables. State the
estimated regression functions.

b. Are the estimated regression functions similar for the four regions? Discuss.

c. Calculate MSE and R² for each region. Are these measures similar for the four regions?
Discuss.

d. Obtain the residuals for each fitted model and prepare a box plot of the residuals for each
fitted model. Interpret the plots and state your findings.


Chapter 7


Multiple Regression II


In this chapter, we take up some specialized topics that are unique to multiple regression. 
These include extra sums of squares, which are useful for conducting a variety of tests about 
the regression coefficients, the standardized version of the multiple regression model, and 
multicollinearity, a condition where the predictor variables are highly correlated. 


7.1 Extra Sums of Squares 


Basic Ideas

An extra sum of squares measures the marginal reduction in the error sum of squares
when one or several predictor variables are added to the regression model, given that other
predictor variables are already in the model. Equivalently, one can view an extra sum of
squares as measuring the marginal increase in the regression sum of squares when one or
several predictor variables are added to the regression model.

We first utilize an example to illustrate these ideas, and then we present definitions of
extra sums of squares and discuss a variety of uses of extra sums of squares in tests about
regression coefficients.


Example

Table 7.1 contains a portion of the data for a study of the relation of amount of body fat
(Y) to several possible predictor variables, based on a sample of 20 healthy females 25-34
years old. The possible predictor variables are triceps skinfold thickness (X1), thigh
circumference (X2), and midarm circumference (X3). The amount of body fat in Table 7.1
for each of the 20 persons was obtained by a cumbersome and expensive procedure requiring
the immersion of the person in water. It would therefore be very helpful if a regression
model with some or all of these predictor variables could provide reliable estimates of the
amount of body fat since the measurements needed for the predictor variables are easy to
obtain.

Table 7.2 contains some of the main regression results when body fat (Y) is regressed
(1) on triceps skinfold thickness (X1) alone, (2) on thigh circumference (X2) alone, (3) on
X1 and X2 only, and (4) on all three predictor variables. To keep track of the regression
model that is fitted, we shall modify our notation slightly. The regression sum of squares
when X1 only is in the model is, according to Table 7.2a, 352.27. This sum of squares
will be denoted by SSR(X1). The error sum of squares for this model will be denoted by
SSE(X1); according to Table 7.2a it is SSE(X1) = 143.12.

Similarly, Table 7.2c indicates that when X1 and X2 are in the regression model,
the regression sum of squares is SSR(X1, X2) = 385.44 and the error sum of squares is
SSE(X1, X2) = 109.95.

Notice that the error sum of squares when X1 and X2 are in the model, SSE(X1, X2) =
109.95, is smaller than when the model contains only X1, SSE(X1) = 143.12. The difference
is called an extra sum of squares and will be denoted by SSR(X2|X1):

    SSR(X2|X1) = SSE(X1) − SSE(X1, X2)
               = 143.12 − 109.95 = 33.17


TABLE 7.1 Basic Data—Body Fat Example.

            Triceps
Subject     Skinfold Thickness   Thigh Circumference   Midarm Circumference   Body Fat
   i              Xi1                   Xi2                    Xi3               Yi
   1              19.5                  43.1                   29.1             11.9
   2              24.7                  49.8                   28.2             22.8
   3              30.7                  51.9                   37.0             18.7
  ...              ...                   ...                    ...              ...
  18              30.2                  58.6                   24.6             25.4
  19              22.7                  48.2                   27.1             14.8
  20              25.2                  51.0                   27.5             21.1


TABLE 7.2 Regression Results for Several Fitted Models—Body Fat Example.

(a) Regression of Y on X1
    Ŷ = −1.496 + .8572X1

Source of
Variation        SS      df      MS
Regression     352.27     1    352.27
Error          143.12    18      7.95
Total          495.39    19

            Estimated                 Estimated
Variable    Regression Coefficient    Standard Deviation    t*
X1          b1 = .8572                s{b1} = .1288         6.66


(b) Regression of Y on X2
    Ŷ = −23.634 + .8565X2

Source of
Variation        SS      df      MS
Regression     381.97     1    381.97
Error          113.42    18      6.30
Total          495.39    19

            Estimated                 Estimated
Variable    Regression Coefficient    Standard Deviation    t*
X2          b2 = .8565                s{b2} = .1100         7.79

(continued)


TABLE 7.2 (Continued).

(c) Regression of Y on X1 and X2
    Ŷ = −19.174 + .2224X1 + .6594X2

Source of
Variation        SS      df      MS
Regression     385.44     2    192.72
Error          109.95    17      6.47
Total          495.39    19

            Estimated                 Estimated
Variable    Regression Coefficient    Standard Deviation    t*
X1          b1 = .2224                s{b1} = .3034          .73
X2          b2 = .6594                s{b2} = .2912         2.26


(d) Regression of Y on X1, X2, and X3
    Ŷ = 117.08 + 4.334X1 − 2.857X2 − 2.186X3

Source of
Variation        SS      df      MS
Regression     396.98     3    132.33
Error           98.41    16      6.15
Total          495.39    19

            Estimated                 Estimated
Variable    Regression Coefficient    Standard Deviation    t*
X1          b1 = 4.334                s{b1} = 3.016          1.44
X2          b2 = −2.857               s{b2} = 2.582         −1.11
X3          b3 = −2.186               s{b3} = 1.596         −1.37


This reduction in the error sum of squares is the result of adding X2 to the regression model
when X1 is already included in the model. Thus, the extra sum of squares SSR(X2|X1)
measures the marginal effect of adding X2 to the regression model when X1 is already in
the model. The notation SSR(X2|X1) reflects this additional or extra reduction in the error
sum of squares associated with X2, given that X1 is already included in the model.

The extra sum of squares SSR(X2|X1) equivalently can be viewed as the marginal increase
in the regression sum of squares:

    SSR(X2|X1) = SSR(X1, X2) − SSR(X1)
               = 385.44 − 352.27 = 33.17

The reason for the equivalence of the marginal reduction in the error sum of squares and
the marginal increase in the regression sum of squares is the basic analysis of variance
identity (2.50):

    SSTO = SSR + SSE

Since SSTO measures the variability of the Yi observations and hence does not depend on
the regression model fitted, any reduction in SSE implies an identical increase in SSR.
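The equivalence can be checked with a few lines of arithmetic. The sketch below only re-uses the sums of squares printed in Table 7.2; the variable names are illustrative and are not part of the text.

```python
# Sums of squares copied from Table 7.2 (body fat example).
SSTO = 495.39
SSR_X1, SSE_X1 = 352.27, 143.12        # model with X1 only (Table 7.2a)
SSR_X1X2, SSE_X1X2 = 385.44, 109.95    # model with X1 and X2 (Table 7.2c)

# The extra sum of squares SSR(X2|X1), computed both ways.
drop_in_sse = SSE_X1 - SSE_X1X2        # marginal reduction in SSE
gain_in_ssr = SSR_X1X2 - SSR_X1        # marginal increase in SSR

# SSTO = SSR + SSE holds for each fitted model, which forces the two to agree.
assert abs(SSR_X1 + SSE_X1 - SSTO) < 0.005
assert abs(SSR_X1X2 + SSE_X1X2 - SSTO) < 0.005
assert abs(drop_in_sse - gain_in_ssr) < 0.005

print(f"SSR(X2|X1) = {drop_in_sse:.2f}")
```

The tolerance of 0.005 reflects the two-decimal rounding of the tabled values.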


We can consider other extra sums of squares, such as the marginal effect of adding X3 to
the regression model when X1 and X2 are already in the model. We find from Tables 7.2c
and 7.2d that:

    SSR(X3|X1, X2) = SSE(X1, X2) − SSE(X1, X2, X3)
                   = 109.95 − 98.41 = 11.54

or, equivalently:

    SSR(X3|X1, X2) = SSR(X1, X2, X3) − SSR(X1, X2)
                   = 396.98 − 385.44 = 11.54

We can even consider the marginal effect of adding several variables, such as adding
both X2 and X3 to the regression model already containing X1 (see Tables 7.2a and 7.2d):

    SSR(X2, X3|X1) = SSE(X1) − SSE(X1, X2, X3)
                   = 143.12 − 98.41 = 44.71

or, equivalently:

    SSR(X2, X3|X1) = SSR(X1, X2, X3) − SSR(X1)
                   = 396.98 − 352.27 = 44.71


Definitions

We assemble now our earlier definitions of extra sums of squares and provide some additional
ones. As we noted earlier, an extra sum of squares always involves the difference
between the error sum of squares for the regression model containing the X variable(s)
already in the model and the error sum of squares for the regression model containing both
the original X variable(s) and the new X variable(s). Equivalently, an extra sum of squares
involves the difference between the two corresponding regression sums of squares.

Thus, we define:

    SSR(X1|X2) = SSE(X2) − SSE(X1, X2)                  (7.1a)

or, equivalently:

    SSR(X1|X2) = SSR(X1, X2) − SSR(X2)                  (7.1b)

If X2 is the extra variable, we define:

    SSR(X2|X1) = SSE(X1) − SSE(X1, X2)                  (7.2a)

or, equivalently:

    SSR(X2|X1) = SSR(X1, X2) − SSR(X1)                  (7.2b)

Extensions for three or more variables are straightforward. For example, we define:

    SSR(X3|X1, X2) = SSE(X1, X2) − SSE(X1, X2, X3)      (7.3a)

or, equivalently:

    SSR(X3|X1, X2) = SSR(X1, X2, X3) − SSR(X1, X2)      (7.3b)

and:

    SSR(X2, X3|X1) = SSE(X1) − SSE(X1, X2, X3)          (7.4a)

or:

    SSR(X2, X3|X1) = SSR(X1, X2, X3) − SSR(X1)          (7.4b)


Decomposition of SSR into Extra Sums of Squares 


In multiple regression, unlike simple linear regression, we can obtain a variety of decom-
positions of the regression sum of squares SSR into extra sums of squares. Let us consider
the case of two X variables. We begin with the identity (2.50) for variable X1:

    SSTO = SSR(X1) + SSE(X1)                            (7.5)

where the notation now shows explicitly that X1 is the X variable in the model. Replacing
SSE(X1) by its equivalent in (7.2a), we obtain:

    SSTO = SSR(X1) + SSR(X2|X1) + SSE(X1, X2)           (7.6)

We now make use of the same identity for multiple regression with two X variables as
in (7.5) for a single X variable, namely:

    SSTO = SSR(X1, X2) + SSE(X1, X2)                    (7.7)

Solving (7.7) for SSE(X1, X2) and using this expression in (7.6) lead to:

    SSR(X1, X2) = SSR(X1) + SSR(X2|X1)                  (7.8)

Thus, we have decomposed the regression sum of squares SSR(X1, X2) into two marginal
components: (1) SSR(X1), measuring the contribution by including X1 alone in the model,
and (2) SSR(X2|X1), measuring the additional contribution when X2 is included, given that
X1 is already in the model.

Of course, the order of the X variables is arbitrary. Here, we can also obtain the
decomposition:

    SSR(X1, X2) = SSR(X2) + SSR(X1|X2)                  (7.9)


We show in Figure 7.1 schematic representations of the two decompositions of
SSR(X1, X2) for the body fat example. The total bar on the left represents SSTO and
presents decomposition (7.9). The unshaded component of this bar is SSR(X2), and the
combined shaded area represents SSE(X2). The latter area in turn is the combination of the
extra sum of squares SSR(X1|X2) and the error sum of squares SSE(X1, X2) when both
X1 and X2 are included in the model. Similarly, the bar on the right in Figure 7.1 shows
decomposition (7.8). Note in both cases how the extra sum of squares can be viewed either
as a reduction in the error sum of squares or as an increase in the regression sum of squares
when the second predictor variable is added to the regression model.

When the regression model contains three X variables, a variety of decompositions of
SSR(X1, X2, X3) can be obtained. We illustrate three of these:

    SSR(X1, X2, X3) = SSR(X1) + SSR(X2|X1) + SSR(X3|X1, X2)     (7.10a)
    SSR(X1, X2, X3) = SSR(X2) + SSR(X3|X2) + SSR(X1|X2, X3)     (7.10b)
    SSR(X1, X2, X3) = SSR(X1) + SSR(X2, X3|X1)                  (7.10c)

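FIGURE 7.1 Schematic Representation of Extra Sums of Squares—Body Fat Example.

[Figure: two vertical bars, each of total height SSTO = 495.39.
 Left bar, decomposition (7.9): SSR(X2) = 381.97, SSR(X1|X2) = 3.47, SSE(X2) = 113.42.
 Right bar, decomposition (7.8): SSR(X1) = 352.27, SSR(X2|X1) = 33.17, SSE(X1) = 143.12.
 In both bars: SSR(X1, X2) = 385.44 and SSE(X1, X2) = 109.95.]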


TABLE 7.3 Example of ANOVA Table with Decomposition of SSR for Three X Variables.

Source of
Variation        SS                   df       MS
Regression       SSR(X1, X2, X3)      3        MSR(X1, X2, X3)
  X1             SSR(X1)              1        MSR(X1)
  X2|X1          SSR(X2|X1)           1        MSR(X2|X1)
  X3|X1, X2      SSR(X3|X1, X2)       1        MSR(X3|X1, X2)
Error            SSE(X1, X2, X3)      n − 4    MSE(X1, X2, X3)
Total            SSTO                 n − 1


It is obvious that the number of possible decompositions becomes vast as the number of 
X variables in the regression model increases. 
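To see how the decompositions multiply, the following sketch enumerates every order of entry for three predictors and confirms that each sequence of single-degree-of-freedom extra sums of squares adds up to SSR(X1, X2, X3). The data are synthetic (generated here only for illustration, not the body fat data), and the helper name `sse` is an assumption of this example.

```python
import itertools
import numpy as np

def sse(y, *columns):
    """Error sum of squares for least squares of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Synthetic data (NOT the body fat data): X2 is deliberately correlated with X1.
rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 - 1.0 * x3 + rng.normal(size=n)
cols = {"X1": x1, "X2": x2, "X3": x3}

ssto = sse(y)                          # intercept-only model: its SSE equals SSTO
ssr_full = ssto - sse(y, x1, x2, x3)   # SSR(X1, X2, X3)

decompositions = {}
for order in itertools.permutations(cols):
    terms, in_model = [], []
    for name in order:
        before = sse(y, *[cols[v] for v in in_model])
        in_model.append(name)
        after = sse(y, *[cols[v] for v in in_model])
        terms.append(before - after)   # extra SS for `name`, given the variables entered earlier
    decompositions[order] = terms

for order, terms in decompositions.items():
    print(order, np.round(terms, 3), "sum =", round(sum(terms), 3))
```

With three predictors there are already six single-variable orderings, before counting decompositions such as (7.10c) that add two variables at once.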


ANOVA Table Containing Decomposition of SSR 


ANOVA tables can be constructed containing decompositions of the regression sum of
squares into extra sums of squares. Table 7.3 contains the ANOVA table decomposition
for the case of three X variables often used in regression packages, and Table 7.4 contains
this same decomposition for the body fat example. The decomposition involves single extra
X variables.

Note that each extra sum of squares involving a single extra X variable has associated
with it one degree of freedom. The resulting mean squares are constructed as usual. For
example, MSR(X2|X1) in Table 7.3 is obtained as follows:

$$MSR(X_2 \mid X_1) = \frac{SSR(X_2 \mid X_1)}{1}$$

Extra sums of squares involving two extra X variables, such as SSR(X2, X3|X1), have
two degrees of freedom associated with them. This follows because we can express such
an extra sum of squares as a sum of two extra sums of squares, each associated with one
degree of freedom. For example, by definition of the extra sums of squares, we have:

    SSR(X2, X3|X1) = SSR(X2|X1) + SSR(X3|X1, X2)        (7.11)

The mean square MSR(X2, X3|X1) is therefore obtained as follows:

$$MSR(X_2, X_3 \mid X_1) = \frac{SSR(X_2, X_3 \mid X_1)}{2}$$


TABLE 7.4 ANOVA Table with Decomposition of SSR—Body Fat Example with Three Predictor Variables.

Source of
Variation         SS      df      MS
Regression      396.98     3    132.33
  X1            352.27     1    352.27
  X2|X1          33.17     1     33.17
  X3|X1, X2      11.54     1     11.54
Error            98.41    16      6.15
Total           495.39    19


Many computer regression packages provide decompositions of SSR into single-degree-
of-freedom extra sums of squares, usually in the order in which the X variables are entered
into the model. Thus, if the X variables are entered in the order X1, X2, X3, the extra sums
of squares given in the output are:

    SSR(X1)
    SSR(X2|X1)
    SSR(X3|X1, X2)

If an extra sum of squares involving several extra X variables is desired, it can be obtained
by summing appropriate single-degree-of-freedom extra sums of squares. For instance, to
obtain SSR(X2, X3|X1) in our earlier illustration, we would utilize (7.11) and simply add
SSR(X2|X1) and SSR(X3|X1, X2).

If the extra sum of squares SSR(X1, X3|X2) were desired with a computer package
that provides single-degree-of-freedom extra sums of squares in the order in which the X
variables are entered, the X variables would need to be entered in the order X2, X1, X3 or
X2, X3, X1. The first ordering would give:

    SSR(X2)
    SSR(X1|X2)
    SSR(X3|X1, X2)

The sum of the last two extra sums of squares will yield SSR(X1, X3|X2).

The reason why extra sums of squares are of interest is that they occur in a variety
of tests about regression coefficients where the question of concern is whether certain X
variables can be dropped from the regression model. We turn next to this use of extra sums of
squares.
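The effect of the order of entry can be mimicked with the error sums of squares already reported in Table 7.2. The sketch below is only an illustration of the bookkeeping: the dictionary layout and the function name `sequential_ss` are assumptions of this example and do not correspond to any particular package's output format.

```python
# SSE for each fitted model, keyed by the set of predictors in the model.
# Values come from Table 7.2; the empty set is the intercept-only model, whose SSE is SSTO.
SSE = {
    frozenset(): 495.39,
    frozenset({"X1"}): 143.12,
    frozenset({"X2"}): 113.42,
    frozenset({"X1", "X2"}): 109.95,
    frozenset({"X1", "X2", "X3"}): 98.41,
}

def sequential_ss(order):
    """Single-degree-of-freedom extra sums of squares, in the order the variables are entered."""
    out, in_model = [], frozenset()
    for name in order:
        bigger = in_model | {name}
        given = ", ".join(sorted(in_model))
        label = f"SSR({name}|{given})" if in_model else f"SSR({name})"
        out.append((label, round(SSE[in_model] - SSE[bigger], 2)))
        in_model = bigger
    return out

# Enter the variables in the order X2, X1, X3, as suggested in the text.
seq = sequential_ss(["X2", "X1", "X3"])
for label, value in seq:
    print(f"{label:16s} {value:8.2f}")

# SSR(X1, X3|X2) is the sum of the last two single-df terms.
ssr_x1x3_given_x2 = round(seq[1][1] + seq[2][1], 2)
print("SSR(X1, X3|X2) =", ssr_x1x3_given_x2)
```

Only the orderings whose intermediate models appear in Table 7.2 can be evaluated this way; the order X2, X3, X1 would require SSE(X2, X3), which the table does not report.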


7.2 Uses of Extra Sums of Squares in Tests for Regression Coefficients


Test whether a Single βk = 0

When we wish to test whether the term βk Xk can be dropped from a multiple regression
model, we are interested in the alternatives:

    H0: βk = 0
    Ha: βk ≠ 0

We already know that test statistic (6.51b):

$$t^* = \frac{b_k}{s\{b_k\}}$$

is appropriate for this test.
Equivalently, we can use the general linear test approach described in Section 2.8. We
now show that this approach involves an extra sum of squares. Let us consider the first-order
regression model with three predictor variables:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i \qquad \text{Full model}$$   (7.12)

To test the alternatives:

    H0: β3 = 0
    Ha: β3 ≠ 0                                          (7.13)

we fit the full model and obtain the error sum of squares SSE(F). We now explicitly show
the variables in the full model, as follows:

    SSE(F) = SSE(X1, X2, X3)

The degrees of freedom associated with SSE(F) are dfF = n − 4 since there are four
parameters in the regression function for the full model (7.12).

The reduced model when H0 in (7.13) holds is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \qquad \text{Reduced model}$$   (7.14)

We next fit this reduced model and obtain:

    SSE(R) = SSE(X1, X2)

There are dfR = n − 3 degrees of freedom associated with the reduced model.

The general linear test statistic (2.70):

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$$

here becomes:

$$F^* = \frac{SSE(X_1, X_2) - SSE(X_1, X_2, X_3)}{(n-3) - (n-4)} \div \frac{SSE(X_1, X_2, X_3)}{n-4}$$

Note that the difference between the two error sums of squares in the numerator term is the
extra sum of squares (7.3a):

    SSE(X1, X2) − SSE(X1, X2, X3) = SSR(X3|X1, X2)

Hence the general linear test statistic here is:

$$F^* = \frac{SSR(X_3 \mid X_1, X_2)}{1} \div \frac{SSE(X_1, X_2, X_3)}{n-4} = \frac{MSR(X_3 \mid X_1, X_2)}{MSE(X_1, X_2, X_3)}$$   (7.15)

We thus see that the test whether or not β3 = 0 is a marginal test, given that X1 and X2
are already in the model. We also note that the extra sum of squares SSR(X3|X1, X2) has
one degree of freedom associated with it, just as we noted earlier.

Test statistic (7.15) shows that we do not need to fit both the full model and the reduced
model to use the general linear test approach here. A single computer run can provide a fit
of the full model and the appropriate extra sum of squares.


Example


In the body fat example, we wish to test for the model with all three predictor variables
whether midarm circumference (X3) can be dropped from the model. The test alternatives
are those of (7.13). Table 7.4 contains the ANOVA results from a computer fit of the full
regression model (7.12), including the extra sums of squares when the predictor variables
are entered in the order X1, X2, X3. Hence, test statistic (7.15) here is:

$$F^* = \frac{SSR(X_3 \mid X_1, X_2)}{1} \div \frac{SSE(X_1, X_2, X_3)}{n-4} = \frac{11.54}{1} \div \frac{98.41}{16} = 1.88$$

For α = .01, we require F(.99; 1, 16) = 8.53. Since F* = 1.88 ≤ 8.53, we conclude H0,
that X3 can be dropped from the regression model that already contains X1 and X2.

Note from Table 7.2d that the t* test statistic here is:

$$t^* = \frac{b_3}{s\{b_3\}} = \frac{-2.186}{1.596} = -1.37$$

Since (t*)² = (−1.37)² = 1.88 = F*, we see that the two test statistics are equivalent, just
as for simple linear regression.
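The arithmetic of this example can be reproduced as follows. This is a sketch that uses only the values printed in Tables 7.2d and 7.4 and the critical value F(.99; 1, 16) = 8.53 quoted in the text; the variable names are illustrative.

```python
n = 20                        # number of subjects in the body fat example
ssr_x3_given_x1x2 = 11.54     # extra sum of squares, Table 7.4
sse_full = 98.41              # SSE(X1, X2, X3), Table 7.4
df_full = n - 4               # four parameters in the full model (7.12)

# Partial F statistic (7.15): extra SS on 1 df divided by MSE of the full model.
f_star = (ssr_x3_given_x1x2 / 1) / (sse_full / df_full)

# Equivalent t statistic from Table 7.2d.
b3, s_b3 = -2.186, 1.596
t_star = b3 / s_b3

f_crit = 8.53                 # F(.99; 1, 16), as quoted in the text
conclude_h0 = f_star <= f_crit

print(f"F* = {f_star:.2f}, t* = {t_star:.2f}, (t*)^2 = {t_star**2:.2f}")
print("Conclude H0 (X3 can be dropped):", conclude_h0)
```

The squared t statistic and F* agree only up to the rounding of the tabled inputs, which is why the comparison is made at two decimals.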


Comment

The F* test statistic (7.15) to test whether or not β3 = 0 is called a partial F test statistic to distinguish
it from the F* statistic in (6.39b) for testing whether all βk = 0, i.e., whether or not there is a regression
relation between Y and the set of X variables. The latter test is called the overall F test.


Test whether Several βk = 0

In multiple regression we are frequently interested in whether several terms in the regression
model can be dropped. For example, we may wish to know whether both β2 X2 and β3 X3
can be dropped from the full model (7.12). The alternatives here are:

    H0: β2 = β3 = 0
    Ha: not both β2 and β3 equal zero                   (7.16)

With the general linear test approach, the reduced model under H0 is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i \qquad \text{Reduced model}$$   (7.17)

and the error sum of squares for the reduced model is:

    SSE(R) = SSE(X1)

This error sum of squares has dfR = n − 2 degrees of freedom associated with it.

The general linear test statistic (2.70) thus becomes here:

$$F^* = \frac{SSE(X_1) - SSE(X_1, X_2, X_3)}{(n-2) - (n-4)} \div \frac{SSE(X_1, X_2, X_3)}{n-4}$$

Again the difference between the two error sums of squares in the numerator term is an
extra sum of squares, namely:

    SSE(X1) − SSE(X1, X2, X3) = SSR(X2, X3|X1)

Hence, the test statistic becomes:

$$F^* = \frac{SSR(X_2, X_3 \mid X_1)}{2} \div \frac{SSE(X_1, X_2, X_3)}{n-4} = \frac{MSR(X_2, X_3 \mid X_1)}{MSE(X_1, X_2, X_3)}$$   (7.18)

Note that SSR(X2, X3|X1) has two degrees of freedom associated with it, as we pointed out
earlier.


Example

We wish to test in the body fat example for the model with all three predictor variables
whether both thigh circumference (X2) and midarm circumference (X3) can be dropped
from the full regression model (7.12). The alternatives are those in (7.16). The appropriate
extra sum of squares can be obtained from Table 7.4, using (7.11):

    SSR(X2, X3|X1) = SSR(X2|X1) + SSR(X3|X1, X2)
                   = 33.17 + 11.54 = 44.71

Test statistic (7.18) therefore is:

$$F^* = \frac{SSR(X_2, X_3 \mid X_1)}{2} \div MSE(X_1, X_2, X_3) = \frac{44.71}{2} \div 6.15 = 3.63$$

For α = .05, we require F(.95; 2, 16) = 3.63. Since F* = 3.63 is at the boundary of the
decision rule (the P-value of the test statistic is .05), we may wish to make further analyses
before deciding whether X2 and X3 should be dropped from the regression model that
already contains X1.


Comments

1. For testing whether a single βk equals zero, two equivalent test statistics are available: the t*
test statistic and the F* general linear test statistic. When testing whether several βk equal zero, only
the general linear test statistic F* is available.

2. General linear test statistic (2.70) for testing whether several X variables can be dropped
from the general linear regression model (6.7) can be expressed in terms of the coefficients of
multiple determination for the full and reduced models. Denoting these by $R_F^2$ and $R_R^2$, respectively,
we have:

$$F^* = \frac{R_F^2 - R_R^2}{df_R - df_F} \div \frac{1 - R_F^2}{df_F}$$   (7.19)

Specifically for testing the alternatives in (7.16) for the body fat example, test statistic (7.19) becomes:

$$F^* = \frac{R^2_{Y|123} - R^2_{Y|1}}{(n-2) - (n-4)} \div \frac{1 - R^2_{Y|123}}{n-4}$$   (7.20)

where $R^2_{Y|123}$ denotes the coefficient of multiple determination when Y is regressed on X1, X2, and
X3, and $R^2_{Y|1}$ denotes the coefficient when Y is regressed on X1 alone.

We see from Table 7.4 that $R^2_{Y|123}$ = 396.98/495.39 = .80135 and $R^2_{Y|1}$ = 352.27/495.39 = .71110.
Hence, we obtain by substituting in (7.20):

$$F^* = \frac{.80135 - .71110}{(20-2) - (20-4)} \div \frac{1 - .80135}{16} = 3.63$$

This is the same result as before. Note that $R^2_{Y|1}$ corresponds to the coefficient of simple determination
R² between Y and X1.

Test statistic (7.19) is not appropriate when the full and reduced regression models do not contain
the intercept term β0. In that case, the general linear test statistic in the form (2.70) must be used.
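As a check on the two forms of the statistic, the sketch below evaluates F* for H0: β2 = β3 = 0 once from error sums of squares, as in (2.70), and once from coefficients of multiple determination, as in (7.19). The inputs are the body fat values from Table 7.4; the function names are assumptions of this example.

```python
def f_from_sse(sse_reduced, sse_full, df_reduced, df_full):
    """General linear test statistic (2.70) from error sums of squares."""
    return ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

def f_from_r2(r2_full, r2_reduced, df_reduced, df_full):
    """Form (7.19); valid only when both models contain the intercept term."""
    return ((r2_full - r2_reduced) / (df_reduced - df_full)) / ((1 - r2_full) / df_full)

n, ssto = 20, 495.39
sse_x1 = 143.12        # reduced model: X1 only
sse_full = 98.41       # full model: X1, X2, X3

f_sse = f_from_sse(sse_x1, sse_full, n - 2, n - 4)

r2_reduced = 1 - sse_x1 / ssto      # approximately .71110
r2_full = 1 - sse_full / ssto       # approximately .80135
f_r2 = f_from_r2(r2_full, r2_reduced, n - 2, n - 4)

print(f"F* from SSE: {f_sse:.2f}; F* from R^2: {f_r2:.2f}")
```

The two forms agree because dividing every sum of squares by the common SSTO leaves the ratio unchanged, which is also why the form (7.19) needs both models to share the same SSTO, that is, to include the intercept.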


7.3 Summary of Tests Concerning Regression Coefficients 


We have already discussed how to conduct several types of tests concerning regression 
coefficients in a multiple regression model. For completeness, we summarize here these 
tests as well as some additional types of tests. 


Test whether All βk = 0

This is the overall F test (6.39) of whether or not there is a regression relation between the
response variable Y and the set of X variables. The alternatives are:

$$H_0\colon \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$$
$$H_a\colon \text{not all } \beta_k \ (k = 1, \ldots, p-1) \text{ equal zero}$$   (7.21)

and the test statistic is:

$$F^* = \frac{SSR(X_1, \ldots, X_{p-1})}{p-1} \div \frac{SSE(X_1, \ldots, X_{p-1})}{n-p} = \frac{MSR}{MSE}$$   (7.22)

If H0 holds, F* ~ F(p − 1, n − p). Large values of F* lead to conclusion Ha.


Test whether a Single βk = 0

This is a partial F test of whether a particular regression coefficient βk equals zero. The
alternatives are:

    H0: βk = 0
    Ha: βk ≠ 0                                          (7.23)

and the test statistic is:

$$F^* = \frac{SSR(X_k \mid X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_{p-1})}{1} \div \frac{SSE(X_1, \ldots, X_{p-1})}{n-p} = \frac{MSR(X_k \mid X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_{p-1})}{MSE}$$   (7.24)

If H0 holds, F* ~ F(1, n − p). Large values of F* lead to conclusion Ha. Statistics
packages that provide extra sums of squares permit use of this test without having to fit the
reduced model.

An equivalent test statistic is (6.51b):

$$t^* = \frac{b_k}{s\{b_k\}}$$   (7.25)

If H0 holds, t* ~ t(n − p). Large values of |t*| lead to conclusion Ha.

Since the two tests are equivalent, the choice is usually made in terms of available
information provided by the regression package output.


Test whether Some βk = 0

This is another partial F test. Here, the alternatives are:

$$H_0\colon \beta_q = \beta_{q+1} = \cdots = \beta_{p-1} = 0$$
$$H_a\colon \text{not all of the } \beta_k \text{ in } H_0 \text{ equal zero}$$   (7.26)

where for convenience, we arrange the model so that the last p − q coefficients are the ones
to be tested. The test statistic is:

$$F^* = \frac{SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})}{p-q} \div \frac{SSE(X_1, \ldots, X_{p-1})}{n-p} = \frac{MSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})}{MSE}$$   (7.27)

If H0 holds, F* ~ F(p − q, n − p). Large values of F* lead to conclusion Ha.

Note that test statistic (7.27) actually encompasses the two earlier cases. If q = 1, the
test is whether all regression coefficients equal zero. If q = p − 1, the test is whether a
single regression coefficient equals zero. Also note that test statistic (7.27) can be calculated
without having to fit the reduced model if the regression package provides the needed extra
sums of squares:

$$SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1}) = SSR(X_q \mid X_1, \ldots, X_{q-1}) + \cdots + SSR(X_{p-1} \mid X_1, \ldots, X_{p-2})$$   (7.28)

68 Part Two Multiple Linear Regression 


Other Tests 


Test statistic (7.27) can be stated equivalently in terms of the coefficients of multiple 
determination for the full and reduced models when these models contain the intercept term 
Bo. as follows: 


2 2 2 
Ry pi БЕ Ryi-q-1 ES l= Ry. 


F* = : 7.29 

p-—dq n—p ( ) 

where Rj,,..,. , denotes the coefficient of multiple determination when Y is regressed on 
all X variables, and Ryu denotes the coefficient when Y is regressed on Xi, ..., X, , 


only. 


> 


When tests about regression coefficients are desired that do not involve testing whether one
or several $\beta_k$ equal zero, extra sums of squares cannot be used and the general linear test
approach requires separate fittings of the full and reduced models. For instance, for the full
model containing three $X$ variables:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i \qquad \text{Full model} \tag{7.30}$$

we might wish to test:

$$\begin{aligned} H_0&\colon \beta_1 = \beta_2 \\ H_a&\colon \beta_1 \neq \beta_2 \end{aligned} \tag{7.31}$$

The procedure would be to fit the full model (7.30), and then the reduced model:

$$Y_i = \beta_0 + \beta_c (X_{i1} + X_{i2}) + \beta_3 X_{i3} + \varepsilon_i \qquad \text{Reduced model} \tag{7.32}$$

where $\beta_c$ denotes the common coefficient for $\beta_1$ and $\beta_2$ under $H_0$ and $X_{i1} + X_{i2}$ is the
corresponding new $X$ variable. We then use the general $F^*$ test statistic (2.70) with 1 and
$n - 4$ degrees of freedom.
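The two fits needed for the test of $\beta_1 = \beta_2$ can be sketched as follows. This is an illustrative example on synthetic data, assuming NumPy; the variable names and data are hypothetical, and the statistic is the general linear test form, $[SSE(R) - SSE(F)]/(df_R - df_F)$ divided by $SSE(F)/df_F$.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares for the least squares fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)

rng = np.random.default_rng(1)                 # synthetic data, for illustration only
n = 40
X1, X2, X3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * X1 + 2.0 * X2 - 1.0 * X3 + rng.normal(size=n)
ones = np.ones(n)

X_full = np.column_stack([ones, X1, X2, X3])   # full model (7.30)
X_red = np.column_stack([ones, X1 + X2, X3])   # reduced model (7.32): one common coefficient

sse_f, sse_r = sse(X_full, y), sse(X_red, y)
df_f, df_r = n - 4, n - 3

# General linear test statistic with df_r - df_f = 1 and n - 4 degrees of freedom
f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
print(f"F* = {f_star:.4f} with {df_r - df_f} and {df_f} degrees of freedom")
```

Because the reduced model is the full model with a constraint imposed, `sse_r` can never be smaller than `sse_f`, so `f_star` is nonnegative.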

Another example where extra sums of squares cannot be used is in the following test for
regression model (7.30):

$$\begin{aligned} H_0&\colon \beta_1 = 3,\ \beta_3 = 5 \\ H_a&\colon \text{not both equalities in } H_0 \text{ hold} \end{aligned} \tag{7.33}$$

Here, the reduced model would be:

$$Y_i - 3X_{i1} - 5X_{i3} = \beta_0 + \beta_2 X_{i2} + \varepsilon_i \qquad \text{Reduced model} \tag{7.34}$$

Note the new response variable $Y - 3X_1 - 5X_3$ in the reduced model, since $\beta_1 X_1$ and $\beta_3 X_3$
are known constants under $H_0$. We then use the general linear test statistic $F^*$ in (2.70) with
2 and $n - 4$ degrees of freedom.


7.4 Coefficients of Partial Determination 


Extra sums of squares are not only useful for tests on the regression coefficients of a multiple 
regression model, but they are also encountered in descriptive measures of relationship called 
coefficients of partial determination. Recall that the coefficient of multiple determination, 
$R^2$, measures the proportionate reduction in the variation of $Y$ achieved by the introduction
of the entire set of $X$ variables considered in the model. A coefficient of partial determination,
in contrast, measures the marginal contribution of one X variable when all others are already 
included in the model. 


Two Predictor Variables 


We first consider a first-order multiple regression model with two predictor variables, as
given in (6.1):

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$

$SSE(X_2)$ measures the variation in $Y$ when $X_2$ is included in the model. $SSE(X_1, X_2)$
measures the variation in $Y$ when both $X_1$ and $X_2$ are included in the model. Hence, the
relative marginal reduction in the variation in $Y$ associated with $X_1$ when $X_2$ is already in
the model is:

$$\frac{SSE(X_2) - SSE(X_1, X_2)}{SSE(X_2)} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}$$

This measure is the coefficient of partial determination between $Y$ and $X_1$, given that $X_2$ is
in the model. We denote this measure by $R^2_{Y1|2}$:

$$R^2_{Y1|2} = \frac{SSE(X_2) - SSE(X_1, X_2)}{SSE(X_2)} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)} \tag{7.35}$$

Thus, $R^2_{Y1|2}$ measures the proportionate reduction in the variation in $Y$ remaining after $X_2$
is included in the model that is gained by also including $X_1$ in the model.

The coefficient of partial determination between $Y$ and $X_2$, given that $X_1$ is in the model,
is defined correspondingly:

$$R^2_{Y2|1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)} \tag{7.36}$$


General Case

The generalization of coefficients of partial determination to three or more $X$ variables in
the model is immediate. For instance:

$$R^2_{Y1|23} = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_2, X_3)} \tag{7.37}$$

$$R^2_{Y2|13} = \frac{SSR(X_2 \mid X_1, X_3)}{SSE(X_1, X_3)} \tag{7.38}$$

$$R^2_{Y3|12} = \frac{SSR(X_3 \mid X_1, X_2)}{SSE(X_1, X_2)} \tag{7.39}$$

$$R^2_{Y4|123} = \frac{SSR(X_4 \mid X_1, X_2, X_3)}{SSE(X_1, X_2, X_3)} \tag{7.40}$$

Note that in the subscripts to $R^2$, the entries to the left of the vertical bar show in turn
the variable taken as the response and the $X$ variable being added. The entries to the right
of the vertical bar show the $X$ variables already in the model.




Example

For the body fat example, we can obtain a variety of coefficients of partial determination.
Here are three (Tables 7.2 and 7.4):

$$R^2_{Y2|1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)} = \frac{33.17}{143.12} = .232$$

$$R^2_{Y3|12} = \frac{SSR(X_3 \mid X_1, X_2)}{SSE(X_1, X_2)} = \frac{11.54}{109.95} = .105$$

$$R^2_{Y1|2} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)} = \frac{3.47}{113.42} = .031$$

We see that when $X_2$ is added to the regression model containing $X_1$ here, the error sum
of squares $SSE(X_1)$ is reduced by 23.2 percent. The error sum of squares for the model
containing both $X_1$ and $X_2$ is only reduced by another 10.5 percent when $X_3$ is added to the
model. Finally, if the regression model already contains $X_2$, adding $X_1$ reduces $SSE(X_2)$
by only 3.1 percent.
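The three ratios can be checked with a few lines of arithmetic. The sketch below simply divides the sums of squares quoted in the example; the variable names are illustrative.

```python
# Sums of squares quoted in the body fat example (Tables 7.2 and 7.4)
ssr_x2_given_x1, sse_x1 = 33.17, 143.12
ssr_x3_given_x1x2, sse_x1x2 = 11.54, 109.95
ssr_x1_given_x2, sse_x2 = 3.47, 113.42

# Each coefficient of partial determination is an extra sum of squares
# divided by the error sum of squares of the model before the variable is added.
r2_y2_1 = ssr_x2_given_x1 / sse_x1
r2_y3_12 = ssr_x3_given_x1x2 / sse_x1x2
r2_y1_2 = ssr_x1_given_x2 / sse_x2

print(round(r2_y2_1, 3), round(r2_y3_12, 3), round(r2_y1_2, 3))
```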


Comments 

1. The coefficients of partial determination can take on values between 0 and 1, as the definitions
readily indicate.

2. A coefficient of partial determination can be interpreted as a coefficient of simple determination.
Consider a multiple regression model with two $X$ variables. Suppose we regress $Y$ on $X_2$ and obtain
the residuals:

$$e_i(Y \mid X_2) = Y_i - \hat{Y}_i(X_2)$$

where $\hat{Y}_i(X_2)$ denotes the fitted values of $Y$ when $X_2$ is in the model. Suppose we further regress $X_1$
on $X_2$ and obtain the residuals:

$$e_i(X_1 \mid X_2) = X_{i1} - \hat{X}_{i1}(X_2)$$

where $\hat{X}_{i1}(X_2)$ denotes the fitted values of $X_1$ in the regression of $X_1$ on $X_2$. The coefficient of simple
determination $R^2$ between these two sets of residuals equals the coefficient of partial determination
$R^2_{Y1|2}$. Thus, this coefficient measures the relation between $Y$ and $X_1$ when both of these variables
have been adjusted for their linear relationships to $X_2$.

3. The plot of the residuals $e_i(Y \mid X_2)$ against $e_i(X_1 \mid X_2)$ provides a graphical representation of the
strength of the relationship between $Y$ and $X_1$, adjusted for $X_2$. Such plots of residuals, called added
variable plots or partial regression plots, are discussed in Section 10.1.
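The equivalence described in Comment 2 can be checked numerically. The following sketch uses synthetic data and assumes NumPy; the helper `fit_resid` and the simulated variables are hypothetical, chosen only to compare the two routes to the same quantity.

```python
import numpy as np

def fit_resid(X, y):
    """Residuals from regressing y on X; an intercept column is added here."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ b

rng = np.random.default_rng(2)            # synthetic data, for illustration only
n = 50
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 1.2 * x2 + rng.normal(size=n)

# Route 1: partial determination from error sums of squares, as in (7.35)
sse_x2 = np.sum(fit_resid(x2, y) ** 2)
sse_x1x2 = np.sum(fit_resid(np.column_stack([x1, x2]), y) ** 2)
r2_partial = (sse_x2 - sse_x1x2) / sse_x2

# Route 2: simple determination between the two sets of residuals
e_y = fit_resid(x2, y)                    # e_i(Y | X2)
e_x1 = fit_resid(x2, x1)                  # e_i(X1 | X2)
r2_resid = np.corrcoef(e_y, e_x1)[0, 1] ** 2

print(r2_partial, r2_resid)
```

Plotting `e_y` against `e_x1` would give the added variable plot mentioned in Comment 3.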


Coefficients of Partial Correlation

The square root of a coefficient of partial determination is called a coefficient of partial
correlation. It is given the same sign as that of the corresponding regression coefficient in the
fitted regression function. Coefficients of partial correlation are frequently used in practice,
although they do not have as clear a meaning as coefficients of partial determination. One
use of partial correlation coefficients is in computer routines for finding the best predictor
variable to be selected next for inclusion in the regression model. We discuss this use in
Chapter 9.

Example

For the body fat example, we have:

$$r_{Y2|1} = \sqrt{.232} = .482$$

$$r_{Y3|12} = -\sqrt{.105} = -.324$$

$$r_{Y1|2} = \sqrt{.031} = .176$$

Note that the coefficients $r_{Y2|1}$ and $r_{Y1|2}$ are positive because we see from Table 7.2c that
$b_2 = .6594$ and $b_1 = .2224$ are positive. Similarly, $r_{Y3|12}$ is negative because we see from
Table 7.2d that $b_3 = -2.186$ is negative.

Comment 

Coefficients of partial determination can be expressed in terms of simple or other partial correlation
coefficients. For example:

$$R^2_{Y2|1} = [r_{Y2|1}]^2 = \frac{(r_{Y2} - r_{12} r_{Y1})^2}{(1 - r_{12}^2)(1 - r_{Y1}^2)} \tag{7.41}$$

$$R^2_{Y2|13} = [r_{Y2|13}]^2 = \frac{(r_{Y2|3} - r_{12|3} r_{Y1|3})^2}{(1 - r_{12|3}^2)(1 - r_{Y1|3}^2)} \tag{7.42}$$

where $r_{Y1}$ denotes the coefficient of simple correlation between $Y$ and $X_1$, $r_{12}$ denotes the coefficient
of simple correlation between $X_1$ and $X_2$, and so on. Extensions are straightforward.
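Relation (7.41) can be verified on any data set by computing $R^2_{Y2|1}$ both ways. The sketch below uses synthetic data and assumes NumPy; the helper `sse` and the simulated variables are hypothetical.

```python
import numpy as np

def sse(cols, y):
    """Error sum of squares for y regressed on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)

rng = np.random.default_rng(3)            # synthetic data, for illustration only
n = 60
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 2.0 + x1 + 0.7 * x2 + rng.normal(size=n)

# Simple correlations among Y, X1, X2
r = np.corrcoef([y, x1, x2])
r_y1, r_y2, r_12 = r[0, 1], r[0, 2], r[1, 2]

# Right-hand side of (7.41)
r2_from_correlations = (r_y2 - r_12 * r_y1) ** 2 / ((1 - r_12 ** 2) * (1 - r_y1 ** 2))

# Definition: SSR(X2 | X1) / SSE(X1)
r2_from_sse = (sse([x1], y) - sse([x1, x2], y)) / sse([x1], y)

print(r2_from_correlations, r2_from_sse)
```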


7.5 Standardized Multiple Regression Model


A standardized form of the general multiple regression model (6.7) is employed to control 
roundoff errors in normal equations calculations and to permit comparisons of the estimated 
regression coefficients in common units. 


Roundoff Errors in Normal Equations Calculations 


The results from normal equations calculations can be sensitive to rounding of data in 
intermediate stages of calculations. When the number of X variables is small—say, three 
or less—roundoff effects can be controlled by carrying a sufficient number of digits in 
intermediate calculations. Indeed, most computer regression programs use double-precision 
arithmetic in all computations to control roundoff effects. Still, with a large number of 
X variables, serious roundoff effects can arise despite the use of many digits in intermediate 
calculations. 

Roundoff errors tend to enter normal equations calculations primarily when the inverse
of $\mathbf{X}'\mathbf{X}$ is taken. Of course, any errors in $(\mathbf{X}'\mathbf{X})^{-1}$ may be magnified in calculating $\mathbf{b}$ and
other subsequent statistics. The danger of serious roundoff errors in $(\mathbf{X}'\mathbf{X})^{-1}$ is particularly
great when (1) $\mathbf{X}'\mathbf{X}$ has a determinant that is close to zero and/or (2) the elements of $\mathbf{X}'\mathbf{X}$
differ substantially in order of magnitude. The first condition arises when some or all of the
$X$ variables are highly intercorrelated. We shall discuss this situation in Section 7.6.

The second condition arises when the $X$ variables have substantially different magnitudes
so that the entries in the $\mathbf{X}'\mathbf{X}$ matrix cover a wide range, say, from 15 to 49,000,000. A
solution for this condition is to transform the variables and thereby reparameterize the
regression model into the standardized regression model.




The transformation to obtain the standardized regression model, called the correlation
transformation, makes all entries in the $\mathbf{X}'\mathbf{X}$ matrix for the transformed variables fall between
$-1$ and 1 inclusive, so that the calculation of the inverse matrix becomes much less subject
to roundoff errors due to dissimilar orders of magnitudes than with the original variables.
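The effect of dissimilar magnitudes can be illustrated with the condition number of $\mathbf{X}'\mathbf{X}$, a standard measure of how strongly errors can be amplified when a matrix is inverted. The sketch below is an illustration on synthetic data, assuming NumPy; the predictors and the helper `corr_transform` are hypothetical, and the condition number is used here as a convenient proxy rather than as a measure taken from the text.

```python
import numpy as np

rng = np.random.default_rng(4)                  # synthetic data, for illustration only
n = 25
x1 = rng.uniform(1, 5, size=n)                  # predictor of small magnitude
x2 = rng.uniform(1_000, 7_000, size=n)          # predictor of large magnitude
X_raw = np.column_stack([np.ones(n), x1, x2])   # original X matrix with the column of 1s

def corr_transform(v):
    """Center, scale by the standard deviation, and divide by sqrt(n - 1)."""
    return (v - v.mean()) / (np.sqrt(len(v) - 1) * v.std(ddof=1))

X_star = np.column_stack([corr_transform(x1), corr_transform(x2)])

xtx_raw = X_raw.T @ X_raw        # entries span many orders of magnitude
xtx_star = X_star.T @ X_star     # entries lie between -1 and 1

print("condition number, original variables:   ", np.linalg.cond(xtx_raw))
print("condition number, transformed variables:", np.linalg.cond(xtx_star))
```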


Comment 

In order to avoid the computational difficulties inherent in inverting the $\mathbf{X}'\mathbf{X}$ matrix, many statistical
packages use an entirely different computational approach that involves decomposing the $\mathbf{X}$ matrix into
a product of several matrices with special properties. The $\mathbf{X}$ matrix is often first modified by centering
each of the variables (i.e., using the deviations around the mean) to further improve computational
accuracy. Information on decomposition strategies may be found in texts on statistical computing,
such as Reference 7.1.


Lack of Comparability in Regression Coefficients 
A second difficulty with the nonstandardized multiple regression model (6.7) is that ordinarily
regression coefficients cannot be compared because of differences in the units involved.
We cite two examples.


1. When considering the fitted response function:

$$\hat{Y} = 200 + 20{,}000X_1 + .2X_2$$

one may be tempted to conclude that $X_1$ is the only important predictor variable, and that
$X_2$ has little effect on the response variable $Y$. A little reflection should make one wary of
this conclusion. The reason is that we do not know the units involved. Suppose the units are:

   $Y$ in dollars
   $X_1$ in thousand dollars
   $X_2$ in cents

In that event, the effect on the mean response of a \$1,000 increase in $X_1$ (i.e., a 1-unit
increase) when $X_2$ is constant would be an increase of \$20,000. This is exactly the same
as the effect of a \$1,000 increase in $X_2$ (i.e., a 100,000-unit increase) when $X_1$ is constant,
despite the difference in the regression coefficients.

2. In the Dwaine Studios example of Figure 6.5, we cannot make any comparison between
$b_1$ and $b_2$ because $X_1$ is in units of thousand persons aged 16 or younger, whereas
$X_2$ is in units of thousand dollars of per capita disposable income.


Correlation Transformation 

Use of the correlation transformation helps with controlling roundoff errors and, by expressing
the regression coefficients in the same units, may be of help when these coefficients
are compared. We shall first describe the correlation transformation and then the resulting
standardized regression model.

The correlation transformation is a simple modification of the usual standardization of a
variable. Standardizing a variable, as in (A.37), involves centering and scaling the variable.
Centering involves taking the difference between each observation and the mean of all
observations for the variable; scaling involves expressing the centered observations in units
of the standard deviation of the observations for the variable. Thus, the usual standardizations
of the response variable $Y$ and the predictor variables $X_1, \ldots, X_{p-1}$ are as follows:

$$\frac{Y_i - \bar{Y}}{s_Y} \tag{7.43a}$$

$$\frac{X_{ik} - \bar{X}_k}{s_k} \qquad (k = 1, \ldots, p-1) \tag{7.43b}$$

where $\bar{Y}$ and $\bar{X}_k$ are the respective means of the $Y$ and the $X_k$ observations, and $s_Y$ and $s_k$
are the respective standard deviations defined as follows:

$$s_Y = \sqrt{\frac{\sum_i (Y_i - \bar{Y})^2}{n - 1}} \tag{7.43c}$$

$$s_k = \sqrt{\frac{\sum_i (X_{ik} - \bar{X}_k)^2}{n - 1}} \qquad (k = 1, \ldots, p-1) \tag{7.43d}$$

The correlation transformation is a simple function of the standardized variables in
(7.43a, b):

$$Y_i^* = \frac{1}{\sqrt{n - 1}} \left( \frac{Y_i - \bar{Y}}{s_Y} \right) \tag{7.44a}$$

$$X_{ik}^* = \frac{1}{\sqrt{n - 1}} \left( \frac{X_{ik} - \bar{X}_k}{s_k} \right) \qquad (k = 1, \ldots, p-1) \tag{7.44b}$$

k 
Standardized Regression Model 
The regression model with the transformed variables $Y^*$ and $X_k^*$ as defined by the correlation
transformation in (7.44) is called a standardized regression model and is as follows:

$$Y_i^* = \beta_1^* X_{i1}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \varepsilon_i^* \tag{7.45}$$

The reason why there is no intercept parameter in the standardized regression model (7.45) is
that the least squares or maximum likelihood calculations always would lead to an estimated
intercept term of zero if an intercept parameter were present in the model.

It is easy to show that the parameters $\beta_1^*, \ldots, \beta_{p-1}^*$ in the standardized regression model
and the original parameters $\beta_0, \beta_1, \ldots, \beta_{p-1}$ in the ordinary multiple regression model (6.7)
are related as follows:

$$\beta_k = \left( \frac{s_Y}{s_k} \right) \beta_k^* \qquad (k = 1, \ldots, p-1) \tag{7.46a}$$

$$\beta_0 = \bar{Y} - \beta_1 \bar{X}_1 - \cdots - \beta_{p-1} \bar{X}_{p-1} \tag{7.46b}$$

We see that the standardized regression coefficients $\beta_k^*$ and the original regression
coefficients $\beta_k$ ($k = 1, \ldots, p-1$) are related by simple scaling factors involving ratios of standard
deviations.




X'X Matrix for Transformed Variables 
In order to be able to study the special nature of the $\mathbf{X}'\mathbf{X}$ matrix and the least squares normal
equations when the variables have been transformed by the correlation transformation, we
need to decompose the correlation matrix in (6.67) containing all pairwise correlation
coefficients among the response and predictor variables $Y, X_1, X_2, \ldots, X_{p-1}$ into two matrices.

1. The first matrix, denoted by $\mathbf{r}_{XX}$, is called the correlation matrix of the $X$ variables. It
has as its elements the coefficients of simple correlation between all pairs of the $X$ variables.
This matrix is defined as follows:

$$\underset{(p-1) \times (p-1)}{\mathbf{r}_{XX}} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1,p-1} \\ r_{21} & 1 & \cdots & r_{2,p-1} \\ \vdots & \vdots & & \vdots \\ r_{p-1,1} & r_{p-1,2} & \cdots & 1 \end{bmatrix} \tag{7.47}$$

Here, $r_{12}$ again denotes the coefficient of simple correlation between $X_1$ and $X_2$, and so
on. Note that the main diagonal consists of 1s because the coefficient of simple correlation
between a variable and itself is 1. The correlation matrix $\mathbf{r}_{XX}$ is symmetric; remember that
$r_{kk'} = r_{k'k}$. Because of the symmetry of this matrix, computer printouts frequently omit the
lower or upper triangular block of elements.

2. The second matrix, denoted by $\mathbf{r}_{YX}$, is a vector containing the coefficients of simple
correlation between the response variable $Y$ and each of the $X$ variables, denoted again by
$r_{Y1}$, $r_{Y2}$, etc.:

$$\underset{(p-1) \times 1}{\mathbf{r}_{YX}} = \begin{bmatrix} r_{Y1} \\ r_{Y2} \\ \vdots \\ r_{Y,p-1} \end{bmatrix} \tag{7.48}$$

Now we are ready to consider the $\mathbf{X}'\mathbf{X}$ matrix for the transformed variables in the
standardized regression model (7.45). The $\mathbf{X}$ matrix here is:

$$\underset{n \times (p-1)}{\mathbf{X}} = \begin{bmatrix} X_{11}^* & \cdots & X_{1,p-1}^* \\ X_{21}^* & \cdots & X_{2,p-1}^* \\ \vdots & & \vdots \\ X_{n1}^* & \cdots & X_{n,p-1}^* \end{bmatrix} \tag{7.49}$$

Remember that the standardized regression model (7.45) does not contain an intercept term;
hence, there is no column of 1s in the $\mathbf{X}$ matrix. It can be shown that the $\mathbf{X}'\mathbf{X}$ matrix for the
transformed variables is simply the correlation matrix of the $X$ variables defined in (7.47):

$$\underset{(p-1) \times (p-1)}{\mathbf{X}'\mathbf{X}} = \mathbf{r}_{XX} \tag{7.50}$$

Since the $\mathbf{X}'\mathbf{X}$ matrix for the transformed variables consists of coefficients of correlation
between the $X$ variables, all of its elements are between $-1$ and 1 and thus are of the
same order of magnitude. As we pointed out earlier, this can be of great help in controlling
roundoff errors when inverting the $\mathbf{X}'\mathbf{X}$ matrix.
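Result (7.50) is easy to confirm numerically. The sketch below applies the correlation transformation (7.44) to a synthetic predictor matrix and compares the resulting cross-product matrix with the ordinary correlation matrix; it assumes NumPy, and the function name `correlation_transform` and the data are hypothetical.

```python
import numpy as np

def correlation_transform(A):
    """Apply (7.44) to each column: center, divide by the standard deviation and by sqrt(n - 1)."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / (np.sqrt(A.shape[0] - 1) * A.std(axis=0, ddof=1))

rng = np.random.default_rng(5)                      # synthetic data, for illustration only
n = 30
scales = np.array([1.0, 50.0, 2000.0])              # deliberately dissimilar magnitudes
shifts = np.array([10.0, 300.0, 5e4])
X = rng.normal(size=(n, 3)) * scales + shifts

X_star = correlation_transform(X)
xtx = X_star.T @ X_star                             # X'X for the transformed variables
r_xx = np.corrcoef(X, rowvar=False)                 # correlation matrix of the X variables

print(np.round(xtx, 4))
print(np.round(r_xx, 4))
```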




Comment 


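We illustrate that the $\mathbf{X}'\mathbf{X}$ matrix for the transformed variables is the correlation matrix of the $X$
variables by considering two entries in the matrix:

1. In the upper left corner of $\mathbf{X}'\mathbf{X}$ we have:

$$\sum (X_{i1}^*)^2 = \sum \left( \frac{X_{i1} - \bar{X}_1}{\sqrt{n-1}\, s_1} \right)^2 = \frac{\sum (X_{i1} - \bar{X}_1)^2}{n - 1} \div s_1^2 = 1$$

2. In the first row, second column of $\mathbf{X}'\mathbf{X}$, we have:

$$\sum X_{i1}^* X_{i2}^* = \sum \left( \frac{X_{i1} - \bar{X}_1}{\sqrt{n-1}\, s_1} \right) \left( \frac{X_{i2} - \bar{X}_2}{\sqrt{n-1}\, s_2} \right) = \frac{1}{n-1} \cdot \frac{\sum (X_{i1} - \bar{X}_1)(X_{i2} - \bar{X}_2)}{s_1 s_2} = \frac{\sum (X_{i1} - \bar{X}_1)(X_{i2} - \bar{X}_2)}{\left[ \sum (X_{i1} - \bar{X}_1)^2 \sum (X_{i2} - \bar{X}_2)^2 \right]^{1/2}}$$

But this equals $r_{12}$, the coefficient of correlation between $X_1$ and $X_2$, by (2.84).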


Estimated Standardized Regression Coefficients 


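The least squares normal equations (6.24) for the ordinary multiple regression model:

$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$$

and the least squares estimators (6.25):

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

can be expressed simply for the transformed variables. It can be shown that for the
transformed variables, $\mathbf{X}'\mathbf{Y}$ becomes:

$$\underset{(p-1) \times 1}{\mathbf{X}'\mathbf{Y}} = \mathbf{r}_{YX} \tag{7.51}$$

where $\mathbf{r}_{YX}$ is defined in (7.48) as the vector of the coefficients of simple correlation between
$Y$ and each $X$ variable. It now follows from (7.50) and (7.51) that the least squares normal
equations and estimators of the regression coefficients of the standardized regression
model (7.45) are as follows:

$$\mathbf{r}_{XX}\mathbf{b} = \mathbf{r}_{YX} \tag{7.52a}$$

$$\mathbf{b} = \mathbf{r}_{XX}^{-1}\mathbf{r}_{YX} \tag{7.52b}$$

where:

$$\underset{(p-1) \times 1}{\mathbf{b}} = \begin{bmatrix} b_1^* \\ b_2^* \\ \vdots \\ b_{p-1}^* \end{bmatrix} \tag{7.52c}$$

The regression coefficients $b_1^*, \ldots, b_{p-1}^*$ are often called standardized regression
coefficients.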
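The estimators in (7.52b) can be compared with a direct least squares fit to the correlation-transformed data. The following sketch uses synthetic data and assumes NumPy; the names and data are hypothetical, and `np.linalg.solve` is used in place of an explicit inverse as a matter of numerical habit.

```python
import numpy as np

def correlation_transform(A):
    """Apply (7.44): center, divide by the standard deviation and by sqrt(n - 1)."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / (np.sqrt(A.shape[0] - 1) * A.std(axis=0, ddof=1))

rng = np.random.default_rng(6)                 # synthetic data, for illustration only
n = 40
X = rng.normal(size=(n, 2))
X[:, 1] += 0.5 * X[:, 0]                       # make the two predictors correlated
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# Correlation matrix of (Y, X1, X2), split into r_YX and r_XX
r = np.corrcoef(np.column_stack([y, X]), rowvar=False)
r_yx, r_xx = r[1:, 0], r[1:, 1:]

# Standardized coefficients from (7.52b)
b_star = np.linalg.solve(r_xx, r_yx)

# Same coefficients from least squares on the transformed data, with no intercept
y_star, X_star = correlation_transform(y), correlation_transform(X)
b_star_ls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

print(b_star, b_star_ls)
```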




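The return to the estimated regression coefficients for regression model (6.7) in the
original variables is accomplished by employing the relations:

$$b_k = \left( \frac{s_Y}{s_k} \right) b_k^* \qquad (k = 1, \ldots, p-1) \tag{7.53a}$$

$$b_0 = \bar{Y} - b_1 \bar{X}_1 - \cdots - b_{p-1} \bar{X}_{p-1} \tag{7.53b}$$


Comment

When there are two $X$ variables in the regression model, i.e., when $p - 1 = 2$, we can readily see the
algebraic form of the standardized regression coefficients. We have:

$$\mathbf{r}_{XX} = \begin{bmatrix} 1 & r_{12} \\ r_{12} & 1 \end{bmatrix} \tag{7.54a}$$

$$\mathbf{r}_{YX} = \begin{bmatrix} r_{Y1} \\ r_{Y2} \end{bmatrix} \tag{7.54b}$$

$$\mathbf{r}_{XX}^{-1} = \frac{1}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix} \tag{7.54c}$$

Hence, by (7.52b) we obtain:

$$\mathbf{b} = \frac{1}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix} \begin{bmatrix} r_{Y1} \\ r_{Y2} \end{bmatrix} = \frac{1}{1 - r_{12}^2} \begin{bmatrix} r_{Y1} - r_{12} r_{Y2} \\ r_{Y2} - r_{12} r_{Y1} \end{bmatrix} \tag{7.55}$$

Thus:

$$b_1^* = \frac{r_{Y1} - r_{12} r_{Y2}}{1 - r_{12}^2} \tag{7.55a}$$

$$b_2^* = \frac{r_{Y2} - r_{12} r_{Y1}}{1 - r_{12}^2} \tag{7.55b}$$


Example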


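Table 7.5a repeats a portion of the original data for the Dwaine Studios example in
Figure 6.5b, and Table 7.5b contains the data transformed according to the correlation
transformation (7.44). We illustrate the calculation of the transformed data for the first case,
using the means and standard deviations in Table 7.5a (differences in the last digit of the
transformed data are due to rounding effects):

$$Y_1^* = \frac{1}{\sqrt{n-1}} \left( \frac{Y_1 - \bar{Y}}{s_Y} \right) = \frac{1}{\sqrt{21-1}} \left( \frac{174.4 - 181.90}{36.191} \right) = -.04634$$

$$X_{11}^* = \frac{1}{\sqrt{n-1}} \left( \frac{X_{11} - \bar{X}_1}{s_1} \right) = \frac{1}{\sqrt{21-1}} \left( \frac{68.5 - 62.019}{18.620} \right) = .07783$$

$$X_{12}^* = \frac{1}{\sqrt{n-1}} \left( \frac{X_{12} - \bar{X}_2}{s_2} \right) = \frac{1}{\sqrt{21-1}} \left( \frac{16.7 - 17.143}{.97035} \right) = -.10208$$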


TABLE 7.5 Correlation Transformation and Fitted Standardized Regression Model: Dwaine Studios Example.

(a) Original Data

Case       Sales      Target Population      Per Capita Disposable Income
  i         Y_i             X_i1                        X_i2
  1        174.4            68.5                        16.7
  2        164.4            45.2                        16.8
 ...        ...             ...                         ...
 20        224.1            82.7                        19.1
 21        166.5            52.3                        16.0

       Ybar = 181.90    X1bar = 62.019            X2bar = 17.143
        s_Y = 36.191      s_1 = 18.620              s_2 = .97035

(b) Transformed Data

Case
  i         Y*_i            X*_i1                       X*_i2
  1       -.04637           .07783                     -.10205
  2       -.10815          -.20198                     -.07901
 ...        ...              ...                         ...
 20        .26070           .24835                      .45100
 21       -.09518          -.11671                     -.26336

(c) Fitted Standardized Model

$$\hat{Y}^* = .7484X_1^* + .2511X_2^*$$


When fitting the standardized regression model (7.45) to the transformed data, we obtain
the fitted model in Table 7.5c:

$$\hat{Y}^* = .7484X_1^* + .2511X_2^*$$

The standardized regression coefficients $b_1^* = .7484$ and $b_2^* = .2511$ are shown in the
SYSTAT regression output in Figure 6.5a on page 237, labeled STD COEF. We see from
the standardized regression coefficients that an increase of one standard deviation of $X_1$
(target population) when $X_2$ (per capita disposable income) is fixed leads to a much larger
increase in expected sales (in units of standard deviations of $Y$) than does an increase of
one standard deviation of $X_2$ when $X_1$ is fixed.

To shift from the standardized regression coefficients $b_1^*$ and $b_2^*$ back to the regression
coefficients for the model with the original variables, we employ (7.53). Using the data in
Table 7.5, we obtain:

$$b_1 = \left( \frac{s_Y}{s_1} \right) b_1^* = \frac{36.191}{18.620}(.7484) = 1.4546$$

$$b_2 = \left( \frac{s_Y}{s_2} \right) b_2^* = \frac{36.191}{.97035}(.2511) = 9.3652$$

$$b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 = 181.90 - 1.4546(62.019) - 9.3652(17.143) = -68.860$$
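The back-transformation is plain arithmetic and can be reproduced directly from the summary statistics in Table 7.5. In the sketch below the variable names are illustrative, and $b_1$ and $b_2$ are rounded to four decimals before computing $b_0$, which is the assumption needed to match the hand calculation above.

```python
# Summary statistics and standardized coefficients from Table 7.5
s_y, s_1, s_2 = 36.191, 18.620, 0.97035
y_bar, x1_bar, x2_bar = 181.90, 62.019, 17.143
b1_star, b2_star = 0.7484, 0.2511

# Relation (7.53a): rescale each standardized coefficient by s_Y / s_k
b1 = round((s_y / s_1) * b1_star, 4)
b2 = round((s_y / s_2) * b2_star, 4)

# Relation (7.53b): recover the intercept from the means
b0 = y_bar - b1 * x1_bar - b2 * x2_bar

print(b1, b2, round(b0, 3))
```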




The estimated regression function for the multiple regression model in the original variables
therefore is:

$$\hat{Y} = -68.860 + 1.455X_1 + 9.365X_2$$

This is the same fitted regression function we obtained in Chapter 6, except for slight
rounding effect differences. Here, $b_1$ and $b_2$ cannot be compared directly because $X_1$ is in
units of thousands of persons and $X_2$ is in units of thousands of dollars.

Sometimes the standardized regression coefficients $b_1^* = .7484$ and $b_2^* = .2511$ are
interpreted as showing that target population ($X_1$) has a much greater impact on sales than
per capita disposable income ($X_2$) because $b_1^*$ is much larger than $b_2^*$. However, as we will
see in the next section, one must be cautious about interpreting any regression coefficient,
whether standardized or not. The reason is that when the predictor variables are correlated
among themselves, as here, the regression coefficients are affected by the other predictor
variables in the model. For the Dwaine Studios data, the correlation between $X_1$ and $X_2$ is
$r_{12} = .781$, as shown in the correlation matrix in Figure 6.4b on page 232.

The magnitudes of the standardized regression coefficients are affected not only by 
the presence of correlations among the predictor variables but also by the spacings of the 
observations on each of these variables. Sometimes these spacings may be quite arbitrary, 
Hence, it is ordinarily not wise to interpret the magnitudes of standardized regression 
coefficients as reflecting the comparative importance of the predictor variables. 


Comments 


1. Some computer packages present both the regression coefficients $b_k$ for the model in the original
variables as well as the standardized coefficients $b_k^*$, as in the SYSTAT output in Figure 6.5a. The
standardized coefficients are sometimes labeled beta coefficients in printouts.

2. Some computer printouts show the magnitude of the determinant of the correlation matrix of
the $X$ variables. A near-zero value for this determinant implies both a high degree of linear association
among the $X$ variables and a high potential for roundoff errors. For two $X$ variables, this determinant
is seen from (7.54) to be $1 - r_{12}^2$, which approaches 0 as $r_{12}^2$ approaches 1.

3. It is possible to use the correlation transformation with a computer package that does not
permit regression through the origin, because the intercept coefficient $b_0^*$ will always be zero for data
so transformed. The other regression coefficients will also be correct.

4. Use of the standardized variables (7.43) without the correlation transformation modification
in (7.44) will lead to the same standardized regression coefficients as those in (7.52b) for the
correlation-transformed variables. However, the elements of the $\mathbf{X}'\mathbf{X}$ matrix will not then be bounded
between $-1$ and 1.
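Comments 3 and 4 can both be checked on a small synthetic data set. The sketch below assumes NumPy; the helper `standardize` and the data are hypothetical. It fits the standardized variables (7.43) and the correlation-transformed variables (7.44), and also refits the latter with an intercept column included.

```python
import numpy as np

def standardize(A):
    """Usual standardization (7.43): center and divide by the standard deviation."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

rng = np.random.default_rng(7)                       # synthetic data, for illustration only
n = 35
X = rng.normal(size=(n, 2)) * np.array([3.0, 40.0])
y = 5.0 + 1.2 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(size=n)

Z, z_y = standardize(X), standardize(y)              # standardized variables (7.43)
X_star, y_star = Z / np.sqrt(n - 1), z_y / np.sqrt(n - 1)   # correlation transformation (7.44)

b_std, *_ = np.linalg.lstsq(Z, z_y, rcond=None)
b_corr, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Comment 3: including an intercept column changes nothing; its coefficient is zero
A = np.column_stack([np.ones(n), X_star])
b_with_intercept, *_ = np.linalg.lstsq(A, y_star, rcond=None)

print(b_std, b_corr, b_with_intercept)
print(np.round(Z.T @ Z, 2))                          # equals (n - 1) times the correlation matrix
```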


7.6 Multicollinearity and Its Effects 


In multiple regression analysis, the nature and significance of the relations between the 
predictor or explanatory variables and the response variable are often of particular interest. 
Some questions frequently asked are: 


1. What is the relative importance of the effects of the different predictor variables? 

2. What is the magnitude of the effect of a given predictor variable on the response variable?

3. Can any predictor variable be dropped from the model because it has little or no effect
on the response variable?

4. Should any predictor variables not yet included in the model be considered for possible
inclusion?


If the predictor variables included in the model are (1) uncorrelated among themselves 
and (2) uncorrelated with any other predictor variables that are related to the response 
variable but are omitted from the model, relatively simple answers can be given to these 
questions. Unfortunately, in many nonexperimental situations in business, economics, and 
the social and biological sciences, the predictor or explanatory variables tend to be correlated 
among themselves and with other variables that are related to the response variable but are 
not included in the model. For example, in a regression of family food expenditures on 
the explanatory variables family income, family savings, and age of head of household, 
the explanatory variables will be correlated among themselves. Further, they will also be 
correlated with other socioeconomic variables not included in the model that do affect 
family food expenditures, such as family size. 

When the predictor variables are correlated among themselves, intercorrelation or
multicollinearity among them is said to exist. (Sometimes the latter term is reserved for those
instances when the correlation among the predictor variables is very high.) We shall explore 
a variety of interrelated problems created by multicollinearity among the predictor variables. 
First, however, we examine the situation when the predictor variables are not correlated. 


Uncorrelated Predictor Variables 


Table 7.6 contains data for a small-scale experiment on the effect of work crew size ($X_1$)
and level of bonus pay ($X_2$) on crew productivity ($Y$). The predictor variables $X_1$ and $X_2$ are
uncorrelated here, i.e., $r_{12}^2 = 0$, where $r_{12}^2$ denotes the coefficient of simple determination
between $X_1$ and $X_2$. Table 7.7a contains the fitted regression function and the analysis of
variance table when both $X_1$ and $X_2$ are included in the model. Table 7.7b contains the same
information when only $X_1$ is included in the model, and Table 7.7c contains this information
when only $X_2$ is in the model.

An important feature to note in Table 7.7 is that the regression coefficient for $X_1$, $b_1 =
5.375$, is the same whether only $X_1$ is included in the model or both predictor variables are
included. The same holds for $b_2 = 9.250$. This is the result of the two predictor variables
being uncorrelated.


TABLE 7.6 Uncorrelated Predictor Variables: Work Crew Productivity Example.

Case      Crew Size      Bonus Pay (dollars)      Crew Productivity
  i         X_i1               X_i2                      Y_i
  1          4                  2                         42
  2          4                  2                         39
  3          4                  3                         48
  4          4                  3                         51
  5          6                  2                         49
  6          6                  2                         53
  7          6                  3                         61
  8          6                  3                         60


TABLE 7.7 Regression Results when Predictor Variables Are Uncorrelated: Work Crew Productivity Example.

(a) Regression of Y on X1 and X2

$$\hat{Y} = .375 + 5.375X_1 + 9.250X_2$$

Source of Variation        SS        df        MS
Regression              402.250       2     201.125
Error                    17.625       5       3.525
Total                   419.875       7

(b) Regression of Y on X1

$$\hat{Y} = 23.500 + 5.375X_1$$

Source of Variation        SS        df        MS
Regression              231.125       1     231.125
Error                   188.750       6      31.458
Total                   419.875       7

(c) Regression of Y on X2

$$\hat{Y} = 27.250 + 9.250X_2$$

Source of Variation        SS        df        MS
Regression              171.125       1     171.125
Error                   248.750       6      41.458
Total                   419.875       7


Thus, when the predictor variables are uncorrelated, the effects ascribed to them by a 
first-order regression model are the same no matter which other of these predictor variables 
are included in the model. This is a strong argument for controlled experiments whenever 
possible, since experimental control permits choosing the levels of the predictor variables 
so as to make these variables uncorrelated. 

Another important feature of Table 7.7 is related to the error sums of squares. Note from
Table 7.7 that the extra sum of squares $SSR(X_1 \mid X_2)$ equals the regression sum of squares
$SSR(X_1)$ when only $X_1$ is in the regression model:

$$SSR(X_1 \mid X_2) = SSE(X_2) - SSE(X_1, X_2) = 248.750 - 17.625 = 231.125$$

$$SSR(X_1) = 231.125$$

Similarly, the extra sum of squares $SSR(X_2 \mid X_1)$ equals $SSR(X_2)$, the regression sum of
squares when only $X_2$ is in the regression model:

$$SSR(X_2 \mid X_1) = SSE(X_1) - SSE(X_1, X_2) = 188.750 - 17.625 = 171.125$$

$$SSR(X_2) = 171.125$$




In general, when two or more predictor variables are uncorrelated, the marginal contribution
of one predictor variable in reducing the error sum of squares when the other predictor
variables are in the model is exactly the same as when this predictor variable is in the model
alone.
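The results of Table 7.7 can be reproduced with a short script. The sketch below assumes NumPy and uses the eight work crew observations as read from Table 7.6 (crew sizes 4 and 6 crossed with bonus levels 2 and 3); the helper `fit` is hypothetical. These values are consistent with the fitted functions and sums of squares reported in Table 7.7.

```python
import numpy as np

# Work crew productivity data (Table 7.6)
x1 = np.array([4, 4, 4, 4, 6, 6, 6, 6], dtype=float)   # crew size
x2 = np.array([2, 2, 3, 3, 2, 2, 3, 3], dtype=float)   # bonus pay (dollars)
y = np.array([42, 39, 48, 51, 49, 53, 61, 60], dtype=float)

def fit(cols):
    """Least squares coefficients and SSE for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return b, float(resid @ resid)

b_both, sse_both = fit([x1, x2])
b_x1, sse_x1 = fit([x1])
b_x2, sse_x2 = fit([x2])

print("r12 =", np.corrcoef(x1, x2)[0, 1])
print("both:", b_both, " X1 only:", b_x1, " X2 only:", b_x2)
print("SSR(X1|X2) =", sse_x2 - sse_both, " SSR(X1) =", sse_x1 + 0 * sse_both and (np.sum((y - y.mean()) ** 2) - sse_x1))
print("SSR(X2|X1) =", sse_x1 - sse_both, " SSR(X2) =", np.sum((y - y.mean()) ** 2) - sse_x2)
```

With uncorrelated predictors the slope for each variable is the same in the one-variable and two-variable fits, and each extra sum of squares matches the corresponding one-variable regression sum of squares.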


Comment 

To show that the regression coefficient of $X_1$ is unchanged when $X_2$ is added to the regression model
in the case where $X_1$ and $X_2$ are uncorrelated, consider the following algebraic expression for $b_1$ in
the first-order multiple regression model with two predictor variables:

$$b_1 = \frac{\dfrac{\sum (X_{i1} - \bar{X}_1)(Y_i - \bar{Y})}{\sum (X_{i1} - \bar{X}_1)^2} - \left[ \dfrac{\sum (Y_i - \bar{Y})^2}{\sum (X_{i1} - \bar{X}_1)^2} \right]^{1/2} r_{Y2}\, r_{12}}{1 - r_{12}^2} \tag{7.56}$$

where, as before, $r_{Y2}$ denotes the coefficient of simple correlation between $Y$ and $X_2$, and $r_{12}$ denotes
the coefficient of simple correlation between $X_1$ and $X_2$.

If $X_1$ and $X_2$ are uncorrelated, $r_{12} = 0$, and (7.56) reduces to:

$$b_1 = \frac{\sum (X_{i1} - \bar{X}_1)(Y_i - \bar{Y})}{\sum (X_{i1} - \bar{X}_1)^2} \qquad \text{when } r_{12} = 0 \tag{7.56a}$$

But (7.56a) is the estimator of the slope for the simple linear regression of $Y$ on $X_1$, per (1.10a).
Hence, when $X_1$ and $X_2$ are uncorrelated, adding $X_2$ to the regression model does not change the
regression coefficient for $X_1$; correspondingly, adding $X_1$ to the regression model does not change
the regression coefficient for $X_2$.


Nature of Problem when Predictor Variables Are Perfectly Correlated 


TABLE 7.8 Example of Perfectly Correlated Predictor Variables.

                                      Fitted Values for
                                     Regression Function
Case
  i       X_i1      X_i2      Y_i      (7.58)      (7.59)
  1         2         6        23        23          23
  2         8         9        83        83          83
  3         6         8        63        63          63
  4        10        10       103       103         103

Response Functions:
   (7.58)  $\hat{Y} = -87 + X_1 + 18X_2$
   (7.59)  $\hat{Y} = -7 + 9X_1 + 2X_2$


To see the essential nature of the problem of multicollinearity, we shall employ a simple
example where the two predictor variables are perfectly correlated. The data in Table 7.8
refer to four sample observations on a response variable and two predictor variables. Mr. A
was asked to fit the first-order multiple regression function:

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \tag{7.57}$$

He returned in a short time with the fitted response function:

$$\hat{Y} = -87 + X_1 + 18X_2 \tag{7.58}$$

He was proud because the response function fits the data perfectly. The fitted values are
shown in Table 7.8.

It so happened that Ms. B also was asked to fit the response function (7.57) to the same
data, and she proudly obtained:

$$\hat{Y} = -7 + 9X_1 + 2X_2 \tag{7.59}$$

Her response function also fits the data perfectly, as shown in Table 7.8.

Indeed, it can be shown that infinitely many response functions will fit the data in
Table 7.8 perfectly. The reason is that the predictor variables $X_1$ and $X_2$ are perfectly
related, according to the relation:

$$X_2 = 5 + .5X_1 \tag{7.60}$$

Note that the fitted response functions (7.58) and (7.59) are entirely different response
surfaces, as may be seen in Figure 7.2. The two response surfaces have the same fitted
values only when they intersect. This occurs when $X_1$ and $X_2$ follow relation (7.60), i.e.,
when $X_2 = 5 + .5X_1$.

FIGURE 7.2 Two Response Planes That Intersect when $X_2 = 5 + .5X_1$.

Thus, when $X_1$ and $X_2$ are perfectly related and, as in our example, the data do not
contain any random error component, many different response functions will lead to the
same perfectly fitted values for the observations and to the same fitted values for any
other $(X_1, X_2)$ combinations following the relation between $X_1$ and $X_2$. Yet these response
functions are not the same and will lead to different fitted values for $(X_1, X_2)$ combinations
that do not follow the relation between $X_1$ and $X_2$.

Two key implications of this example are: 


1. The perfect relation between $X_1$ and $X_2$ did not inhibit our ability to obtain a good fit
to the data.

2. Since many different response functions provide the same good fit, we cannot interpret
any one set of regression coefficients as reflecting the effects of the different predictor
variables. Thus, in response function (7.58), $b_1 = 1$ and $b_2 = 18$ do not imply that $X_2$ is the
key predictor variable and $X_1$ plays little role, because response function (7.59) provides
an equally good fit and its regression coefficients have opposite comparative magnitudes.
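The behavior of the two response functions can be verified directly from the Table 7.8 data. The sketch below assumes NumPy; the off-line point (4, 4) is a hypothetical combination chosen only because it does not satisfy relation (7.60).

```python
import numpy as np

# Data from Table 7.8
x1 = np.array([2, 8, 6, 10], dtype=float)
x2 = np.array([6, 9, 8, 10], dtype=float)
y = np.array([23, 83, 63, 103], dtype=float)

fit_a = -87 + x1 + 18 * x2        # response function (7.58)
fit_b = -7 + 9 * x1 + 2 * x2      # response function (7.59)

# The X matrix has three columns but only rank 2, because X2 = 5 + .5 X1 exactly,
# so X'X is singular and the normal equations have no unique solution.
X = np.column_stack([np.ones(4), x1, x2])
rank = np.linalg.matrix_rank(X)

# At a point off the line X2 = 5 + .5 X1, the two fitted surfaces disagree.
p1, p2 = 4.0, 4.0                 # hypothetical (X1, X2) combination
pred_a = -87 + p1 + 18 * p2
pred_b = -7 + 9 * p1 + 2 * p2

print(fit_a, fit_b, rank, pred_a, pred_b)
```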


Effects of Multicollinearity 
In practice, we seldom find predictor variables that are perfectly related or data that do not 
contain some random error component. Nevertheless, the implications just noeg for our 
idealized example still have relevance. 


1. The fact that some or all predictor variables are correlated among themselves does
not, in general, inhibit our ability to obtain a good fit nor does it tend to affect inferences
about mean responses or predictions of new observations, provided these inferences are 
made within the region of observations. (Figure 6.3 on p. 231 illustrates the concept of the 
region of observations for the case of two predictor variables.) 

2. The counterpart in real life to the many different regression functions providing equally
good fits to the data in our idealized example is that the estimated regression coefficients tend 
to have large sampling variability when the predictor variables are highly correlated. Thus, 
the estimated regression coefficients tend to vary widely from one sample to the next when 
the predictor variables are highly correlated. As a result, only imprecise information may 
be available about the individual true regression coefficients. Indeed, many of the estimated 
regression coefficients individually may be statistically not significant even though a definite 
statistical relation exists between the response variable and the set of predictor variables. 

3. The common interpretation of a regression coefficient as measuring the change in the 
expected value of the response variable when the given predictor variable is increased by 
one unit while all other predictor variables are held constant is not fully applicable when 
multicollinearity exists. It may be conceptually feasible to think of varying one predictor 
variable and holding the others constant, but it may not be possible in practice to do so 
for predictor variables that are highly correlated. For example, in a regression model for 
predicting crop yield from amount of rainfall and hours of sunshine, the relation between the 
two predictor variables makes it unrealistic to consider varying one while holding the other 
constant. Therefore, the simple interpretation of the regression coefficients as measuring 
marginal effects is often unwarranted with highly correlated predictor variables. 
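
The first two effects can be illustrated with a small simulation. This is a sketch on synthetic data, not the body fat data: the true coefficients (10, 2, 3), the error standard deviation of 1, the sample size, and the correlation levels 0 and .95 are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def fit2(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def simulate(rho, n=30, reps=500, seed=0):
    """Return (sd of b1 over samples, RMS sd of fitted values at the observed X points)."""
    rng = random.Random(seed)
    # Fixed predictor values with population correlation rho.
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
    b1s, fits = [], [[] for _ in range(n)]
    for _ in range(reps):
        # New error terms each replication; X values stay fixed.
        y = [10 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        b0, b1, b2 = fit2(x1, x2, y)
        b1s.append(b1)
        for i in range(n):
            fits[i].append(b0 + b1 * x1[i] + b2 * x2[i])
    rms_fit_sd = math.sqrt(sum(statistics.pvariance(f) for f in fits) / n)
    return statistics.pstdev(b1s), rms_fit_sd

sd_b1_low, fit_sd_low = simulate(rho=0.0)
sd_b1_high, fit_sd_high = simulate(rho=0.95)
print("sd of b1:         ", sd_b1_low, sd_b1_high)
print("sd of fitted mean:", fit_sd_low, fit_sd_high)
```

In runs of this kind the sampling variability of b1 is expected to be several times larger under high correlation, while the variability of the fitted values at the observed X combinations stays about the same in both settings.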


We illustrate these effects of multicollinearity by returning to the body fat example. A
portion of the basic data was given in Table 7.1, and regression results for different fitted
models were presented in Table 7.2. Figure 7.3 contains the scatter plot matrix and the
correlation matrix of the predictor variables. It is evident from the scatter plot matrix that
predictor variables X1 and X2 are highly correlated; the correlation matrix of the X variables
shows that the coefficient of simple correlation is r12 = .924. On the other hand, X3 is not so
highly related to X1 and X2 individually; the correlation matrix shows that the correlation
coefficients are r13 = .458 and r23 = .085. (But X3 is highly correlated with X1 and X2
together; the coefficient of multiple determination when X3 is regressed on X1 and X2
is .998.)
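
The parenthetical remark can be checked approximately from the pairwise correlations alone. The sketch below uses the standard two-predictor identity for the coefficient of multiple determination in terms of simple correlations; because the three correlations are rounded to three decimals and the matrix is nearly singular, the result (roughly .99) only approximates the .998 reported from the unrounded data.

```python
# Pairwise correlations among the predictors, as quoted in the text
# (rounded to three decimals).
r12, r13, r23 = 0.924, 0.458, 0.085

# Coefficient of multiple determination when X3 is regressed on X1 and X2,
# expressed through the pairwise correlations.
r2_3_on_12 = (r13 ** 2 + r23 ** 2 - 2 * r12 * r13 * r23) / (1 - r12 ** 2)

print("squared pairwise correlations:", round(r13 ** 2, 3), round(r23 ** 2, 3))
print("R^2 of X3 on (X1, X2):", round(r2_3_on_12, 3))
```

The contrast between the small squared pairwise correlations and the near-unity multiple determination is the reason pairwise correlations alone can miss serious multicollinearity.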




FIGURE 7.3 Scatter Plot Matrix and Correlation Matrix of the Predictor Variables: Body Fat Example.

(a) Scatter Plot Matrix of X Variables [scatter plot matrix of X1, X2, and X3 not reproduced]

(b) Correlation Matrix of X Variables

\[ \mathbf{r}_{XX} = \begin{bmatrix} 1.0 & .924 & .458 \\ .924 & 1.0 & .085 \\ .458 & .085 & 1.0 \end{bmatrix} \]


Effects on Regression Coefficients. Note from Table 7.2 that the regression coefficient
for X1, triceps skinfold thickness, varies markedly depending on which other variables are
included in the model:


Variables in Model    b1       b2
X1                    .8572    -
X2                    -        .8565
X1, X2                .2224    .6594
X1, X2, X3            4.334    −2.857


The story is the same for the regression coefficient for X2. Indeed, the regression
coefficient b2 even changes sign when X3 is added to the model that includes X1 and X2.

The important conclusion we must draw is: When predictor variables are correlated, the 
regression coefficient of any one variable depends on which other predictor variables are 
included in the model and which ones are left out. Thus, a regression coefficient does not 
reflect any inherent effect of the particular predictor variable on the response variable but 
only a marginal or partial effect, given whatever other correlated predictor variables are 
included in the model. 


Comment 


Another illustration of how intercorrelated predictor variables that are omitted from the regression
model can influence the regression coefficients in the regression model is provided by an analyst who
was perplexed about the sign of a regression coefficient in the fitted regression model. The analyst had
found in a regression of territory company sales on territory population size, per capita income, and
some other predictor variables that the regression coefficient for population size was negative, and this
conclusion was supported by a confidence interval for the regression coefficient. A consultant noted
that the analyst did not include the major competitor's market penetration as a predictor variable in
the model. The competitor was most active and effective in territories with large populations, thereby
keeping company sales down in these territories. The result of the omission of this predictor variable
from the model was a negative coefficient for the population size variable.
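
The analyst's situation can be mimicked with synthetic data. Everything in the sketch below is hypothetical: the variable names, the assumed true effects (+1 for population, −3 for competitor penetration), the strength of the relation between the two predictors, and the random seed.

```python
import random

def cross(x, y):
    """Sum of cross-products of deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def slope_one(x, y):
    """Slope of the simple regression of y on x."""
    return cross(x, y) / cross(x, x)

def slopes_two(x1, x2, y):
    """Slopes (b1, b2) of the regression of y on x1 and x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)
n = 200
population = [rng.uniform(1, 10) for _ in range(n)]
# Competitor penetration is strongest in the large territories.
competitor = [0.8 * p + rng.gauss(0, 0.5) for p in population]
# Assumed true model: population helps sales, competitor presence hurts.
sales = [50 + 1.0 * p - 3.0 * c + rng.gauss(0, 1)
         for p, c in zip(population, competitor)]

b_pop_short = slope_one(population, sales)              # competitor omitted
b_pop_full, b_comp_full = slopes_two(population, competitor, sales)
print("population coefficient, competitor omitted: ", b_pop_short)
print("population coefficient, competitor included:", b_pop_full)
```

With the competitor variable omitted, the population coefficient absorbs part of the competitor effect and turns negative even though the assumed direct effect of population is positive.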


Effects on Extra Sums of Squares. When predictor variables are correlated, the marginal 
contribution of any one predictor variable in reducing the error sum of squares varies, 
depending on which other variables are already in the regression model, just as for regression 
coefficients. For example, Table 7.2 provides the following extra sums of squares for X1:


SSR(X1) = 352.27 
SSR(X1|X2) = 3.47 


The reason why SSR(X1|X2) is so small compared with SSR(X1) is that X1 and X2 are
highly correlated with each other and with the response variable. Thus, when X2 is already
in the regression model, the marginal contribution of X1 in reducing the error sum of squares
is comparatively small because X2 contains much of the same information as X1.

The same story is found in Table 7.2 for X2. Here SSR(X2|X1) = 33.17, which is much
smaller than SSR(X2) = 381.97. The important conclusion is this: When predictor variables
are correlated, there is no unique sum of squares that can be ascribed to any one predictor
variable as reflecting its effect in reducing the total variation in Y. The reduction in the
total variation ascribed to a predictor variable must be viewed in the context of the other
correlated predictor variables already included in the model.


Comments 
1. Multicollinearity also affects the coefficients of partial determination through its effects on the
extra sums of squares. Note from Table 7.2 for the body fat example, for instance, that X1 is highly
correlated with Y:

\[ R^2_{Y1} = \frac{SSR(X_1)}{SSTO} = \frac{352.27}{495.39} = .71 \]

However, the coefficient of partial determination between Y and X1, when X2 is already in the
regression model, is much smaller:

\[ R^2_{Y1|2} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)} = \frac{3.47}{113.42} = .03 \]

The reason for the small coefficient of partial determination here is, as we have seen, that X1 and
X2 are highly correlated with each other and with the response variable. Hence, X1 provides only
relatively limited additional information beyond that furnished by X2.
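
The arithmetic behind these two coefficients, and the way the extra sums of squares fit together, can be retraced from the values quoted in this section. The sketch below uses only those quoted values; the variable names are illustrative.

```python
# Values quoted in the text from Table 7.2 (body fat example).
SSTO = 495.39
SSR_X1, SSR_X2 = 352.27, 381.97
SSR_X1_given_X2, SSR_X2_given_X1 = 3.47, 33.17

# Error sum of squares when only X2 is in the model.
SSE_X2 = SSTO - SSR_X2

# Coefficient of simple determination and coefficient of partial determination.
R2_Y1 = SSR_X1 / SSTO
R2_Y1_given_2 = SSR_X1_given_X2 / SSE_X2

# Either order of entry must give the same SSR(X1, X2).
total_x1_first = SSR_X1 + SSR_X2_given_X1
total_x2_first = SSR_X2 + SSR_X1_given_X2

print("SSE(X2)    =", round(SSE_X2, 2))
print("R2_Y1      =", round(R2_Y1, 2))
print("R2_Y1|2    =", round(R2_Y1_given_2, 2))
print("SSR(X1,X2) =", round(total_x1_first, 2), round(total_x2_first, 2))
```

The two decompositions agree, which shows that the sum of squares for X1 and X2 jointly is fixed; only its split between the two predictors depends on the order of entry.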

2. The extra sum of squares for a predictor variable after other correlated predictor variables are 
in the model need not necessarily be smaller than before these other variables are in the model, as we 
found in the body fat example. In special cases, it can be larger. Consider the following special data 
set and its correlation matrix: 


[Data set listing not legible in the source.]

Correlation matrix:

        Y       X1      X2
Y       1.0     .026    .976
X1              1.0     .243
X2                      1.0




Here, Y and X2 are highly positively correlated, but Y and X1 are practically uncorrelated. In addition,
X1 and X2 are moderately positively correlated. The extra sum of squares for X1 when it is the only
variable in the model for this data set is SSR(X1) = .25, but when X2 already is in the model the extra
sum of squares is SSR(X1|X2) = 18.01. Similarly, we have for these data:


SSR(X2) = 362.49    SSR(X2|X1) = 380.25


The increase in the extra sums of squares with the addition of the other predictor variable in the model is
related to the special situation here that X1 is practically uncorrelated with Y but moderately correlated
with X2, which in turn is highly correlated with Y. The general point even here still holds: the extra
sum of squares is affected by the other correlated predictor variables already in the model.

When SSR(X1|X2) > SSR(X1), as in the example just cited, the variable X2 is sometimes called
a suppressor variable. Since SSR(X2|X1) > SSR(X2) in the example, the variable X1 would also be
called a suppressor variable.
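
A suppressor effect of this kind can be reproduced with a four-case data set. The values below are an assumption: they are not legible in this copy of the text, and were chosen because they reproduce the sums of squares and correlations quoted above. The helper names are illustrative.

```python
def cross(x, y):
    """Sum of cross-products of deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def ssr_one(x, y):
    """Regression sum of squares for the simple regression of y on x."""
    return cross(x, y) ** 2 / cross(x, x)

def ssr_two(x1, x2, y):
    """Regression sum of squares for the regression of y on x1 and x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1 * s1y + b2 * s2y

# Assumed data set (four cases), consistent with the quoted results.
Y = [20, 20, 0, 1]
X1 = [5, 10, 5, 10]
X2 = [25, 30, 5, 10]

ssr_x1 = ssr_one(X1, Y)
ssr_x2 = ssr_one(X2, Y)
ssr_both = ssr_two(X1, X2, Y)
ssr_x1_given_x2 = ssr_both - ssr_x2   # extra sum of squares for X1 after X2
ssr_x2_given_x1 = ssr_both - ssr_x1   # extra sum of squares for X2 after X1

print("SSR(X1)    =", round(ssr_x1, 2))
print("SSR(X1|X2) =", round(ssr_x1_given_x2, 2))
print("SSR(X2)    =", round(ssr_x2, 2))
print("SSR(X2|X1) =", round(ssr_x2_given_x1, 2))
```

Each extra sum of squares is computed as a difference of regression sums of squares, so both suppressor comparisons in the text can be read directly from the printed values.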


Effects on s{bk}. Note from Table 7.2 for the body fat example how much more imprecise
the estimated regression coefficients b1 and b2 become as more predictor variables are added
to the regression model:


Variables in Model    s{b1}    s{b2}
X1                    .1288    -
X2                    -        .1100
X1, X2                .3034    .2912
X1, X2, X3            3.016    2.582


Again, the high degree of multicollinearity among the predictor variables is responsible for 
the inflated variability of the estimated regression coefficients. 


Effects on Fitted Values and Predictions. Notice in Table 7.2 for the body fat example 
that the high multicollinearity among the predictor variables does not prevent the mean 
square error, measuring the variability of the error terms, from being steadily reduced as 
additional variables are added to the regression model: 


Variables in Model    MSE
X1                    7.95
X1, X2                6.47
X1, X2, X3            6.15


Furthermore, the precision of fitted values within the range of the observations on the 
predictor variables is not eroded with the addition of correlated predictor variables into 
the regression model. Consider the estimation of mean body fat when the only predictor 
variable in the model is triceps skinfold thickness (X1) for Xh1 = 25.0. The fitted value
and its estimated standard deviation are (calculations not shown):


Ŷh = 19.93    s{Ŷh} = .632


When the highly correlated predictor variable thigh circumference (X2) is also included
in the model, the estimated mean body fat and its estimated standard deviation are as follows
for Xh1 = 25.0 and Xh2 = 50.0:


Ŷh = 19.36    s{Ŷh} = .624


Thus, the precision of the estimated mean response is equally good as before, despite the
addition of the second predictor variable that is highly correlated with the first one. This
stability in the precision of the estimated mean response occurred despite the fact that the
estimated standard deviation of b1 became substantially larger when X2 was added to the
model (Table 7.2). The essential reason for the stability is that the covariance between b1
and b2 is negative, which plays a strong counteracting influence to the increase in s²{b1} in
determining the value of s²{Ŷh} as given in (6.79).
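
The counteracting role of the negative covariance can be seen in a small sketch. The eight (X1, X2) pairs below are hypothetical, not the body fat data, and the error variance is set to 1 so that all quantities are variance multipliers. The fitted-value variance is written in its centered two-predictor form, which is algebraically equivalent to the general matrix expression.

```python
# Hypothetical predictor values: X2 is roughly X1 + 1, so the two are
# highly positively correlated.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n

s11 = sum((a - m1) ** 2 for a in x1)
s22 = sum((b - m2) ** 2 for b in x2)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
det = s11 * s22 - s12 ** 2

# Variances and covariance of b1 and b2 per unit error variance.
var_b1 = s22 / det
var_b2 = s11 / det
cov_b12 = -s12 / det   # negative when X1 and X2 are positively correlated

def var_fitted(xh1, xh2):
    """Variance of the fitted value at (xh1, xh2) per unit error variance."""
    d1, d2 = xh1 - m1, xh2 - m2
    return 1 / n + d1 ** 2 * var_b1 + d2 ** 2 * var_b2 + 2 * d1 * d2 * cov_b12

var_on = var_fitted(7, 8)    # follows the pattern of the observed data
var_off = var_fitted(7, 3)   # far from the pattern of the observed data
print("var b1, var b2, cov:", var_b1, var_b2, cov_b12)
print("fitted-value variance on pattern: ", var_on)
print("fitted-value variance off pattern:", var_off)
```

For a point that follows the joint pattern of the predictors, the covariance term cancels most of the two inflated variance terms; for a point off the pattern the same term adds to them, which is why the stability holds only within the region of observations.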

When all three predictor variables are included in the model, the estimated mean body 
fat and its estimated standard deviation are as follows for Xh1 = 25.0, Xh2 = 50.0, and
Xh3 = 29.0:


Ŷh = 19.19    s{Ŷh} = .621


Thus, the addition of the third predictor variable, which is highly correlated with the first two 
predictor variables together, also does not materially affect the precision of the estimated 
mean response. 


Effects on Simultaneous Tests of βk. A not infrequent abuse in the analysis of multiple
regression models is to examine the t* statistic in (6.51b):

\[ t^* = \frac{b_k}{s\{b_k\}} \]

for each regression coefficient in turn to decide whether βk = 0 for k = 1, ..., p − 1. Even
if a simultaneous inference procedure is used, and often it is not, problems still exist when
the predictor variables are highly correlated.

Suppose we wish to test whether β1 = 0 and β2 = 0 in the body fat example regression
model with two predictor variables of Table 7.2c. Controlling the family level of significance
at .05, we require with the Bonferroni method that each of the two t tests be conducted with
level of significance .025. Hence, we need t(.9875; 17) = 2.46. Since both t* statistics
in Table 7.2c have absolute values that do not exceed 2.46, we would conclude from the
two separate tests that β1 = 0 and that β2 = 0. Yet the proper F test for H0: β1 = β2 = 0
would lead to the conclusion Ha, that not both coefficients equal zero. This can be seen
from Table 7.2c, where we find F* = MSR/MSE = 192.72/6.47 = 29.8, which far exceeds
F(.95; 2, 17) = 3.59.
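
The comparison can be retraced from numbers already quoted in this section: the coefficients and standard deviations for the two-predictor model, and the critical values 2.46 and 3.59 as stated in the text (they are taken as given here rather than recomputed). Variable names are illustrative.

```python
# Two-predictor model (X1, X2), values quoted from Table 7.2.
b1, s_b1 = 0.2224, 0.3034
b2, s_b2 = 0.6594, 0.2912
MSR, MSE = 192.72, 6.47

# Critical values as quoted in the text.
t_crit = 2.46   # t(.9875; 17), Bonferroni with two tests at family level .05
F_crit = 3.59   # F(.95; 2, 17)

# Marginal t* statistics and the overall F* statistic.
t1 = b1 / s_b1
t2 = b2 / s_b2
F_star = MSR / MSE

print("t1* =", round(t1, 2), " conclude beta1 = 0:", abs(t1) <= t_crit)
print("t2* =", round(t2, 2), " conclude beta2 = 0:", abs(t2) <= t_crit)
print("F*  =", round(F_star, 1), " conclude not both zero:", F_star > F_crit)
```

Neither marginal statistic reaches the Bonferroni critical value, yet the overall statistic exceeds its critical value many times over.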

The reason for this apparently paradoxical result is that each t* test is a marginal test,
as we have seen in (7.15) from the perspective of the general linear test approach. Thus,
a small SSR(X1|X2) here indicates that X1 does not provide much additional information
beyond X2, which already is in the model; hence, we are led to the conclusion that β1 = 0.
Similarly, we are led to conclude β2 = 0 here because SSR(X2|X1) is small, indicating that
X2 does not provide much more additional information when X1 is already in the model.
But the two tests of the marginal effects of X1 and X2 together are not equivalent to testing
whether there is a regression relation between Y and the two predictor variables. The reason
is that the reduced model for each of the separate tests contains the other predictor variable,
whereas the reduced model for testing whether both β1 = 0 and β2 = 0 would contain
neither predictor variable. The proper F test shows that there is a definite regression relation
here between Y and X1 and X2.

The same paradox would be encountered in Table 7.2d for the regression model with
three predictor variables if three simultaneous tests on the regression coefficients were
conducted at family level of significance .05.


Comments 


1. It was noted in Section 7.5 that a near-zero determinant of X'X is a potential source of serious
roundoff errors in normal equations calculations. Severe multicollinearity has the effect of making
this determinant come close to zero. Thus, under severe multicollinearity, the regression coefficients
may be subject to large roundoff errors as well as large sampling variances. Hence, it is particularly
advisable to employ the correlation transformation (7.44) in normal equations calculations when 
multicollinearity is present. 

2. Just as high intercorrelations among the predictor variables tend to make the estimated
regression coefficients imprecise (i.e., erratic from sample to sample), so do the coefficients of partial
correlation between the response variable and each predictor variable tend to become erratic from 
sample to sample when the predictor variables are highly correlated. 

3. The effect of intercorrelations among the predictor variables on the standard deviations of the
estimated regression coefficients can be seen readily when the variables in the model are transformed
by means of the correlation transformation (7.44). Consider the first-order model with two predictor
variables:

\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \tag{7.61} \]

This model in the variables transformed by (7.44) becomes:

\[ Y_i^* = \beta_1^* X_{i1}^* + \beta_2^* X_{i2}^* + \varepsilon_i^* \tag{7.62} \]

The (X'X)⁻¹ matrix for this standardized model is given by (7.50) and (7.54c):

\[ (\mathbf{X}'\mathbf{X})^{-1} = \mathbf{r}_{XX}^{-1} = \frac{1}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix} \tag{7.63} \]

Hence, the variance-covariance matrix of the estimated regression coefficients is by (6.46) and (7.63):

\[ \boldsymbol{\sigma}^2\{\mathbf{b}\} = (\sigma^*)^2 \mathbf{r}_{XX}^{-1} = (\sigma^*)^2 \frac{1}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix} \tag{7.64} \]

where (σ*)² is the error term variance for the standardized model (7.62). We see that the estimated
regression coefficients b1* and b2* have the same variance here:

\[ \sigma^2\{b_1^*\} = \sigma^2\{b_2^*\} = \frac{(\sigma^*)^2}{1 - r_{12}^2} \tag{7.65} \]

and that each of these variances becomes larger as the correlation between X1 and X2 increases. Indeed,
as X1 and X2 approach perfect correlation (i.e., as the squared correlation r12² approaches 1), the
variances of b1* and b2* become larger without limit.
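
The growth described by (7.65) is easy to tabulate. The sketch below evaluates the multiplier 1/(1 − r12²) that scales the error variance (σ*)²; the function name and the list of correlations are illustrative, with .924 being the correlation between X1 and X2 quoted earlier for the body fat example.

```python
def variance_multiplier(r12):
    """Factor 1 / (1 - r12^2) multiplying (sigma*)^2 in (7.65)."""
    return 1.0 / (1.0 - r12 ** 2)

# The multiplier equals 1 for uncorrelated predictors and grows without
# bound as the correlation approaches 1 in absolute value.
for r in (0.0, 0.5, 0.9, 0.924, 0.99, 0.999):
    print(r, round(variance_multiplier(r), 2))
```

At r12 = .924 the variance of each standardized coefficient is nearly seven times what it would be with uncorrelated predictors.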

4. We noted in our discussion of simultaneous tests of the regression coefficients that it is possi- 
ble that a set of predictor variables is related to the response variable, yet all of the individual tests 
on the regression coefficients will lead to the conclusion that they equal zero because of the multi- 
collinearity among the predictor variables. This apparently paradoxical result is also possible under 
special circumstances when there is no multicollinearity among the predictor variables. The special 
circumstances are not likely to be found in practice, however.




Need for More Powerful Diagnostics for Multicollinearity 


As we have seen, multicollinearity among the predictor variables can have important con- 
sequences for interpreting and using a fitted regression model. The diagnostic tool con- 
sidered here for identifying multicollinearity—namely, the pairwise coefficients of simple 
correlation between the predictor variables—is frequently helpful. Often, however, serious 
multicollinearity exists without being disclosed by the pairwise correlation coefficients. In 
Chapter 10, we present a more powerful tool for identifying the existence of serious multi- 
collinearity. Some remedial measures for lessening the effects of multicollinearity will be 
considered in Chapter 11. 


Cited Reference

7.1. Kennedy, W. J., Jr., and J. E. Gentle. Statistical Computing. New York: Marcel Dekker, 1980.


Problems

7.1. State the number of degrees of freedom that are associated with each of the following extra
sums of squares: (1) SSR(X1|X2); (2) SSR(X2|X1, X3); (3) SSR(X1, X2|X3, X4); (4) SSR(X1,
X2, X3|X4, X5).

7.2. Explain in what sense the regression sum of squares SSR(X1) is an extra sum of squares.

7.3. Refer to Brand preference Problem 6.5.

a. Obtain the analysis of variance table that decomposes the regression sum of squares into
extra sums of squares associated with X1 and with X2, given X1.

b. Test whether X2 can be dropped from the regression model given that X1 is retained. Use
the F* test statistic and level of significance .01. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

7.4. Refer to Grocery retailer Problem 6.9.

a. Obtain the analysis of variance table that decomposes the regression sum of squares into
extra sums of squares associated with X1; with X3, given X1; and with X2, given X1 and X3.

b. Test whether X2 can be dropped from the regression model given that X1 and X3 are retained.
Use the F* test statistic and α = .05. State the alternatives, decision rule, and conclusion.
What is the P-value of the test?

c. Does SSR(X1) + SSR(X2|X1) equal SSR(X2) + SSR(X1|X2) here? Must this always be the
case?

*7.5. Refer to Patient satisfaction Problem 6.15.

a. Obtain the analysis of variance table that decomposes the regression sum of squares into
extra sums of squares associated with X2; with X1, given X2; and with X3, given X2 and X1.

b. Test whether X3 can be dropped from the regression model given that X1 and X2 are retained.
Use the F* test statistic and level of significance .025. State the alternatives, decision rule,
and conclusion. What is the P-value of the test?

*7.6. Refer to Patient satisfaction Problem 6.15. Test whether both X2 and X3 can be dropped from
the regression model given that X1 is retained. Use α = .025. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

7.7. Refer to Commercial properties Problem 6.18.

a. Obtain the analysis of variance table that decomposes the regression sum of squares into
extra sums of squares associated with X4; with X1, given X4; with X2, given X1 and X4;
and with X3, given X1, X2 and X4.


b. Test whether X3 can be dropped from the regression model given that X1, X2 and X4 are
retained. Use the F* test statistic and level of significance .01. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

7.8. Refer to Commercial properties Problems 6.18 and 7.7. Test whether both X2 and X3 can be
dropped from the regression model given that X1 and X4 are retained; use α = .01. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

*7.9. Refer to Patient satisfaction Problem 6.15. Test whether β1 = −1.0 and β2 = 0; use α = .025.
State the alternatives, full and reduced models, decision rule, and conclusion.

7.10. Refer to Commercial properties Problem 6.18. Test whether β1 = −.1 and β2 = .4; use
α = .01. State the alternatives, full and reduced models, decision rule, and conclusion.

7.11. Refer to the work crew productivity example in Table 7.6.

a. Calculate R²_Y1, R²_Y2, R²_12, R²_Y1|2, R²_Y2|1, and R². Explain what each coefficient measures and
interpret your results.

b. Are any of the results obtained in part (a) special because the two predictor variables are
uncorrelated?

7.12. Refer to Brand preference Problem 6.5. Calculate R²_Y1, R²_Y2, R²_12, R²_Y1|2, R²_Y2|1, and R². Explain
what each coefficient measures and interpret your results.

*7.13. Refer to Grocery retailer Problem 6.9. Calculate R²_Y1, R²_Y2, R²_12, R²_Y1|2, R²_Y2|1, R²_Y2|13, and R².
Explain what each coefficient measures and interpret your results.

*7.14. Refer to Patient satisfaction Problem 6.15.

a. Calculate R²_Y1, R²_Y1|2, and R²_Y1|23. How is the degree of marginal linear association between
Y and X1 affected, when adjusted for X2? When adjusted for both X2 and X3?

b. Make a similar analysis to that in part (a) for the degree of marginal linear association
between Y and X2. Are your findings similar to those in part (a) for Y and X1?

7.15. Refer to Commercial properties Problems 6.18 and 7.7. Calculate R²_Y4, R²_Y1, R²_Y1|4, R²_14,
R²_Y2|14, R²_Y3|124, and R². Explain what each coefficient measures and interpret your results.
How is the degree of marginal linear association between Y and X1 affected, when adjusted
for X4?

7.16. Refer to Brand preference Problem 6.5.

a. Transform the variables by means of the correlation transformation (7.44) and fit the
standardized regression model (7.45).

b. Interpret the standardized regression coefficient b1*.

c. Transform the estimated standardized regression coefficients by means of (7.53) back to the
ones for the fitted regression model in the original variables. Verify that they are the same
as the ones obtained in Problem 6.5b.

*7.17. Refer to Grocery retailer Problem 6.9.

a. Transform the variables by means of the correlation transformation (7.44) and fit the
standardized regression model (7.45).

b. Calculate the coefficients of determination between all pairs of predictor variables. Is it
meaningful here to consider the standardized regression coefficients to reflect the effect of
one predictor variable when the others are held constant?

c. Transform the estimated standardized regression coefficients by means of (7.53) back to the
ones for the fitted regression model in the original variables. Verify that they are the same
as the ones obtained in Problem 6.10a.

*7.18. Refer to Patient satisfaction Problem 6.15.


a. Transform the variables by means of the correlation transformation (7.44) and fit the
standardized regression model (7.45).

b. Calculate the coefficients of determination between all pairs of predictor variables. Do these
indicate that it is meaningful here to consider the standardized regression coefficients as
indicating the effect of one predictor variable when the others are held constant?

c. Transform the estimated standardized regression coefficients by means of (7.53) back to the
ones for the fitted regression model in the original variables. Verify that they are the same
as the ones obtained in Problem 6.15c.

7.19. Refer to Commercial properties Problem 6.18.

a. Transform the variables by means of the correlation transformation (7.44) and fit the
standardized regression model (7.45).

b. Interpret the standardized regression coefficient b.

c. Transform the estimated standardized regression coefficients by means of (7.53) back to the
ones for the fitted regression model in the original variables. Verify that they are the same
as the ones obtained in Problem 6.18c.

7.20. A speaker stated in a workshop on applied regression analysis: "In business and the social
sciences, some degree of multicollinearity in survey data is practically inevitable." Does this
statement apply equally to experimental data?

7.21. Refer to the example of perfectly correlated predictor variables in Table 7.8.

a. Develop another response function, like response functions (7.58) and (7.59), that fits the
data perfectly.

b. What is the intersection of the infinitely many response surfaces that fit the data perfectly?

7.22. The progress report of a research analyst to the supervisor stated: "All the estimated regression
coefficients in our model with three predictor variables to predict sales are statistically
significant. Our new preliminary model with seven predictor variables, which includes the three
variables of our smaller model, is less satisfactory because only two of the seven regression
coefficients are statistically significant. Yet in some initial trials the expanded model is giving
more precise sales predictions than the smaller model. The reasons for this anomaly are now
being investigated." Comment.

7.23. Two authors wrote as follows: "Our research utilized a multiple regression model. Two of
the predictor variables important in our theory turned out to be highly correlated in our data
set. This made it difficult to assess the individual effects of each of these variables separately.
We retained both variables in our model, however, because the high coefficient of multiple
determination makes this difficulty unimportant." Comment.

7.24. Refer to Brand preference Problem 6.5.

a. Fit first-order simple linear regression model (2.1) for relating brand liking (Y) to moisture
content (X1). State the fitted regression function.

b. Compare the estimated regression coefficient for moisture content obtained in part (a) with
the corresponding coefficient obtained in Problem 6.5b. What do you find?

c. Does SSR(X1) equal SSR(X1|X2) here? If not, is the difference substantial?

d. Refer to the correlation matrix obtained in Problem 6.5a. What bearing does this have on
your findings in parts (b) and (c)?

*7.25. Refer to Grocery retailer Problem 6.9.

a. Fit first-order simple linear regression model (2.1) for relating total hours required to handle
shipment (Y) to total number of cases shipped (X1). State the fitted regression function.


b. Compare the estimated regression coefficient for total cases shipped obtained in part (a)
with the corresponding coefficient obtained in Problem 6.10a. What do you find?

c. Does SSR(X1) equal SSR(X1|X2) here? If not, is the difference substantial?

d. Refer to the correlation matrix obtained in Problem 6.9c. What bearing does this have on
your findings in parts (b) and (c)?

*7.26. Refer to Patient satisfaction Problem 6.15.

a. Fit first-order linear regression model (6.1) for relating patient satisfaction (Y) to patient's
age (X1) and severity of illness (X2). State the fitted regression function.

b. Compare the estimated regression coefficients for patient's age and severity of illness
obtained in part (a) with the corresponding coefficients obtained in Problem 6.15c. What do
you find?

c. Does SSR(X1) equal SSR(X1|X3) here? Does SSR(X2) equal SSR(X2|X3)?

d. Refer to the correlation matrix obtained in Problem 6.15b. What bearing does it have on
your findings in parts (b) and (c)?

7.27. Refer to Commercial properties Problem 6.18.

a. Fit first-order linear regression model (6.1) for relating rental rates (Y) to property age (X1)
and size (X4). State the fitted regression function.

b. Compare the estimated regression coefficients for property age and size with the
corresponding coefficients obtained in Problem 6.18c. What do you find?

c. Does SSR(X4) equal SSR(X4|X3) here? Does SSR(X1) equal SSR(X1|X3)?

d. Refer to the correlation matrix obtained in Problem 6.18b. What bearing does this have on
your findings in parts (b) and (c)?


Exercises

7.28. a. Define each of the following extra sums of squares: (1) SSR(X5|X1); (2) SSR(X3, X4|X1);
(3) SSR(X4|X1, X2, X3).

b. For a multiple regression model with five X variables, what is the relevant extra sum of
squares for testing whether or not β5 = 0? whether or not β2 = β4 = 0?

7.29. Show that:

a. SSR(X1, X2, X3, X4) = SSR(X1) + SSR(X2, X3|X1) + SSR(X4|X1, X2, X3).

b. SSR(X1, X2, X3, X4) = SSR(X2, X3) + SSR(X1|X2, X3) + SSR(X4|X1, X2, X3).

7.30. Refer to Brand preference Problem 6.5.

a. Regress Y on X2 using simple linear regression model (2.1) and obtain the residuals.

b. Regress X1 on X2 using simple linear regression model (2.1) and obtain the residuals.

c. Calculate the coefficient of simple correlation between the two sets of residuals and show
that it equals r_Y1|2.

7.31. The following regression model is being considered in a water resources study:

\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \beta_4 \sqrt{X_{i3}} + \varepsilon_i \]

State the reduced models for testing whether or not: (1) β3 = β4 = 0, (2) β3 = 0, (3) β1 =
β2 = 5, (4) β4 = 7.

7.32. The following regression model is being considered in a market research study:

\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1}^2 + \varepsilon_i \]


Ргоўесїв 


7.33. 
7.34. 


7.35. 
7.36. 


7.37. 


7.38. 


Chapter 7 Multiple Regression II 293 


State the reduced models for testing whether or not: (1) В, = 83 = 0, (2) Bo = 0, (3) Вз = 5, 

(4) Bo = 10, (5) В, = В. 

Show the equivalence of the expressions in (7.36) and (7.41) for А51. 

Refer to the work crew productivity example in Table 7.6. 

a. For the variables transformed according to (7.44), obtain: (1) XX, (2) X'Y, (3) b, (4) s? [b]. 

b. Show that the standardized regression coefficients obtained in part (a3) are related to the 
regression coefficients for the regression model in the original variables according to (7.53). 

Derive the relations between the В, and f; in (7.462) for p — 1 = 2. 

Derive the expression for X'Y in (7.51) for standardized regression model (7.30.) for p — 1 = 2. 


“аг 


Refer to the CDI data set in Appendix С.2. For predicting the number of active physicians (Y) 
in a county, it has been decided to include total population (Х|) and total personal income (X2) 
as predictor variables. The question now is whether an additional predictor variable мош be 
helpful in the model and, if so, which variable would be most helpful. Assume that a first-order 
multiple regression mode] is appropriate. 


a. For each of the following variables, calculate the coefficient of partial determination given 
that X, and X; are included in the model: land area (X3), percent of population 65 or older 
(X4), number of hospital beds (X5), and total serious crimes (X6). 

b. Onthe basis of the results in part (a), which of the four additional predictor variables is best? 
Is the extra sum of squares associated with this variable Jarger than those for the other three 
variables? 

c. Using the F* test statistic, test whether or not the variable determined to be best in part (b) 
is helpful in the regression model when X, and X; are included in the mode]; use œ = .01. 
State the alternatives, decision rule, and conclusion. Would the F* test statistics for the other 
three potential predictor variables be as large as the one here? Discuss. 


Refer to the SENIC data set in Appendix C.1. For predicting the average length of stay of
patients in a hospital (Y), it has been decided to include age ($X_1$) and infection risk ($X_2$) as
predictor variables. The question now is whether an additional predictor variable would be
helpful in the model and, if so, which variable would be most helpful. Assume that a first-order
multiple regression model is appropriate.


a. For each of the following variables, calculate the coefficient of partial determination given
that $X_1$ and $X_2$ are included in the model: routine culturing ratio ($X_3$), average daily census
($X_4$), number of nurses ($X_5$), and available facilities and services ($X_6$).

b. On the basis of the results in part (a), which of the four additional predictor variables is best? 
Is the extra sum of squares associated with this variable larger than those for the other three 
variables? 

c. Using the F* test statistic, test whether or not the variable determined to be best in part (b)
is helpful in the regression model when $X_1$ and $X_2$ are included in the model; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion. Would the F* test statistics for the other
three potential predictor variables be as large as the one here? Discuss.




Chapter 8

Regression Models for Quantitative and Qualitative Predictors


In this chapter, we consider in greater detail standard modeling techniques for quantitative 
predictors, for qualitative predictors, and for regression models containing both quantitative 
and qualitative predictors. These techniques include the use of interaction and polynomial 
terms for quantitative predictors, and the use of indicator variables for qualitative predictors.


8.1 Polynomial Regression Models 


We first consider polynomial regression models for quantitative predictor variables. They 
are among the most frequently used curvilinear response models in practice because they 
are handled easily as a special case of the general linear regression model (6.7). Next, we 
discuss several commonly used polynomial regression models. Then we present a case to 
illustrate some of the major issues encountered with polynomial regression models. 


Uses of Polynomial Models 




Polynomial regression models have two basic types of uses: 


1. When the true curvilinear response function is indeed a polynomial function. 
2. When the true curvilinear response function is unknown (or complex) but a polynomial 
function is a good approximation to the true function. 


The second type of use, where the polynomial function is employed as an approximation 
when the shape of the true curvilinear response function is unknown, is very common. It 
may be viewed as a nonparametric approach to obtaining information about the shape of 
the response function. 

A main danger in using polynomial regression models, as we shall see, is that extrapolations
may be hazardous with these models, especially those with higher-order terms.
Polynomial regression models may provide good fits for the data at hand, but may turn in 
unexpected directions when extrapolated beyond the range of the data. 




'One Predictor Variable—Second Order 


FIGURE 8.1 Examples of Second-Order Polynomial Response Functions.


Polynomial regression models may contain one, two, or more than two predictor variables. 
Further, each predictor variable may be present in various powers. We begin by considering 
a polynomial regression model with one predictor variable raised to the first and second 
powers: 


$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i \tag{8.1}$$

where:

$$x_i = X_i - \bar{X}$$


This polynomial model is called a second-order model with one predictor variable because
the single predictor variable is expressed in the model to the first and second powers. Note
that the predictor variable is centered (in other words, expressed as a deviation around its
mean $\bar{X}$) and that the ith centered observation is denoted by $x_i$. The reason for using a
centered predictor variable in the polynomial regression model is that $X$ and $X^2$ often will be
highly correlated. This, as we noted in Section 7.5, can cause serious computational difficulties
when the X'X matrix is inverted for estimating the regression coefficients in the normal
equations calculations. Centering the predictor variable often reduces the multicollinearity
substantially, as we shall illustrate in an example, and tends to avoid computational
difficulties.

The regression coefficients in polynomial regression are frequently written in a slightly 
different fashion, to reflect the pattern of the exponents: 


$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i \tag{8.2}$$


We shall employ this latter notation in this section. 
The response function for regression model (8.2) is: 


$$E\{Y\} = \beta_0 + \beta_1 x + \beta_{11} x^2 \tag{8.3}$$


This response function is a parabola and is frequently called a quadratic response function. 
Figure 8.1 contains two examples of second-order polynomial response functions.

[Figure 8.1: plots omitted. Recoverable panel label: $E\{Y\} = 52 + 8x - 2x^2$.]




The regression coefficient $\beta_0$ represents the mean response of Y when $x = 0$, i.e., when
$X = \bar{X}$. The regression coefficient $\beta_1$ is often called the linear effect coefficient, and $\beta_{11}$ is
called the quadratic effect coefficient.


Comments 


1. The danger of extrapolating a polynomial response function is illustrated by the response function
in Figure 8.1a. If this function is extrapolated beyond x = 2, it actually turns downward, which
might not be appropriate in a given case.

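2. The algebraic version of the least squares normal equations:

$$\mathbf{X'Xb} = \mathbf{X'Y}$$

for the second-order polynomial regression model (8.2) can be readily obtained from (6.77) by
replacing $X_{i1}$ by $x_i$ and $X_{i2}$ by $x_i^2$. Since $\sum x_i = 0$, this yields the normal equations:

$$\begin{aligned}
\sum Y_i &= n b_0 + b_{11} \sum x_i^2 \\
\sum x_i Y_i &= b_1 \sum x_i^2 + b_{11} \sum x_i^3 \\
\sum x_i^2 Y_i &= b_0 \sum x_i^2 + b_1 \sum x_i^3 + b_{11} \sum x_i^4
\end{aligned} \tag{8.4}$$
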
One Predictor Variable—Third Order 
The regression model: 
$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i \tag{8.5}$$

where:

$$x_i = X_i - \bar{X}$$


is a third-order model with one predictor variable. The response function for regression
model (8.5) is:

$$E\{Y\} = \beta_0 + \beta_1 x + \beta_{11} x^2 + \beta_{111} x^3 \tag{8.6}$$


Figure 8.2 contains two examples of third-order polynomial response functions. 


One Predictor Variable—Higher Orders 


Polynomial models with the predictor variable present in higher powers than the third 
should be employed with special caution. The interpretation of the coefficients becomes 
difficult for such models, and the models may be highly erratic for interpolations and even 
small extrapolations. It must be recognized in this connection that a polynomial model of 
sufficiently high order can always be found to fit data containing no repeat observations 
perfectly. For instance, the fitted polynomial regression function for one predictor variable 
of order n - 1 will pass through all n observed Y values. One needs to be wary, therefore, of
using high-order polynomials for the sole purpose of obtaining a good fit. Such regression
functions may not show clearly the basic elements of the regression relation between X and 
Y and may lead to erratic interpolations and extrapolations. 
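
The claim that a polynomial of order n - 1 reproduces all n observations can be checked numerically. The sketch below is illustrative only: the five (X, Y) pairs are hypothetical values made up for the demonstration, not data from the text, and the helper name `max_abs_residual` is likewise an assumption of this sketch.

```python
import numpy as np

# Hypothetical data (not from the text): n = 5 observations, no repeated X values.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.1, 4.9, 4.2, 7.8, 6.5])
n = len(X)

# Center the predictor, as in model (8.1).
x = X - X.mean()

def max_abs_residual(order):
    """Fit a polynomial of the given order in x by least squares
    and return the largest absolute residual."""
    design = np.vander(x, order + 1, increasing=True)  # columns 1, x, x^2, ...
    b = np.linalg.lstsq(design, Y, rcond=None)[0]
    return float(np.max(np.abs(Y - design @ b)))

resid_quadratic = max_abs_residual(2)   # second-order fit leaves nonzero residuals
resid_full = max_abs_residual(n - 1)    # order n - 1 passes through every point
```

A zero residual here says nothing about how the curve behaves between or beyond the observed X values, which is the caution raised above.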




FIGURE 8.2 Examples of Third-Order Polynomial Response Functions.
[Plots omitted. The two response functions shown are $E\{Y\} = 22.45 + 1.45x + .15x^2 + .35x^3$
and $E\{Y\} = 16.3 - 1.45x - .15x^2 - .35x^3$, each plotted for x from -2 to 2.]

Two Predictor Variables—Second Order 
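The regression model:

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i \tag{8.7}$$

where:

$$x_{i1} = X_{i1} - \bar{X}_1 \qquad x_{i2} = X_{i2} - \bar{X}_2$$

is a second-order model with two predictor variables. The response function is:

$$E\{Y\} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 \tag{8.8}$$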


which is the equation of a conic section. Note that regression model (8.7) contains separate 
linear and quadratic components for each of the two predictor variables and a cross-product 
term. The latter represents the interaction effect between $x_1$ and $x_2$, as we noted in Chapter 6.
The coefficient $\beta_{12}$ is often called the interaction effect coefficient.

Figure 8.3 contains a representation of the response surface and the contour curves for
a second-order response function with two predictor variables:

$$E\{Y\} = 1{,}740 - 4x_1^2 - 3x_2^2 - 3x_1 x_2$$

The contour curves correspond to different response levels and show the various combinations
of levels of the two predictor variables that yield the same level of response. Note
that the response surface in Figure 8.3a has a maximum at $x_1 = 0$ and $x_2 = 0$. Figure 6.2b
presents another type of second-order polynomial response function with two predictor
variables, this one containing a saddle point.




Comment 


The cross-product term $\beta_{12} x_1 x_2$ in (8.8) is considered to be a second-order term, the same as $\beta_{11} x_1^2$
or $\beta_{22} x_2^2$. The reason can be seen by writing the latter terms as $\beta_{11} x_1 x_1$ and $\beta_{22} x_2 x_2$, respectively.




FIGURE 8.3 Example of a Quadratic Response Surface: $E\{Y\} = 1{,}740 - 4x_1^2 - 3x_2^2 - 3x_1 x_2$.
(a) Response Surface (b) Contour Curves
[Plots omitted; the contour plot axes are $x_1$ and $x_2$, each running from -10 to 10.]


Three Predictor Variables—Second Order 
The second-order regression model with three predictor variables is: 


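$$\begin{aligned}
Y_i = {} & \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{33} x_{i3}^2 \\
& + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \beta_{23} x_{i2} x_{i3} + \varepsilon_i
\end{aligned} \tag{8.9}$$

where:

$$x_{i1} = X_{i1} - \bar{X}_1 \qquad x_{i2} = X_{i2} - \bar{X}_2 \qquad x_{i3} = X_{i3} - \bar{X}_3$$

The response function for this regression model is:

$$\begin{aligned}
E\{Y\} = {} & \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{33} x_3^2 \\
& + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3
\end{aligned} \tag{8.10}$$

The coefficients $\beta_{12}$, $\beta_{13}$, and $\beta_{23}$ are interaction effect coefficients for interactions between
pairs of predictor variables.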


Implementation of Polynomial Regression Models 
Fitting of Polynomial Models. Fitting of polynomial regression models presents no new 
problems since, as we have seen in Chapter 6, they are special cases of the general linear 
regression model (6.7). Hence, all earlier results on fitting apply, as do the earlier results on 
making inferences. 




Hierarchical Approach to Fitting. When using a polynomial regression model as an 
approximation to the true regression function, statisticians will often fit a second-order or 
third-order model and then explore whether a lower-order model is adequate. For instance, 
with one predictor variable, the model: 


$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i$$

may be fitted with the hope that the cubic term and perhaps even the quadratic term can be
dropped. Thus, one would wish to test whether or not $\beta_{111} = 0$, or whether or not both $\beta_{11} = 0$
and $\beta_{111} = 0$. The decomposition of SSR into extra sums of squares therefore proceeds as
follows:

$$SSR(x)$$
$$SSR(x^2 \mid x)$$
$$SSR(x^3 \mid x, x^2)$$

To test whether $\beta_{111} = 0$, the appropriate extra sum of squares is $SSR(x^3 \mid x, x^2)$. If, instead,
one wishes to test whether a linear term is adequate, i.e., whether $\beta_{11} = \beta_{111} = 0$, the
appropriate extra sum of squares is $SSR(x^2, x^3 \mid x) = SSR(x^2 \mid x) + SSR(x^3 \mid x, x^2)$.

With the hierarchical approach, if a polynomial term of a given order is retained, then 
all related terms of lower order are also retained in the model. Thus, one would not drop 
the quadratic term of a predictor variable but retain the cubic term in the model. Since the 
quadratic term is of lower order, it is viewed as providing more basic information about the
shape of the response function; the cubic term is of higher order and is viewed as providing 
refinements in the specification of the shape of the response function. The hierarchical 
approach to testing operates similarly for polynomial regression models with two or more 
predictor variables. Here, for instance, an interaction term (second power) would not be 
retained without also retaining the terms for the predictor variables to the first power. 


Regression Function in Terms of X. After a polynomial regression model has been 
developed, we often wish to express the final model in terms of the original variables rather 
than keeping it in terms of the centered variables. This can be done readily. For example, the 
fitted second-order model for one predictor variable that is expressed in terms of centered 
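values $x = X - \bar{X}$:

$$\hat{Y} = b_0 + b_1 x + b_{11} x^2 \tag{8.11}$$

becomes in terms of the original X variable:

$$\hat{Y} = b_0' + b_1' X + b_{11}' X^2 \tag{8.12}$$

where:

$$b_0' = b_0 - b_1 \bar{X} + b_{11} \bar{X}^2 \tag{8.12a}$$
$$b_1' = b_1 - 2 b_{11} \bar{X} \tag{8.12b}$$
$$b_{11}' = b_{11} \tag{8.12c}$$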

The fitted values and residuals for the regression function in terms of X are exactly the 
same as for the regression function in terms of the centered values x. The reason, as we 




noted earlier, for utilizing a model that is expressed in terms of centered observations is to
reduce potential calculational difficulties due to multicollinearity among $X$, $X^2$, $X^3$, etc.,
inherent in polynomial regression.
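
The equivalence of the centered and uncentered fits can be verified with a short numerical sketch. The six (X, Y) pairs below are hypothetical values chosen only for illustration, not data from the text; the code fits the quadratic in the centered variable, converts the coefficients to the original scale, and compares the result with a direct fit in X.

```python
import numpy as np

# Hypothetical data (not from the text), used only to check the conversion.
X = np.array([2.0, 3.0, 5.0, 7.0, 8.0, 10.0])
Y = np.array([11.0, 14.5, 18.0, 19.5, 19.0, 16.0])
Xbar = X.mean()
x = X - Xbar

# Fit Y-hat = b0 + b1*x + b11*x^2 in the centered variable.
D_centered = np.column_stack([np.ones_like(x), x, x**2])
b0, b1, b11 = np.linalg.lstsq(D_centered, Y, rcond=None)[0]

# Convert to coefficients of 1, X, X^2 by expanding (X - Xbar) and (X - Xbar)^2.
b0_orig = b0 - b1 * Xbar + b11 * Xbar**2
b1_orig = b1 - 2 * b11 * Xbar
b11_orig = b11

# Fit directly in the original variable for comparison.
D_original = np.column_stack([np.ones_like(X), X, X**2])
direct = np.linalg.lstsq(D_original, Y, rcond=None)[0]

fitted_centered = D_centered @ np.array([b0, b1, b11])
fitted_original = D_original @ np.array([b0_orig, b1_orig, b11_orig])
```

Both parameterizations give the same fitted values, so the choice between them affects numerical conditioning and interpretation of the coefficients, not the fit itself.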


Comment 


The estimated standard deviations of the regression coefficients in terms of the centered variables x 
in (8.11) do not apply to the regression coefficients in terms of the original variables X in (8.12). If 
the estimated standard deviations for the regression coefficients in terms of X are desired, they may 
be obtained by using (5.46), where the transformation matrix A is developed from (8.12a-c).


Case Example 


Setting. A researcher studied the effects of the charge rate and temperature on the life
of a new type of power cell in a preliminary small-scale experiment. The charge rate ($X_1$)
was controlled at three levels (.6, 1.0, and 1.4 amperes) and the ambient temperature ($X_2$)
was controlled at three levels (10, 20, 30°C). Factors pertaining to the discharge of the
power cell were held at fixed levels. The life of the power cell (Y) was measured in terms 
of the number of discharge-charge cycles that a power cell underwent before it failed. The 
data obtained in the study are contained in Table 8.1, columns 1-3. 

The researcher was not sure about the nature of the response function in the range of the 
factors studied. Hence, the researcher decided to fit the second-order polynomial regression 
model (8.7): 


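$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i \tag{8.13}$$

for which the response function is:

$$E\{Y\} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 \tag{8.14}$$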


TABLE 8.1 Data—Power Cells Example. 


          (1)         (2)      (3)            (4)     (5)     (6)       (7)       (8)
          Number of   Charge
          Cycles      Rate     Temperature            Coded Values
Cell i    Y_i         X_i1     X_i2           x_i1    x_i2    x_i1^2    x_i2^2    x_i1 x_i2
  1       150          .6      10             -1      -1      1         1          1
  2        86         1.0      10              0      -1      0         1          0
  3        49         1.4      10              1      -1      1         1         -1
  4       288          .6      20             -1       0      1         0          0
  5       157         1.0      20              0       0      0         0          0
  6       131         1.0      20              0       0      0         0          0
  7       184         1.0      20              0       0      0         0          0
  8       109         1.4      20              1       0      1         0          0
  9       279          .6      30             -1       1      1         1         -1
 10       235         1.0      30              0       1      0         1          0
 11       224         1.4      30              1       1      1         1          1

          Xbar_1 = 1.0         Xbar_2 = 20


Setting adapted from: S. M. Sidik, H. F. Leibecki, and J. M. Bozek, Cycles Till Failure of Silver-Zinc Cells with Competing Failure Modes: Preliminary Data
Analysis, NASA Technical Memorandum 81556, 1980.




Because of the balanced nature of the $X_1$ and $X_2$ levels studied, the researcher not only
centered the variables $X_1$ and $X_2$ around their respective means but also scaled them in
convenient units, as follows:

$$\begin{aligned}
x_{i1} &= \frac{X_{i1} - \bar{X}_1}{.4} = \frac{X_{i1} - 1.0}{.4} \\
x_{i2} &= \frac{X_{i2} - \bar{X}_2}{10} = \frac{X_{i2} - 20}{10}
\end{aligned} \tag{8.15}$$

Here, the denominator used for each predictor variable is the absolute difference between
adjacent levels of the variable. These centered and scaled variables are shown in columns 4
and 5 of Table 8.1. Note that the codings defined in (8.15) lead to simple coded values, -1,
0, and 1. The squared and cross-product terms are shown in columns 6-8 of Table 8.1.

Use of the coded variables $x_1$ and $x_2$ rather than the original variables $X_1$ and $X_2$ reduces
the correlations between the first power and second power terms markedly here:

    Correlation between        Correlation between
    X_1 and X_1^2:  .991       X_2 and X_2^2:  .986
    x_1 and x_1^2:  0.0        x_2 and x_2^2:  0.0


The correlations for the coded variables are zero here because of the balance of the design
of the experimental levels of the two explanatory variables. Similarly, the correlations
between the cross-product term $x_1 x_2$ and each of the terms $x_1$, $x_1^2$, $x_2$, $x_2^2$ are reduced to zero
here from levels between .60 and .76 for the corresponding terms in the original variables.
Low levels of multicollinearity can be helpful in avoiding computational inaccuracies.
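
The correlations quoted for the first and second power terms can be reproduced from the design levels alone. The following is a minimal sketch, assuming the 11 runs form a 3 x 3 layout of the stated charge rates and temperatures with two extra runs at the center point, as the text describes; the helper name `corr` is an assumption of this sketch.

```python
import numpy as np

# Design levels as described in the text: 3 x 3 layout plus two extra
# center-point runs, 11 runs in all.
X1 = np.array([0.6, 1.0, 1.4, 0.6, 1.0, 1.0, 1.0, 1.4, 0.6, 1.0, 1.4])
X2 = np.array([10, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30], dtype=float)

# Coding: center at the mean, scale by the spacing between adjacent levels.
x1 = (X1 - 1.0) / 0.4
x2 = (X2 - 20.0) / 10.0

def corr(a, b):
    """Pearson correlation coefficient between two vectors."""
    return float(np.corrcoef(a, b)[0, 1])

r_X1 = corr(X1, X1**2)   # original variable vs. its square
r_X2 = corr(X2, X2**2)
r_x1 = corr(x1, x1**2)   # coded variable vs. its square
r_x2 = corr(x2, x2**2)
```

The coded correlations vanish because the coded levels are symmetric about zero, so the odd and even powers are orthogonal after centering.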

The researcher was particularly interested in whether interaction effects and curvature 
effects are required in the model for the range of the X variables considered. 


Fitting of Model. Figure 8.4 contains the basic regression results for the fit of model (8.13) 
with the SAS regression package. Using the estimated regression coefficients (labeled 
Parameter Estimate), we see that the estimated regression function is as follows: 


$$\hat{Y} = 162.84 - 55.83x_1 + 75.50x_2 + 27.39x_1^2 - 10.61x_2^2 + 11.50x_1 x_2 \tag{8.16}$$
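
The fitted function can be reproduced by ordinary least squares outside SAS. The sketch below uses NumPy rather than the SAS package named in the text. The response vector is an assumption of this sketch: it follows Table 8.1 in run order, with the three runs at $(x_1, x_2)$ = (0, -1), (1, -1), and (-1, 0) set to 86, 49, and 288, the values implied by the reported estimates and the reported mean of 172.

```python
import numpy as np

# Assumed response vector (see the note above), in the run order of Table 8.1.
Y = np.array([150, 86, 49, 288, 157, 131, 184, 109, 279, 235, 224], dtype=float)
x1 = np.array([-1, 0, 1, -1, 0, 0, 0, 1, -1, 0, 1], dtype=float)
x2 = np.array([-1, -1, -1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

# Design matrix for the second-order model: 1, x1, x2, x1^2, x2^2, x1*x2.
D = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

b = np.linalg.lstsq(D, Y, rcond=None)[0]   # least squares estimates
residuals = Y - D @ b
SSE = float(residuals @ residuals)         # error sum of squares
MSE = SSE / (len(Y) - D.shape[1])          # n - p = 11 - 6 = 5 degrees of freedom
```

Under these assumptions the estimates, SSE, and MSE agree with the SAS output to the digits printed there.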


Residual Plots. The researcher first investigated the appropriateness of regression
model (8.13) for the data at hand. Plots of the residuals against $\hat{Y}$, $x_1$, and $x_2$ are shown
in Figure 8.5, as is also a normal probability plot. None of these plots suggest any gross
inadequacies of regression model (8.13). The coefficient of correlation between the ordered
residuals and their expected values under normality is .974, which supports the assumption
of normality of the error terms (see Table B.6).


Test of Fit. Since there are three replications at $x_1 = 0$, $x_2 = 0$, another indication of the
adequacy of regression model (8.13) can be obtained by the formal test in (6.68) of the goodness
of fit of the regression function (8.14). The pure error sum of squares (3.16) is simple
to obtain here, because there is only one combination of levels at which replications occur:

$$SSPE = (157 - 157.33)^2 + (131 - 157.33)^2 + (184 - 157.33)^2 = 1{,}404.67$$




FIGURE 8.4 SAS Regression Output for Second-Order Polynomial Model (8.13): Power Cells Example.

Model: MODEL1
Dependent Variable: Y

                      Analysis of Variance

                       Sum of           Mean
Source      DF        Squares         Square    F Value    Prob>F
Model        5    55365.56140    11073.11228     10.565    0.0109
Error        5     5240.43860     1048.08772
C Total     10    60606.00000

    Root MSE     32.37418     R-square    0.9135
    Dep Mean    172.00000     Adj R-sq    0.8271
    C.V.         18.82220

                      Parameter Estimates

                  Parameter       Standard     T for H0:
Variable   DF      Estimate          Error    Parameter=0    Prob > |T|
INTERCEP    1    162.842105    16.60760542          9.805        0.0002
X1          1    -55.833333    13.21670483         -4.224        0.0083
X2          1     75.500000    13.21670483          5.712        0.0023
X1SQ        1     27.394737    20.34007956          1.347        0.2359
X2SQ        1    -10.605263    20.34007956         -0.521        0.6244
X1X2        1     11.500000    16.18709146          0.710        0.5092

Variable   DF     Type I SS
INTERCEP    1        325424
X1          1         18704
X2          1         34202
X1SQ        1   1645.966667
X2SQ        1    284.928070
X1X2        1    529.000000


Since there are c = 9 distinct combinations of levels of the X variables here, there are
n - c = 11 - 9 = 2 degrees of freedom associated with SSPE. Further, SSE = 5,240.44
according to Figure 8.4; hence the lack of fit sum of squares (3.24) is:

$$SSLF = SSE - SSPE = 5{,}240.44 - 1{,}404.67 = 3{,}835.77$$

with which c - p = 9 - 6 = 3 degrees of freedom are associated. (Remember that p = 6
regression coefficients in model (8.13) had to be estimated.) Hence, test statistic (6.68b) for
testing the adequacy of the regression function (8.14) is:

$$F^* = \frac{SSLF}{c - p} \div \frac{SSPE}{n - c} = \frac{3{,}835.77}{3} \div \frac{1{,}404.67}{2} = 1.82$$

For $\alpha = .05$, we require F(.95; 3, 2) = 19.2. Since $F^* = 1.82 \le 19.2$, we conclude
according to decision rule (6.68c) that the second-order polynomial regression function
(8.14) is a good fit.
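
The lack of fit computation can be traced step by step in code. This is a sketch under one stated assumption: the response vector follows Table 8.1, with the runs at $(x_1, x_2)$ = (0, -1), (1, -1), and (-1, 0) taken to be 86, 49, and 288, the values implied by the reported estimates. Pure error is obtained by grouping observations that share the same predictor levels.

```python
import numpy as np
from collections import defaultdict

# Assumed response vector (see the note above), in the run order of Table 8.1.
Y = np.array([150, 86, 49, 288, 157, 131, 184, 109, 279, 235, 224], dtype=float)
x1 = np.array([-1, 0, 1, -1, 0, 0, 0, 1, -1, 0, 1], dtype=float)
x2 = np.array([-1, -1, -1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

# Second-order model: error sum of squares from the least squares fit.
D = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
b = np.linalg.lstsq(D, Y, rcond=None)[0]
SSE = float(np.sum((Y - D @ b) ** 2))

# Pure error: squared deviations of Y around the mean of its own design point.
cells = defaultdict(list)
for u, v, y in zip(x1, x2, Y):
    cells[(u, v)].append(y)
SSPE = float(sum(np.sum((np.array(ys) - np.mean(ys)) ** 2) for ys in cells.values()))

n, p, c = len(Y), D.shape[1], len(cells)   # 11 runs, 6 coefficients, 9 design points
SSLF = SSE - SSPE                          # lack of fit sum of squares
F_star = (SSLF / (c - p)) / (SSPE / (n - c))
```

Only the center point contributes to SSPE, since every other design point has a single observation.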


Coefficient of Multiple Determination. Figure 8.4 shows that the coefficient of multiple 
determination (labeled R-square) is $R^2 = .9135$. Thus, the variation in the lives of the power
cells is reduced by about 91 percent when the first-order and second-order relations to the
charge rate and ambient temperature are utilized. Note that the adjusted coefficient of multiple
determination (labeled Adj R-sq) is $R_a^2 = .8271$. This coefficient is considerably smaller
here than the unadjusted coefficient because of the relatively large number of parameters in 
the polynomial regression function with two predictor variables. 




FIGURE 8.5 Diagnostic Residual Plots: Power Cells Example.
(a) Residual Plot against $\hat{Y}$ (b) Residual Plot against $x_1$ (c) Residual Plot against $x_2$ (d) Normal Probability Plot
[Plots omitted; the vertical axis of each panel is the residual, and the normal probability plot shows residuals against their expected values.]



Partial F Test. The researcher now turned to consider whether a first-order model would 
be sufficient. The test alternatives are: 


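$$H_0\colon \beta_{11} = \beta_{22} = \beta_{12} = 0$$
$$H_a\colon \text{not all } \beta\text{s in } H_0 \text{ equal zero}$$

The partial F test statistic (7.27) here is:

$$F^* = \frac{SSR(x_1^2, x_2^2, x_1 x_2 \mid x_1, x_2)}{3} \div MSE$$
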

In anticipation of this test, the researcher entered the X variables in the SAS regression
program in the order $x_1$, $x_2$, $x_1^2$, $x_2^2$, $x_1 x_2$, as may be seen at the bottom of Figure 8.4. The
extra sums of squares are labeled Type I SS. The first sum of squares shown is not relevant
here. The second one is $SSR(x_1) = 18{,}704$, the third one is $SSR(x_2 \mid x_1) = 34{,}202$, and so
on. The required extra sum of squares is therefore obtained as follows:

$$\begin{aligned}
SSR(x_1^2, x_2^2, x_1 x_2 \mid x_1, x_2) &= SSR(x_1^2 \mid x_1, x_2) + SSR(x_2^2 \mid x_1, x_2, x_1^2) \\
&\quad + SSR(x_1 x_2 \mid x_1, x_2, x_1^2, x_2^2) \\
&= 1{,}646.0 + 284.9 + 529.0 = 2{,}459.9
\end{aligned}$$

FIGURE 8.6 S-Plus Plot of Fitted Response Plane (8.19): Power Cells Example. [Plot omitted.]


We also require the error mean square. We find in Figure 8.4 that it is MSE = 1,048.1. Hence
the test statistic is:

$$F^* = \frac{2{,}459.9}{3} \div 1{,}048.1 = .78$$

For level of significance $\alpha = .05$, we require F(.95; 3, 5) = 5.41. Since $F^* = .78 \le 5.41$,
we conclude $H_0$, that no curvature and interaction effects are needed, so that a first-order
model is adequate for the range of the charge rates and temperatures considered.
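
The arithmetic of this partial F test is short enough to script. The sketch below takes the Type I sums of squares and the MSE as printed in Figure 8.4 and the critical value as quoted in the text; the dictionary keys and variable names are assumptions of this sketch.

```python
# Sequential (Type I) sums of squares and MSE as reported in Figure 8.4.
type1_ss = {
    "x1": 18704.0,
    "x2": 34202.0,
    "x1sq": 1645.966667,
    "x2sq": 284.928070,
    "x1x2": 529.0,
}
MSE = 1048.08772

# Extra sum of squares for the three second-order terms given x1 and x2:
# the sequential contributions add because the terms were entered last.
extra_ss = type1_ss["x1sq"] + type1_ss["x2sq"] + type1_ss["x1x2"]
num_terms = 3

F_star = (extra_ss / num_terms) / MSE
F_crit = 5.41                      # F(.95; 3, 5), the value quoted in the text
conclude_H0 = F_star <= F_crit     # True means the first-order model is adequate
```

The additivity used here depends on the entry order: had the second-order terms been entered before $x_1$ and $x_2$, the Type I values could not be summed this way.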


First-Order Model. On the basis of this analysis, the researcher decided to consider the
first-order model:

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \tag{8.17}$$

A fit of this model yielded the estimated response function:

$$\hat{Y} = 172.00 - \underset{(12.67)}{55.83}\,x_1 + \underset{(12.67)}{75.50}\,x_2 \tag{8.18}$$

Note that the regression coefficients $b_1$ and $b_2$ are the same as in (8.16) for the fitted second-order
model. This is a result of the choices of the $X_1$ and $X_2$ levels studied. The numbers
in parentheses under the estimated regression coefficients are their estimated standard
deviations. A variety of residual plots for this first-order model were made and analyzed
by the researcher (not shown here), which confirmed the appropriateness of first-order
model (8.17).
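
The first-order fit and its standard errors can be checked with the usual matrix formulas. This is a sketch under one stated assumption: the response vector follows Table 8.1, with the runs at $(x_1, x_2)$ = (0, -1), (1, -1), and (-1, 0) taken to be 86, 49, and 288, the values implied by the reported estimates.

```python
import numpy as np

# Assumed response vector (see the note above), in the run order of Table 8.1.
Y = np.array([150, 86, 49, 288, 157, 131, 184, 109, 279, 235, 224], dtype=float)
x1 = np.array([-1, 0, 1, -1, 0, 0, 0, 1, -1, 0, 1], dtype=float)
x2 = np.array([-1, -1, -1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

# First-order model in the coded variables: 1, x1, x2.
D = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.lstsq(D, Y, rcond=None)[0]

# Estimated variance-covariance matrix of b: MSE * (X'X)^{-1}.
n, p = D.shape
MSE = float(np.sum((Y - D @ b) ** 2)) / (n - p)   # 8 degrees of freedom
cov_b = MSE * np.linalg.inv(D.T @ D)
se = np.sqrt(np.diag(cov_b))                      # standard errors of b0, b1, b2
```

Because the coded columns are mutually orthogonal and sum to zero, dropping the second-order terms leaves $b_1$ and $b_2$ unchanged and moves the intercept to the sample mean.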


Fitted First-Order Model in Terms of X. The fitted first-order regression function (8.18)
can be transformed back to the original variables by utilizing (8.15). We obtain:

$$\hat{Y} = 160.58 - 139.58X_1 + 7.55X_2 \tag{8.19}$$


Figure 8.6 contains an S-Plus regression-scatter plot of the fitted response plane. The 
researcher used this fitted response surface for investigating the effects of charge rate and 
temperature on the life of this new type of power cell. 
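
The back-transformation to the original variables is a direct substitution of the coding (8.15) into the fitted function (8.18). The worked step below is a sketch added for clarity; it uses the unrounded slope $b_1 = 55.8333$ from Figure 8.4, which is why the results agree with (8.19) to two decimals.

```latex
\begin{align*}
\hat{Y} &= 172.00 - 55.8333\,\frac{X_1 - 1.0}{.4} + 75.50\,\frac{X_2 - 20}{10} \\
        &= \left(172.00 + \frac{55.8333}{.4} - \frac{75.50\,(20)}{10}\right)
           - \frac{55.8333}{.4}\,X_1 + \frac{75.50}{10}\,X_2 \\
        &= (172.00 + 139.58 - 151.00) - 139.58\,X_1 + 7.55\,X_2 \\
        &= 160.58 - 139.58\,X_1 + 7.55\,X_2
\end{align*}
```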




Estimation of Regression Coefficients. The researcher wished to estimate the linear
effects of the two predictor variables in the first-order model, with a 90 percent family
confidence coefficient, by means of the Bonferroni method. Here, g = 2 statements are
desired; hence, by (6.52a), we have:

$$B = t[1 - .10/2(2);\, 8] = t(.975;\, 8) = 2.306$$

The estimated standard deviations of $b_1$ and $b_2$ in (8.18) apply to the model in the coded
variables. Since only first-order terms are involved in this fitted model, we obtain the estimated
standard deviations of $b_1'$ and $b_2'$ for the fitted model (8.19) in the original variables as follows:

$$s\{b_1'\} = \left(\frac{1}{.4}\right) s\{b_1\} = \frac{12.67}{.4} = 31.68$$

$$s\{b_2'\} = \left(\frac{1}{10}\right) s\{b_2\} = \frac{12.67}{10} = 1.267$$

The Bonferroni confidence limits by (6.52) therefore are $-139.58 \pm 2.306(31.68)$ and
$7.55 \pm 2.306(1.267)$, yielding the confidence limits:

$$-212.6 \le \beta_1 \le -66.5 \qquad 4.6 \le \beta_2 \le 10.5$$

With confidence .90, we conclude that the mean number of charge/discharge cycles before
failure decreases by 66 to 213 cycles with a unit increase in the charge rate for given ambient
temperature, and increases by 5 to 10 cycles with a unit increase of ambient temperature
for given charge rate. The researcher was satisfied with the precision of these estimates for
this initial small-scale study.
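
The interval arithmetic above can be retraced in a few lines. The sketch below uses only numbers quoted in the text (the coefficients in the original variables, the coded standard deviation 12.67, and the multiple 2.306); the variable names are assumptions of this sketch, and the t quantile is taken as given rather than computed.

```python
# Quantities quoted in the text.
b1_orig, b2_orig = -139.58, 7.55   # slope estimates in the original variables
s_b_coded = 12.67                  # estimated standard deviation of b1 and b2 (coded)
B = 2.306                          # Bonferroni multiple t(.975; 8) for g = 2

# Rescale the standard deviations by the coding denominators.
s_b1 = s_b_coded / 0.4             # charge rate was divided by .4
s_b2 = s_b_coded / 10              # temperature was divided by 10

limits_b1 = (b1_orig - B * s_b1, b1_orig + B * s_b1)
limits_b2 = (b2_orig - B * s_b2, b2_orig + B * s_b2)
```

Dividing the standard deviation by the coding denominator works here only because the model is first order; with squared or cross-product terms the conversion requires the full transformation matrix.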


Some Further Comments on Polynomial Regression 

1. The use of polynomial models is not without drawbacks. Such models can be more 
expensive in degrees of freedom than alternative nonlinear models or linear models with 
transformed variables. Another potential drawback is that serious multicollinearity may be 
present even when the predictor variables are centered. 

2. An alternative to using centered variables in polynomial regression is to use orthog- 
onal polynomials. Orthogonal polynomials are uncorrelated. Some computer packages use 
orthogonal polynomials in their polynomial regression routines and present the final fitted 
results in terms of both the orthogonal polynomials and the original polynomials. Orthog- 
onal polynomials are discussed in specialized texts such as Reference 8.1. 

3. Sometimes a quadratic response function is fitted for the purpose of establishing the 
linearity of the response function when repeat observations are not available for directly 
testing the linearity of the response function. Fitting the quadratic model: 


$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i \tag{8.20}$$

and testing whether $\beta_{11} = 0$ does not, however, necessarily establish that a linear response
function is appropriate. Figure 8.2a provides an example. If sample data were obtained for
the response function in Figure 8.2a, model (8.20) fitted, and a test on $\beta_{11}$ made, it likely
would lead to the conclusion that $\beta_{11} = 0$. Yet a linear response function clearly might not
be appropriate. Examination of residuals would disclose this lack of fit and should always
accompany formal testing of polynomial regression coefficients.




8.2 Interaction Regression Models 


We have previously noted that regression models with cross-product interaction effects,
such as regression model (6.15), are special cases of general linear regression model (6.7).
We also encountered regression models with interaction effects briefly when we considered
polynomial regression models, such as model (8.7). Now we consider in some detail
regression models with interaction effects, including their interpretation and implementation.


Interaction Effects 


A regression model with p - 1 predictor variables contains additive effects if the response
function can be written in the form:

$$E\{Y\} = f_1(X_1) + f_2(X_2) + \cdots + f_{p-1}(X_{p-1}) \tag{8.21}$$

where $f_1, f_2, \ldots, f_{p-1}$ can be any functions, not necessarily simple ones. For instance,
the following response function with two predictor variables can be expressed in the form
of (8.21):

$$E\{Y\} = \underbrace{\beta_0 + \beta_1 X_1 + \beta_2 X_1^2}_{f_1(X_1)} + \underbrace{\beta_3 X_2}_{f_2(X_2)}$$

We say here that the effects of $X_1$ and $X_2$ on Y are additive.
In contrast, the following regression function:

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$$


cannot be expressed in the form (8.21). Hence, this latter regression model is not additive, 
or, equivalently, it contains an interaction effect. 

A simple and commonly used means of modeling the interaction effect of two predictor
variables on the response variable is by a cross-product term, such as $\beta_3 X_1 X_2$ in the above
response function. The cross-product term is called an interaction term. More specifically,
it is sometimes called a linear-by-linear or a bilinear interaction term. When there are three
predictor variables whose effects on the response variable are linear, but the effects on Y of
$X_1$ and $X_2$ and of $X_1$ and $X_3$ are interacting, the response function would be modeled as
follows using cross-product terms:

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3$$


Interpretation of Interaction Regression Models with Linear Effects 
We shall explain the influence of interaction effects on the shape of the response function 
and on the interpretation of the regression coefficients by first considering the simple case of 
two quantitative predictor variables where each has a linear effect on the response variable. 


Interpretation of Regression Coefficients. The regression model for two quantitative
predictor variables with linear effects on Y and interacting effects of $X_1$ and $X_2$ on Y
represented by a cross-product term is as follows:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i \tag{8.22}$$




The meaning of the regression coefficients $\beta_1$ and $\beta_2$ here is not the same as that given earlier
because of the interaction term $\beta_3 X_{i1} X_{i2}$. The regression coefficients $\beta_1$ and $\beta_2$ no longer
indicate the change in the mean response with a unit increase of the predictor variable, with
the other predictor variable held constant at any given level. It can be shown that the change
in the mean response with a unit increase in $X_1$ when $X_2$ is held constant is:

$$\beta_1 + \beta_3 X_2 \tag{8.23}$$

Similarly, the change in the mean response with a unit increase in $X_2$ when $X_1$ is held
constant is:

$$\beta_2 + \beta_3 X_1 \tag{8.24}$$

Hence, in regression model (8.22) both the effect of $X_1$ for given level of $X_2$ and the effect
of $X_2$ for given level of $X_1$ depend on the level of the other predictor variable.
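
The result stated for a unit increase in the first predictor follows from a one-line difference of response functions. The derivation below is a sketch added for clarity; the conditional notation $E\{Y \mid \cdot\}$ is shorthand introduced here for the response function of model (8.22) evaluated at the indicated predictor levels.

```latex
\begin{align*}
E\{Y \mid X_1 + 1, X_2\} - E\{Y \mid X_1, X_2\}
  &= \bigl[\beta_0 + \beta_1 (X_1 + 1) + \beta_2 X_2 + \beta_3 (X_1 + 1) X_2\bigr] \\
  &\quad - \bigl[\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2\bigr] \\
  &= \beta_1 + \beta_3 X_2
\end{align*}
```

Exchanging the roles of the two predictors gives $\beta_2 + \beta_3 X_1$ in the same way. Equivalently, these are the partial derivatives of the response function with respect to $X_1$ and $X_2$.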

We shall illustrate how the effect of one predictor variable depends on the level of the
other predictor variable in regression model (8.22) by returning to the sales promotion
response function shown in Figure 6.1 on page 215. The response function (6.3) for this
example, relating locality sales (Y) to point-of-sale expenditures ($X_1$) and TV expenditures
($X_2$), is additive:

$$E\{Y\} = 10 + 2X_1 + 5X_2 \tag{8.25}$$

In Figure 8.7a, we show the response function $E\{Y\}$ as a function of $X_1$ when $X_2 = 1$
and when $X_2 = 3$. Note that the two response functions are parallel; that is, the mean
sales response increases by the same amount $\beta_1 = 2$ with a unit increase of point-of-sale
expenditures whether TV expenditures are $X_2 = 1$ or $X_2 = 3$. The plot in Figure 8.7a is
called a conditional effects plot because it shows the effects of $X_1$ on the mean response
conditional on different levels of the other predictor variable.

In Figure 8.7b, we consider the same response function but with the cross-product term
$.5X_1 X_2$ added for interaction effect of the two types of promotional expenditures on sales:

$$E\{Y\} = 10 + 2X_1 + 5X_2 + .5X_1 X_2 \tag{8.26}$$


FIGURE 8.7 Illustration of Reinforcement and Interference Interaction Effects: Sales Promotion Example.
(a) Additive Model: $X_2 = 3$: $E\{Y\} = 25 + 2X_1$; $X_2 = 1$: $E\{Y\} = 15 + 2X_1$.
(b) Reinforcement Interaction Effect: $X_2 = 3$: $E\{Y\} = 25 + 3.5X_1$.
(c) Interference Interaction Effect: $X_2 = 3$: $E\{Y\} = 25 + .5X_1$.
[Plots omitted; each panel shows Y against $X_1$ over the range 0 to 10.]





We again use a conditional effects plot to show the response function $E\{Y\}$ as a function
of $X_1$ conditional on $X_2 = 1$ and on $X_2 = 3$. Note that the slopes of the response functions
plotted against $X_1$ now differ for $X_2 = 1$ and $X_2 = 3$. The slope of the response function
when $X_2 = 1$ is by (8.23):

$$\beta_1 + \beta_3 X_2 = 2 + .5(1) = 2.5$$

and when $X_2 = 3$, the slope is:

$$\beta_1 + \beta_3 X_2 = 2 + .5(3) = 3.5$$

Thus, a unit increase in point-of-sale expenditures has a larger effect on sales when TV 
expenditures are at a higher level than when they are at a lower level. 

Hence, f, in regression model (8.22) containing a cross-product term for interaction 
effect no longer indicates the change in the mean response for a unit increase in X, for апу 
given X» level. That effect in this model depends on the level of X;. Although the mean 
response in regression model (8.22) when X; is constant is stila linear function of X 1, Now 
both the intercept and the slope of the response function change as the level at which X, is 
held constant is varied. The same holds when the mean response is regarded as a function 
of X2, with X; constant. 

Note that as a result of the interaction effect in regression model (8.26), the increase
in sales with a unit increase in point-of-sale expenditures is greater, the higher the level
of TV expenditures, as shown by the larger slope of the response function when $X_2 = 3$
than when $X_2 = 1$. A similar increase in the slope occurs if the response function against
$X_2$ is considered for higher levels of $X_1$. When the regression coefficients $\beta_1$ and $\beta_2$ are
positive, we say that the interaction effect between the two quantitative variables is of a
reinforcement or synergistic type when the slope of the response function against one of the
predictor variables increases for higher levels of the other predictor variable (i.e., when $\beta_3$
is positive).

If the sign of $\beta_3$ in regression model (8.26) were negative:

$$E\{Y\} = 10 + 2X_1 + 5X_2 - .5X_1X_2 \tag{8.27}$$


the result of the interaction effect of the two types of promotional expenditures on sales
would be that the increase in sales with a unit increase in point-of-sale expenditures becomes
smaller, the higher the level of TV expenditures. This effect is shown in the conditional
effects plot in Figure 8.7c. The two response functions for $X_2 = 1$ and $X_2 = 3$ are again
nonparallel, but now the slope of the response function is smaller for the higher level of
TV expenditures. A similar decrease in the slope would occur if the response function
against $X_2$ is considered for higher levels of $X_1$. When the regression coefficients $\beta_1$ and
$\beta_2$ are positive, we say that the interaction effect between two quantitative variables is of
an interference or antagonistic type when the slope of the response function against one of
the predictor variables decreases for higher levels of the other predictor variable (i.e., when
$\beta_3$ is negative).
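The conditional slopes quoted for the sales promotion example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `conditional_slope` is made up for the illustration, and the coefficients are the ones in response functions (8.26) and (8.27).

```python
def conditional_slope(beta1: float, beta3: float, x2: float) -> float:
    """Slope of E{Y} against X1 with X2 held fixed: beta1 + beta3 * X2, as in (8.23)."""
    return beta1 + beta3 * x2


# Reinforcement case, response function (8.26): beta1 = 2, beta3 = +.5
reinforcement = {x2: conditional_slope(2.0, 0.5, x2) for x2 in (1, 3)}

# Interference case, response function (8.27): beta1 = 2, beta3 = -.5
interference = {x2: conditional_slope(2.0, -0.5, x2) for x2 in (1, 3)}

print(reinforcement)  # {1: 2.5, 3: 3.5}
print(interference)   # {1: 1.5, 3: 0.5}
```

With a positive cross-product coefficient the slope grows with the level of TV expenditures; with a negative one it shrinks, which is the interference pattern of Figure 8.7c.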


Comments

1. When the signs of $\beta_1$ and $\beta_2$ in regression model (8.22) are negative, a negative $\beta_3$ is usually
viewed as a reinforcement type of interaction effect and a positive $\beta_3$ as an interference type of effect.

2. To derive (8.23) and (8.24), we differentiate:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_1X_2$$

with respect to $X_1$ and $X_2$, respectively:

$$\frac{\partial E\{Y\}}{\partial X_1} = \beta_1 + \beta_3X_2 \qquad \frac{\partial E\{Y\}}{\partial X_2} = \beta_2 + \beta_3X_1$$

Shape of Response Function. Figure 8.8 shows for the sales promotion example the
impact of the interaction effect on the shape of the response function. Figure 8.8a presents the
additive response function in (8.25), and Figures 8.8b and 8.8c present the response functions
with the reinforcement interaction effect in (8.26) and with the interference interaction effect
in (8.27), respectively. Note that the additive response function is a plane, but that the two
response functions with interaction effects are not. Also note in Figures 8.8b and 8.8c that
the mean response as a function of $X_1$, for any given level of $X_2$, is no longer parallel to the
same function at a different level of $X_2$, for either type of interaction effect.

We can also illustrate the difference in the shape of the response function when the 
two predictor variables do and do not interact by representing the response surface by 
means of a contour diagram. As we noted previously, such a diagram shows for different 
response levels the various combinations of levels of the two predictor variables that yield 
the same level of response. Figure 8.8d shows a contour diagram for the additive response 
surface in Figure 8.8a when the two predictor variables do not interact. Note that the contour 
curves are straight lines and that the contour lines are parallel and hence equally spaced. 
Figures 8.8e and 8.8f show contour diagrams for the response surfaces in Figures 8.8b 
and 8.8c, respectively, where the two predictor variables interact. Note that the contour
curves are no longer straight lines and that the contour curves are not parallel here. For
instance, in Figure 8.8e the vertical distance between the contours for $E\{Y\} = 200$ and
$E\{Y\} = 400$ at $X_1 = 10$ is much larger than at $X_1 = 50$.

In general, additive or noninteracting predictor variables lead to parallel contour curves, 
whereas interacting predictor variables lead to nonparallel contour curves. 
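The contour spacing described for Figure 8.8e can be reproduced numerically. The sketch below solves the response function for $X_2$ at a given response level and $X_1$; the helper names `x2_on_contour` and `contour_gap` are hypothetical, and the default coefficients are those of response function (8.26). Setting the cross-product coefficient to zero gives the additive case.

```python
def x2_on_contour(level, x1, b0=10.0, b1=2.0, b2=5.0, b3=0.5):
    """Solve b0 + b1*x1 + b2*x2 + b3*x1*x2 = level for x2."""
    return (level - b0 - b1 * x1) / (b2 + b3 * x1)


def contour_gap(x1, low=200.0, high=400.0, **coef):
    """Vertical (X2) distance between the contours E{Y} = low and E{Y} = high at a given X1."""
    return x2_on_contour(high, x1, **coef) - x2_on_contour(low, x1, **coef)


# Reinforcement interaction model (8.26): the gap depends on X1
gap_interaction_10 = contour_gap(10)
gap_interaction_50 = contour_gap(50)

# Additive model (cross-product coefficient set to 0): the gap is the same everywhere
gap_additive_10 = contour_gap(10, b3=0.0)
gap_additive_50 = contour_gap(50, b3=0.0)

print(gap_interaction_10, round(gap_interaction_50, 3))  # 20.0 6.667
print(gap_additive_10, gap_additive_50)                  # 40.0 40.0
```

The gap of 20 at $X_1 = 10$ against roughly 6.7 at $X_1 = 50$ is the nonparallelism noted in the text, while the additive model yields a constant gap.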


Interpretation of Interaction Regression Models with Curvilinear Effects 
When one or more of the predictor variables in a regression model have curvilinear effects 
on the response variable, the presence of interaction effects again leads to response functions 
whose contour curves are not parallel. Figure 8.9a shows the response surface for a study 
of the volume of a quick bread: 


$$E\{Y\} = 65 + 3X_1 + 4X_2 - 10X_1^2 - 15X_2^2 + 35X_1X_2$$


Here, $Y$ is the percentage increase in the volume of the quick bread from baking, $X_1$ is the
amount of a leavening agent (coded), and $X_2$ is the oven temperature (coded). Figure 8.9b
shows contour curves for this response function. Note the lack of parallelism in the contour 
curves, reflecting the interaction effect. Figure 8.10 presents a conditional effects plot to 
show in a simple fashion the nature of the interaction in the relation of oven temperature (X2) 
to the mean volume when leavening agent amount (X1) is held constant at different levels. 
Note that increasing oven temperature increases volume when leavening agent amount is 
high, and the opposite is true when leavening agent amount is low. 
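The direction of the oven temperature effect can be checked by evaluating the quick bread response function at a few coded levels. The text says only that the predictors are coded; the levels -1, 0, and 1 used below are an assumption made for illustration, and `mean_volume` is a hypothetical helper name.

```python
def mean_volume(x1: float, x2: float) -> float:
    """Quick bread response function: E{Y} = 65 + 3X1 + 4X2 - 10X1^2 - 15X2^2 + 35X1X2."""
    return 65 + 3 * x1 + 4 * x2 - 10 * x1**2 - 15 * x2**2 + 35 * x1 * x2


# Assumed coded levels: -1 (low), 0 (middle), 1 (high)
temperature_levels = (-1, 0, 1)
low_leavening = [mean_volume(-1, x2) for x2 in temperature_levels]
high_leavening = [mean_volume(1, x2) for x2 in temperature_levels]

print(low_leavening)   # [68, 52, 6]
print(high_leavening)  # [4, 58, 82]
```

Under these assumed levels the mean volume falls with oven temperature when the leavening amount is low and rises when it is high, in line with the conditional effects plot.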


FIGURE 8.8 Response Surfaces and Contour Plots for Additive and Interaction Regression Models: Sales Promotion Example.
Response surfaces: (a) Additive Model; (b) Reinforcement Interaction Effect; (c) Interference Interaction Effect.
Contour plots: (d) Additive Model; (e) Reinforcement Interaction Effect; (f) Interference Interaction Effect.
Axes: $X_1$ and $X_2$, each from 0 to 50.


FIGURE 8.9 Response Surface and Contour Curves for Curvilinear Regression Model with Interaction Effect: Quick Bread Volume Example.
(a) Fitted Response Surface; (b) Contour Plot.


FIGURE 8.10 Conditional Effects Plot for Curvilinear Regression Model with Interaction Effect: Quick Bread Volume Example.


Implementation of Interaction Regression Models 
The fitting of interaction regression models is routine, once the appropriate cross-product 
terms have been added to the data set. Two considerations need to be kept in mind when 
developing regression models with interaction effects. 


1. When interaction terms are added to a regression model, high multicollinearities may 
exist between some of the predictor variables and some of the interaction terms, as well as 
among some of the interaction terms. A partial remedy to improve computational accuracy 
is to center the predictor variables; i.e., to use $x_{ik} = X_{ik} - \bar{X}_k$.

2. When the number of predictor variables in the regression model is large, the potential
number of interaction terms can become very large. For example, if eight predictor
variables are present in the regression model in linear terms, there are potentially 28 pairwise
interaction terms that could be added to the regression model. The data set would need
to be quite large before 36 X variables could be used in the regression model.

It is therefore desirable to identify in advance, whenever possible, those interactions
that are most likely to influence the response variable in important ways. In addition to
utilizing a priori knowledge, one can plot the residuals for the additive regression model
against the different interaction terms to determine which ones appear to be influential
in affecting the response variable. When the number of predictor variables is large, these
plots may need to be limited to interaction terms involving those predictor variables that
appear to be the most important on the basis of the initial fit of the additive regression
model.
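Both considerations can be illustrated with a short sketch. The first part counts the pairwise interaction terms for eight predictors. The second part uses a small made-up data set (the values of `x1` and `x2` are hypothetical and chosen only for illustration) to show how centering can reduce the correlation between a predictor and a cross-product term.

```python
from math import comb, sqrt


def pearson(u, v):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def center(v):
    """Subtract the mean from every value."""
    vbar = sum(v) / len(v)
    return [a - vbar for a in v]


# Consideration 2: eight predictors give C(8, 2) pairwise interaction terms
n_pairs = comb(8, 2)       # 28
total_terms = 8 + n_pairs  # 36 X variables in all

# Consideration 1: hypothetical data, not taken from the text
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
raw_corr = pearson(x1, [a * b for a, b in zip(x1, x2)])

c1, c2 = center(x1), center(x2)
centered_corr = pearson(c1, [a * b for a, b in zip(c1, c2)])

print(n_pairs, total_terms)
print(round(raw_corr, 3), round(centered_corr, 3))
```

For these made-up values the correlation between the predictor and its cross-product term drops sharply after centering; the size of the reduction in practice depends on the data.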


Example

We wish to test formally in the body fat example of Table 7.1 whether interaction terms between
the three predictor variables should be included in the regression model. We therefore
need to consider the following regression model:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \beta_4X_{i1}X_{i2} + \beta_5X_{i1}X_{i3} + \beta_6X_{i2}X_{i3} + \varepsilon_i \tag{8.28}$$

This regression model requires that we obtain the new variables $X_1X_2$, $X_1X_3$, and $X_2X_3$
and add these X variables to the ones in Table 7.1. We find upon examining these X variables
that some of the predictor variables are highly correlated with some of the interaction
terms, and that there are also some high correlations among the interaction terms. For
example, the correlation between $X_1$ and $X_1X_2$ is .989 and that between $X_1X_3$ and $X_2X_3$
is .998.

We shall therefore use centered variables in the regression model: 


$$Y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i3} + \beta_4x_{i1}x_{i2} + \beta_5x_{i1}x_{i3} + \beta_6x_{i2}x_{i3} + \varepsilon_i \tag{8.29}$$

where:

$$x_{i1} = X_{i1} - \bar{X}_1 = X_{i1} - 25.305$$
$$x_{i2} = X_{i2} - \bar{X}_2 = X_{i2} - 51.170$$
$$x_{i3} = X_{i3} - \bar{X}_3 = X_{i3} - 27.620$$

Upon obtaining the cross-product terms using the centered variables, we find that the
intercorrelations involving the cross-product terms are now smaller. For example, the largest
correlation, which was between $X_1X_3$ and $X_2X_3$, is reduced from .998 to .891. Other
correlations are reduced in absolute magnitude even more.

Fitting regression model (8.29) yields the following estimated regression function, mean 
square error, and extra sums of squares: 


$$\hat{Y} = 20.53 + 3.438x_1 - 2.095x_2 - 1.616x_3 + .00888x_1x_2 - .08479x_1x_3 + .09042x_2x_3$$
$$MSE = 6.745$$


Variable    Extra Sum of Squares
$x_1$       $SSR(x_1) = 352.270$
$x_2$       $SSR(x_2 | x_1) = 33.169$
$x_3$       $SSR(x_3 | x_1, x_2) = 11.546$
$x_1x_2$    $SSR(x_1x_2 | x_1, x_2, x_3) = 1.496$
$x_1x_3$    $SSR(x_1x_3 | x_1, x_2, x_3, x_1x_2) = 2.704$
$x_2x_3$    $SSR(x_2x_3 | x_1, x_2, x_3, x_1x_2, x_1x_3) = 6.515$


We wish to test whether any interaction terms are needed:

$$H_0: \beta_4 = \beta_5 = \beta_6 = 0$$
$$H_a: \text{not all } \beta\text{s in } H_0 \text{ equal zero}$$


The partial F test statistic (7.27) requires here the following extra sum of squares: 


$$SSR(x_1x_2, x_1x_3, x_2x_3 | x_1, x_2, x_3) = 1.496 + 2.704 + 6.515 = 10.715$$

and the test statistic is:

$$F^* = \frac{SSR(x_1x_2, x_1x_3, x_2x_3 | x_1, x_2, x_3)}{3} \div MSE = \frac{10.715}{3} \div 6.745 = .53$$

For level of significance $\alpha = .05$, we require $F(.95; 3, 13) = 3.41$. Since $F^* = .53 < 3.41$,
we conclude $H_0$, that the interaction terms are not needed in the regression model. The
P-value of this test is .67.
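The arithmetic of this partial F test is easy to retrace. The sketch below uses only the extra sums of squares, MSE, and critical value reported above; the variable names are illustrative, and the critical value is taken from the text rather than computed.

```python
# Extra sums of squares for the three interaction terms, as reported above
extra_ss = [1.496, 2.704, 6.515]
mse_full = 6.745   # MSE of the full model (8.29)
f_critical = 3.41  # F(.95; 3, 13), taken from the text

ssr_interactions = sum(extra_ss)
f_star = (ssr_interactions / len(extra_ss)) / mse_full

# Decision rule: conclude H0 when F* does not exceed the critical value
conclude_h0 = f_star <= f_critical

print(round(ssr_interactions, 3))  # 10.715
print(round(f_star, 2))            # 0.53
print(conclude_h0)                 # True
```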


8.3 Qualitative Predictors 


As mentioned in Chapter 6, qualitative, as well as quantitative, predictor variables can be 
used in regression models. Many predictor variables of interest in business, economics, 
and the social and biological sciences are qualitative. Examples of qualitative predictor 
variables are gender (male, female), purchase status (purchase, no purchase), and disability 
status (not disabled, partly disabled, fully disabled). 

In a study of innovation in the insurance industry, an economist wished to relate the speed 
with which a particular insurance innovation is adopted (Y) to the size of the insurance firm 
(X1) and the type of firm. The response variable is measured by the number of months 
elapsed between the time the first firm adopted the innovation and the time the given firm 
adopted the innovation. The first predictor variable, size of firm, is quantitative, and is
measured by the amount of total assets of the firm. The second predictor variable, type of 
firm, is qualitative and is composed of two classes—stock companies and mutual companies. 
In order that such a qualitative variable can be used in a regression model, quantitative 
indicators for the classes of the qualitative variable must be employed. 




Qualitative Predictor with Two Classes 
There are many ways of quantitatively identifying the classes of a qualitative variable. We
shall use indicator variables that take on the values 0 and 1. These indicator variables are
easy to use and are widely employed, but they are by no means the only way to quantify a
qualitative variable.
For the insurance innovation example, where the qualitative predictor variable has two
classes, we might define two indicator variables $X_2$ and $X_3$ as follows:

$$X_2 = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$$
$$X_3 = \begin{cases} 1 & \text{if mutual company} \\ 0 & \text{otherwise} \end{cases} \tag{8.30}$$

A first-order model then would be the following:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \varepsilon_i \tag{8.31}$$


This intuitive approach of setting up an indicator variable for each class of the qualitative
predictor variable unfortunately leads to computational difficulties. To see why, suppose
we have $n = 4$ observations, the first two being stock firms (for which $X_2 = 1$ and $X_3 = 0$),
and the second two being mutual firms (for which $X_2 = 0$ and $X_3 = 1$). The $\mathbf{X}$ matrix,
with columns for the intercept, $X_1$, $X_2$, and $X_3$, would then be:

$$\mathbf{X} = \begin{bmatrix} 1 & X_{11} & 1 & 0 \\ 1 & X_{21} & 1 & 0 \\ 1 & X_{31} & 0 & 1 \\ 1 & X_{41} & 0 & 1 \end{bmatrix}$$

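Note that the first column is equal to the sum of the $X_2$ and $X_3$ columns, so that the columns
are linearly dependent according to definition (5.20). This has a serious effect on the $\mathbf{X}'\mathbf{X}$
matrix:

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ X_{11} & X_{21} & X_{31} & X_{41} \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & X_{11} & 1 & 0 \\ 1 & X_{21} & 1 & 0 \\ 1 & X_{31} & 0 & 1 \\ 1 & X_{41} & 0 & 1 \end{bmatrix} = \begin{bmatrix} 4 & \sum_{i=1}^{4} X_{i1} & 2 & 2 \\ \sum_{i=1}^{4} X_{i1} & \sum_{i=1}^{4} X_{i1}^2 & \sum_{i=1}^{2} X_{i1} & \sum_{i=3}^{4} X_{i1} \\ 2 & \sum_{i=1}^{2} X_{i1} & 2 & 0 \\ 2 & \sum_{i=3}^{4} X_{i1} & 0 & 2 \end{bmatrix}$$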




We see that the first column of the X'X matrix equals the sum of the last two columns, 
so that the columns are linearly dependent. Hence, the X’X matrix does not have an inverse, 
and no unique estimators of the regression coefficients can be found. 
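The singularity can be confirmed numerically. The sketch below builds the four-observation design matrix with made-up firm sizes (the values in `sizes` are hypothetical), forms X'X, and computes its determinant in exact rational arithmetic. It then repeats the calculation with the mutual-company indicator dropped. The helper functions are written out for the illustration and are not from the text.

```python
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def determinant(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in m]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no nonzero pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


sizes = [100, 200, 150, 300]  # hypothetical X_i1 values, for illustration only
stock = [1, 1, 0, 0]          # first two firms are stock companies
mutual = [0, 0, 1, 1]         # last two firms are mutual companies

X_full = [[1, s, st, mu] for s, st, mu in zip(sizes, stock, mutual)]
X_reduced = [[1, s, st] for s, st in zip(sizes, stock)]

det_full = determinant(matmul(transpose(X_full), X_full))
det_reduced = determinant(matmul(transpose(X_reduced), X_reduced))

print(det_full)     # 0
print(det_reduced)  # 65000
```

A zero determinant means X'X has no inverse when both indicators are kept alongside the intercept; with one indicator dropped the determinant is nonzero for these values.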

A simple way out of this difficulty is to drop one of the indicator variables. In our
example, we might drop $X_3$. Dropping one indicator variable is not the only way out of the
difficulty, but it leads to simple interpretations of the parameters. In general, therefore, we
shall follow the principle:

A qualitative variable with $c$ classes will be represented by $c - 1$
indicator variables, each taking on the values 0 and 1.          (8.32)

Comment

Indicator variables are frequently also called dummy variables or binary variables. The latter term
has reference to the binary number system containing only 0 and 1.


Interpretation of Regression Coefficients 


Returning to the insurance innovation example, suppose that we drop the indicator variable
$X_3$ from regression model (8.31) so that the model becomes:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i \tag{8.33}$$

where:

$$X_{i1} = \text{size of firm}$$
$$X_{i2} = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{if mutual company} \end{cases}$$

The response function for this regression model is:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 \tag{8.34}$$

To understand the meaning of the regression coefficients in this model, consider first the
case of a mutual firm. For such a firm, $X_2 = 0$ and response function (8.34) becomes:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2(0) = \beta_0 + \beta_1X_1 \qquad \text{Mutual firms} \tag{8.34a}$$

Thus, the response function for mutual firms is a straight line, with Y intercept $\beta_0$ and slope
$\beta_1$. This response function is shown in Figure 8.11.
For a stock firm, $X_2 = 1$ and response function (8.34) becomes:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2(1) = (\beta_0 + \beta_2) + \beta_1X_1 \qquad \text{Stock firms} \tag{8.34b}$$

This also is a straight line, with the same slope $\beta_1$ but with Y intercept $\beta_0 + \beta_2$. This response
function is also shown in Figure 8.11.

Let us consider now the meaning of the regression coefficients in response function (8.34)
with specific reference to the insurance innovation example. We see that the mean time
elapsed before the innovation is adopted, $E\{Y\}$, is a linear function of size of firm ($X_1$),
with the same slope $\beta_1$ for both types of firms. $\beta_2$ indicates how much higher (lower) the
response function for stock firms is than the one for mutual firms, for any given size of firm.
Thus, $\beta_2$ measures the differential effect of type of firm. In general, $\beta_2$ shows how much
higher (lower) the mean response line is for the class coded 1 than the line for the class
coded 0, for any given level of $X_1$.


FIGURE 8.11 Illustration of Meaning of Regression Coefficients for Regression Model (8.33) with Indicator Variable $X_2$: Insurance Innovation Example.
Axes: Number of Months Elapsed ($Y$) against Size of Firm ($X_1$).
Stock Firms Response Function: $E\{Y\} = (\beta_0 + \beta_2) + \beta_1X_1$
Mutual Firms Response Function: $E\{Y\} = \beta_0 + \beta_1X_1$


Example


In the insurance innovation example, the economist studied 10 mutual firms and 10 stock
firms. The basic data are shown in Table 8.2, columns 1-3. The indicator coding for type
of firm is shown in column 4. Note that $X_2 = 1$ for each stock firm and $X_2 = 0$ for each
mutual firm.

The fitting of regression model (8.33) is now straightforward. Table 8.3 presents the key
results from a computer run regressing $Y$ on $X_1$ and $X_2$. The fitted response function is:


$$\hat{Y} = 33.87407 - .10174X_1 + 8.05547X_2$$


Figure 8.12 contains the fitted response function for each type of firm, together with the 
actual observations. 
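The fit reported in Table 8.3 can be reproduced from the Table 8.2 data with ordinary least squares. The sketch below solves the normal equations with a small hand-written Gauss-Jordan routine so that it needs only the standard library; the function names `solve` and `ols` are illustrative, and a statistics package would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'Y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Table 8.2: months elapsed (Y), size of firm (X1); firms 1-10 mutual, 11-20 stock
months = [17, 26, 21, 30, 22, 0, 12, 19, 4, 16,
          28, 15, 11, 38, 31, 21, 20, 13, 30, 14]
size = [151, 92, 175, 31, 104, 277, 210, 120, 290, 238,
        164, 272, 295, 68, 85, 224, 166, 305, 124, 246]
stock = [0] * 10 + [1] * 10  # indicator X2

design = [[1, x1, x2] for x1, x2 in zip(size, stock)]
b0, b1, b2 = ols(design, months)

fitted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(size, stock)]
sse = sum((y - f) ** 2 for y, f in zip(months, fitted))
mse = sse / (len(months) - 3)  # 17 error degrees of freedom

print(round(b0, 5), round(b1, 5), round(b2, 5))
print(round(sse, 2), round(mse, 2))
```

The coefficients should agree with the fitted response function above, and the error sum of squares with the analysis of variance in Table 8.3, up to rounding.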

The economist was most interested in the effect of type of firm ($X_2$) on the elapsed time
for the innovation to be adopted and wished to obtain a 95 percent confidence interval for
$\beta_2$. We require $t(.975; 17) = 2.110$ and obtain from the results in Table 8.3 the confidence
limits $8.05547 \pm 2.110(1.45911)$. The confidence interval for $\beta_2$ therefore is:


$$4.98 \le \beta_2 \le 11.13$$


Thus, with 95 percent confidence, we conclude that stock companies tend to adopt the innovation
somewhere between 5 and 11 months later, on the average, than mutual companies,
for any given size of firm.
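The confidence limits follow directly from the estimate, its estimated standard deviation, and the t multiple. A minimal sketch of the arithmetic, with the t value taken from the text rather than computed:

```python
b2 = 8.05547       # estimated coefficient for the stock-company indicator
s_b2 = 1.45911     # estimated standard deviation of b2
t_multiple = 2.110  # t(.975; 17), taken from the text

half_width = t_multiple * s_b2
lower, upper = b2 - half_width, b2 + half_width

print(round(lower, 2), round(upper, 2))  # 4.98 11.13
```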

A formal test of:

$$H_0: \beta_2 = 0$$
$$H_a: \beta_2 \ne 0$$

with level of significance .05 would lead to $H_a$, that type of firm has an effect, since the
95 percent confidence interval for $\beta_2$ does not include zero.
The economist also carried out other analyses, some of which will be described shortly.


TABLE 8.2 Data and Indicator Coding: Insurance Innovation Example.

        (1)               (2)                  (3)        (4)          (5)
        Number of         Size of Firm         Type of    Indicator
Firm    Months Elapsed    (million dollars)    Firm       Code
 i      $Y_i$             $X_{i1}$                        $X_{i2}$     $X_{i1}X_{i2}$
 1      17                151                  Mutual     0            0
 2      26                 92                  Mutual     0            0
 3      21                175                  Mutual     0            0
 4      30                 31                  Mutual     0            0
 5      22                104                  Mutual     0            0
 6       0                277                  Mutual     0            0
 7      12                210                  Mutual     0            0
 8      19                120                  Mutual     0            0
 9       4                290                  Mutual     0            0
10      16                238                  Mutual     0            0
11      28                164                  Stock      1            164
12      15                272                  Stock      1            272
13      11                295                  Stock      1            295
14      38                 68                  Stock      1            68
15      31                 85                  Stock      1            85
16      21                224                  Stock      1            224
17      20                166                  Stock      1            166
18      13                305                  Stock      1            305
19      30                124                  Stock      1            124
20      14                246                  Stock      1            246


TABLE 8.3 Regression Results for Fit of Regression Model (8.33): Insurance Innovation Example.

(a) Regression Coefficients

Regression     Estimated                 Estimated
Coefficient    Regression Coefficient    Standard Deviation    $t^*$
$\beta_0$      33.87407                  1.81386               18.68
$\beta_1$      -.10174                   .00889                -11.44
$\beta_2$      8.05547                   1.45911               5.52

(b) Analysis of Variance

Source of
Variation     SS          df    MS
Regression    1,504.41     2    752.20
Error           176.39    17     10.38
Total         1,680.80    19


FIGURE 8.12 Fitted Regression Functions for Regression Model (8.33): Insurance Innovation Example.
Axes: Number of Months Elapsed ($Y$) against Size of Firm ($X$, 0 to 300).
Stock Firms Response Function: $\hat{Y} = (33.87407 + 8.05547) - .10174X_1$
Mutual Firms Response Function: $\hat{Y} = 33.87407 - .10174X_1$


Comment


The reader may wonder why we did not simply fit separate regressions for stock firms and mutual
firms in our example, and instead adopted the approach of fitting one regression with an indicator
variable. There are two reasons for this. Since the model assumes equal slopes and the same constant
error term variance for each type of firm, the common slope $\beta_1$ can best be estimated by pooling
the two types of firms. Also, other inferences, such as for $\beta_0$ and $\beta_2$, can be made more precisely by
working with one regression model containing an indicator variable since more degrees of freedom
will then be associated with MSE.


Qualitative Predictor with More than Two Classes 


If a qualitative predictor variable has more than two classes, we require additional indicator
variables in the regression model. Consider the regression of tool wear ($Y$) on tool speed
($X_1$) and tool model, where the latter is a qualitative variable with four classes (M1, M2,
M3, M4). We therefore require three indicator variables. Let us define them as follows:

$$X_2 = \begin{cases} 1 & \text{if tool model M1} \\ 0 & \text{otherwise} \end{cases}$$
$$X_3 = \begin{cases} 1 & \text{if tool model M2} \\ 0 & \text{otherwise} \end{cases} \tag{8.35}$$
$$X_4 = \begin{cases} 1 & \text{if tool model M3} \\ 0 & \text{otherwise} \end{cases}$$

First-Order Model. A first-order regression model is:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \beta_4X_{i4} + \varepsilon_i \tag{8.36}$$


For this model, the data input for the X variables would be as follows:

Tool Model    $X_1$       $X_2$    $X_3$    $X_4$
M1            $X_{i1}$    1        0        0
M2            $X_{i1}$    0        1        0
M3            $X_{i1}$    0        0        1
M4            $X_{i1}$    0        0        0
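The coding in the table can be generated mechanically. The sketch below is one possible helper (the name `indicator_row` is hypothetical); it treats the last class listed as the reference class, which receives all zeros, matching the choice of M4 above.

```python
def indicator_row(tool_model, classes=("M1", "M2", "M3", "M4")):
    """Return the c - 1 indicator values (X2, X3, X4) for one observation.

    The last entry of `classes` is the reference class and is coded all zeros.
    """
    if tool_model not in classes:
        raise ValueError(f"unknown class: {tool_model!r}")
    return [1 if tool_model == c else 0 for c in classes[:-1]]


coding = {model: indicator_row(model) for model in ("M1", "M2", "M3", "M4")}
print(coding)
# {'M1': [1, 0, 0], 'M2': [0, 1, 0], 'M3': [0, 0, 1], 'M4': [0, 0, 0]}
```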


The response function for regression model (8.36) is:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \beta_4X_4 \tag{8.37}$$

To understand the meaning of the regression coefficients, consider first what response
function (8.37) becomes for tool models M4 for which $X_2 = 0$, $X_3 = 0$, and $X_4 = 0$:

$$E\{Y\} = \beta_0 + \beta_1X_1 \qquad \text{Tool models M4} \tag{8.37a}$$

For tool models M1, $X_2 = 1$, $X_3 = 0$, and $X_4 = 0$, and response function (8.37) becomes:

$$E\{Y\} = (\beta_0 + \beta_2) + \beta_1X_1 \qquad \text{Tool models M1} \tag{8.37b}$$

Similarly, response function (8.37) becomes for tool models M2 and M3:

$$E\{Y\} = (\beta_0 + \beta_3) + \beta_1X_1 \qquad \text{Tool models M2} \tag{8.37c}$$

$$E\{Y\} = (\beta_0 + \beta_4) + \beta_1X_1 \qquad \text{Tool models M3} \tag{8.37d}$$

Thus, response function (8.37) implies that the regression of tool wear on tool speed is
linear, with the same slope for all four tool models. The coefficients $\beta_2$, $\beta_3$, and $\beta_4$ indicate,
respectively, how much higher (lower) the response functions for tool models M1, M2, and
M3 are than the one for tool models M4, for any given level of tool speed. Thus, $\beta_2$, $\beta_3$, and
$\beta_4$ measure the differential effects of the qualitative variable classes on the height of the
response function for any given level of $X_1$, always compared with the class for which $X_2 =
X_3 = X_4 = 0$. Figure 8.13 illustrates a possible arrangement of the response functions.

When using regression model (8.36), we may wish to estimate differential effects other
than against tool models M4. This can be done by estimating differences between regression
coefficients. For instance, $\beta_4 - \beta_3$ measures how much higher (lower) the response function
for tool models M3 is than the response function for tool models M2 for any given level of
tool speed, as may be seen by comparing (8.37c) and (8.37d). The point estimator of this
quantity is, of course, $b_4 - b_3$, and the estimated variance of this estimator is:

$$s^2\{b_4 - b_3\} = s^2\{b_4\} + s^2\{b_3\} - 2s\{b_4, b_3\} \tag{8.38}$$

The needed variances and covariance can be readily obtained from the estimated
variance-covariance matrix of the regression coefficients.
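Formula (8.38) is a one-line computation once the relevant block of the estimated variance-covariance matrix is available. The numbers below are hypothetical placeholders, since the text gives no fitted values for the tool wear example; only the arithmetic is being illustrated.

```python
from math import sqrt


def variance_of_difference(var_a: float, var_b: float, cov_ab: float) -> float:
    """Estimated variance of a difference of two coefficients, as in (8.38)."""
    return var_a + var_b - 2.0 * cov_ab


# Hypothetical entries of the estimated variance-covariance matrix (not from the text)
s2_b3 = 0.40     # s^2{b3}
s2_b4 = 0.50     # s^2{b4}
s_b4_b3 = 0.10   # s{b4, b3}

s2_diff = variance_of_difference(s2_b4, s2_b3, s_b4_b3)
s_diff = sqrt(s2_diff)

print(round(s2_diff, 4))  # 0.7
```

The square root `s_diff` is the estimated standard deviation that would enter a confidence interval for $\beta_4 - \beta_3$.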


FIGURE 8.13 Illustration of Regression Model (8.36): Tool Wear Example.
Axes: Tool Wear ($Y$) against Tool Speed ($X_1$).
Tool Models M3: $E\{Y\} = (\beta_0 + \beta_4) + \beta_1X_1$
Tool Models M2: $E\{Y\} = (\beta_0 + \beta_3) + \beta_1X_1$
Tool Models M4: $E\{Y\} = \beta_0 + \beta_1X_1$
Tool Models M1: $E\{Y\} = (\beta_0 + \beta_2) + \beta_1X_1$


Time Series Applications
Economists and business analysts frequently use time series data in regression analysis.
Indicator variables often are useful for time series regression models. For instance, savings
($Y$) may be regressed on income ($X$), where both the savings and income data are annual
data for a number of years. The model employed might be:

$$Y_t = \beta_0 + \beta_1X_t + \varepsilon_t \qquad t = 1, \ldots, n \tag{8.39}$$


where Y, and X, are savings and income, respectively, for time period t. Suppose that the 
period covered includes both peacetime and wartime years, and that this factor should be 
recognized since savings in wartime years tend to be higher. The following model might 
then be appropriate: 


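$$Y_t = \beta_0 + \beta_1X_{t1} + \beta_2X_{t2} + \varepsilon_t \tag{8.40}$$

where:

$$X_{t1} = \text{income}$$
$$X_{t2} = \begin{cases} 1 & \text{if period } t \text{ peacetime} \\ 0 & \text{otherwise} \end{cases}$$

Note that regression model (8.40) assumes that the marginal propensity to save ($\beta_1$) is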
constant in both peacetime and wartime years, and that only the height of the response 
function is affected by this qualitative variable. 

Another use of indicator variables in time series applications occurs when monthly 
or quarterly data are used. Suppose that quarterly sales (Y) are regressed on quarterly 
advertising expenditures (X1) and quarterly disposable personal income (X2). If seasonal 
effects also have an influence on quarterly sales, a first-order regression model incorporating
seasonal effects would be:

$$Y_t = \beta_0 + \beta_1X_{t1} + \beta_2X_{t2} + \beta_3X_{t3} + \beta_4X_{t4} + \beta_5X_{t5} + \varepsilon_t \tag{8.41}$$

where:

$$X_{t1} = \text{quarterly advertising expenditures}$$
$$X_{t2} = \text{quarterly disposable personal income}$$
$$X_{t3} = \begin{cases} 1 & \text{if first quarter} \\ 0 & \text{otherwise} \end{cases}$$
$$X_{t4} = \begin{cases} 1 & \text{if second quarter} \\ 0 & \text{otherwise} \end{cases}$$
$$X_{t5} = \begin{cases} 1 & \text{if third quarter} \\ 0 & \text{otherwise} \end{cases}$$

Regression models for time series data are susceptible to correlated error terms. It is
particularly important in these cases to examine whether the modeling of the time series
components of the data is adequate to make the error terms uncorrelated. We discuss in
Chapter 12 a test for correlated error terms and a regression model that is often useful when
the error terms are correlated.


8.4 Some Considerations in Using Indicator Variables 


Indicator Variables versus Allocated Codes 


An alternative to the use of indicator variables for a qualitative predictor variable is to employ
allocated codes. Consider, for instance, the predictor variable "frequency of product
use," which has three classes: frequent user, occasional user, nonuser. With the allocated
codes approach, a single X variable is employed and values are assigned to the classes; for
instance:


Class              $X_1$
Frequent user      3
Occasional user    2
Nonuser            1


The allocated codes are, of course, arbitrary and could be other sets of numbers. The
first-order model with allocated codes for our example, assuming no other predictor variables,
would be:


$$Y_i = \beta_0 + \beta_1X_{i1} + \varepsilon_i \tag{8.42}$$


The basic difficulty with allocated codes is that they define a metric for the classes of the
qualitative variable that may not be reasonable. To see this concretely, consider the mean
responses with regression model (8.42) for the three classes of the qualitative variable:


Class              $E\{Y\}$
Frequent user      $E\{Y\} = \beta_0 + \beta_1(3) = \beta_0 + 3\beta_1$
Occasional user    $E\{Y\} = \beta_0 + \beta_1(2) = \beta_0 + 2\beta_1$
Nonuser            $E\{Y\} = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$


Note the key implication:

$$E\{Y | \text{frequent user}\} - E\{Y | \text{occasional user}\} = E\{Y | \text{occasional user}\} - E\{Y | \text{nonuser}\} = \beta_1$$


Thus, the coding 1, 2, 3 implies that the mean response changes by the same amount when 
going from a nonuser to an occasional user as when going from an occasional user to a 
frequent user. This may not be in accord with reality and is the result of the coding 1, 2, 3, 
which assigns equal distances between the three user classes. Other allocated codes may, of 
course, imply different spacings of the classes of the qualitative variable, but these would 
ordinarily still be arbitrary. 

Indicator variables, in contrast, make no assumptions about the spacing of the classes
and rely on the data to show the differential effects that occur. If, for the same example, two
indicator variables, say, $X_1$ and $X_2$, are employed to represent the qualitative variable, as
follows:

Class              $X_1$    $X_2$
Frequent user      1        0
Occasional user    0        1
Nonuser            0        0


the first-order regression model would be:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i \tag{8.43}$$

Here, $\beta_1$ measures the differential effect:

$$E\{Y | \text{frequent user}\} - E\{Y | \text{nonuser}\}$$

and $\beta_2$ measures the differential effect:

$$E\{Y | \text{occasional user}\} - E\{Y | \text{nonuser}\}$$


Thus, $\beta_2$ measures the differential effect between occasional user and nonuser, and $\beta_1 - \beta_2$
measures the differential effect between frequent user and occasional user. Notice that there
are no arbitrary restrictions to be satisfied by these two differential effects. Also note that
if $\beta_1 = 2\beta_2$, then equal spacing between the three classes would exist.
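The contrast between the two codings can be made concrete with a small sketch. All coefficient values below are hypothetical and serve only to show the structure: allocated codes 1, 2, 3 force the two gaps between adjacent classes to be equal, whereas indicator coding leaves them free.

```python
def allocated_code_means(beta0, beta1, codes):
    """Mean responses under model (8.42) for a mapping of class -> allocated code."""
    return {cls: beta0 + beta1 * code for cls, code in codes.items()}


def indicator_means(beta0, beta1, beta2):
    """Mean responses under model (8.43); nonuser is the reference class."""
    return {"frequent": beta0 + beta1, "occasional": beta0 + beta2, "nonuser": beta0}


def gaps(means):
    """(occasional - nonuser, frequent - occasional)."""
    return (means["occasional"] - means["nonuser"],
            means["frequent"] - means["occasional"])


codes = {"frequent": 3, "occasional": 2, "nonuser": 1}

# Hypothetical coefficient values, for illustration only
allocated_gaps = gaps(allocated_code_means(10.0, 4.0, codes))
indicator_gaps = gaps(indicator_means(10.0, 9.0, 2.0))
equal_spacing_gaps = gaps(indicator_means(10.0, 8.0, 4.0))  # beta1 = 2 * beta2

print(allocated_gaps)      # (4.0, 4.0): equal by construction
print(indicator_gaps)      # (2.0, 7.0): unrestricted
print(equal_spacing_gaps)  # (4.0, 4.0): the special case beta1 = 2 * beta2
```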


Indicator Variables versus Quantitative Variables 


Indicator variables can be used even if the predictor variable is quantitative. For instance, the
quantitative variable age may be transformed by grouping ages into classes such as under
21, 21-34, 35-49, etc. Indicator variables are then used for the classes of this new predictor
variable. At first sight, this may seem to be a questionable approach because information
about the actual ages is thrown away. Furthermore, additional parameters are placed into
the model, which leads to a reduction of the degrees of freedom associated with MSE.

Nevertheless, there are occasions when replacement of a quantitative variable by indicator
variables may be appropriate. Consider a large-scale survey in which the relation between
liquid assets ($Y$) and age ($X$) of head of household is to be studied. Two thousand households
were included in the study, so that the loss of 10 or 20 degrees of freedom is immaterial.
The analyst is very much in doubt about the shape of the regression function, which could
be highly complex, and hence may utilize the indicator variable approach in order to obtain
information about the shape of the response function without making any assumptions about
its functional form.

Thus, for large data sets use of indicator variables can serve as an alternative to lowess
and other nonparametric fits of the response function.
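Grouping a quantitative variable into classes and coding the classes with indicators can be sketched as follows. The text lists the classes under 21, 21-34, and 35-49 and then trails off with "etc."; the single open-ended class "50 and over" used here is an assumption made so that the example is complete, and `age_indicators` is a hypothetical helper name.

```python
from bisect import bisect_right

# Lower bounds of every class after the first. The classes are:
# under 21 (reference), 21-34, 35-49, and 50 and over (this last class is assumed).
CUTPOINTS = (21, 35, 50)


def age_indicators(age: int) -> list[int]:
    """Return c - 1 = 3 indicator values; the 'under 21' class is coded all zeros."""
    class_index = bisect_right(CUTPOINTS, age)  # 0 for under 21, up to 3 for 50 and over
    return [1 if class_index == j else 0 for j in range(1, len(CUTPOINTS) + 1)]


for age in (20, 21, 34, 35, 49, 50):
    print(age, age_indicators(age))
```

Each additional age class costs one parameter, which is why the approach is attractive mainly when the sample is large.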


Other Codings for Indicator Variables 


As stated earlier, many different codings of indicator variables are possible. We now describe
two alternatives to our 0, 1 coding for $c - 1$ indicator variables for a qualitative variable
with $c$ classes. We illustrate these alternative codings for the insurance innovation example,
where $Y$ is time to adopt an innovation, $X_1$ is size of insurance firm, and the second predictor
variable is type of company (stock, mutual).
The first alternative coding is:

$$X_2 = \begin{cases} 1 & \text{if stock company} \\ -1 & \text{if mutual company} \end{cases} \tag{8.44}$$

For this coding, the first-order linear regression model:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i \tag{8.45}$$

has the response function:

$$E\{Y\} = \beta_0 + \beta_1X_1 + \beta_2X_2 \tag{8.46}$$

This response function becomes for the two types of companies:

$$E\{Y\} = (\beta_0 + \beta_2) + \beta_1X_1 \qquad \text{Stock firms} \tag{8.46a}$$
$$E\{Y\} = (\beta_0 - \beta_2) + \beta_1X_1 \qquad \text{Mutual firms} \tag{8.46b}$$

Thus, $\beta_0$ here may be viewed as an "average" intercept of the regression line, from which
the stock company and mutual company intercepts differ by $\beta_2$ in opposite directions. A test
whether the regression lines are the same for both types of companies involves $H_0$: $\beta_2 = 0$,
$H_a$: $\beta_2 \ne 0$.
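Because the 0, 1 coding and the 1, -1 coding describe the same pair of parallel lines, the coefficients under one coding can be converted to the other. The sketch below applies this to the estimates 33.87407 and 8.05547 reported earlier for the 0, 1 coding; the function name is hypothetical, and the conversion rests on the fact that both models produce identical fitted lines for each type of firm.

```python
def to_plus_minus_coding(b0_01: float, b2_01: float) -> tuple[float, float]:
    """Convert (intercept, indicator coefficient) from 0/1 coding to 1/-1 coding.

    Under 0/1 coding the intercepts are b0 (mutual) and b0 + b2 (stock).
    Under 1/-1 coding they are b0' - b2' (mutual) and b0' + b2' (stock), so
    b0' is the average of the two intercepts and b2' is half their difference.
    """
    mutual_intercept = b0_01
    stock_intercept = b0_01 + b2_01
    b0_pm = (stock_intercept + mutual_intercept) / 2.0
    b2_pm = (stock_intercept - mutual_intercept) / 2.0
    return b0_pm, b2_pm


b0_pm, b2_pm = to_plus_minus_coding(33.87407, 8.05547)
print(round(b0_pm, 5), round(b2_pm, 5))
```

The slope coefficient for size of firm is the same under both codings, so it is omitted here.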

A second alternative coding scheme is to use a 0, 1 indicator variable for each of the $c$
classes of the qualitative variable and to drop the intercept term in the regression model.
For the insurance innovation example, the model would be:

$$Y_i = \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \varepsilon_i \tag{8.47}$$

where:

$$X_{i1} = \text{size of firm}$$
$$X_{i2} = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$$
$$X_{i3} = \begin{cases} 1 & \text{if mutual company} \\ 0 & \text{otherwise} \end{cases}$$

Here, the two response functions are:

$$E\{Y\} = \beta_2 + \beta_1X_1 \qquad \text{Stock firms} \tag{8.48a}$$
$$E\{Y\} = \beta_3 + \beta_1X_1 \qquad \text{Mutual firms} \tag{8.48b}$$

A test of whether or not the two regression lines are the same would involve the alternatives
$H_0$: $\beta_2 = \beta_3$, $H_a$: $\beta_2 \ne \beta_3$. This type of test, discussed in Section 7.3, cannot be conducted
by using extra sums of squares and requires the fitting of both the full and reduced models.


8.5 Modeling Interactions between Quantitative 
and Qualitative Predictors 


In the insurance innovation example, the economist actually did not begin the analysis with 
regression model (8.33) because of the possibility of interaction effects between size of 
firm and type of firm on the response variable. Even though one of the predictor variables 
in the regression model here is qualitative, interaction effects can still be introduced into 
the model in the usual manner, by including cross-product terms. A first-order regression 
model with an added interaction term for the insurance innovation example is: 


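$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i \tag{8.49}$$

where:

$$X_{i1} = \text{size of firm}$$

$$X_{i2} = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$$

The response function for this regression model is:

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \tag{8.50}$$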


Meaning of Regression Coefficients 


The meaning of the regression coefficients in response function (8.50) can best be understood
by examining the nature of this function for each type of firm. For a mutual firm, $X_2 = 0$
and hence $X_1 X_2 = 0$. Response function (8.50) therefore becomes for mutual firms:

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2(0) + \beta_3(0) = \beta_0 + \beta_1 X_1 \qquad \text{Mutual firms} \tag{8.50a}$$

This response function is shown in Figure 8.14. Note that the $Y$ intercept is $\beta_0$ and the slope
is $\beta_1$ for the response function for mutual firms.


FIGURE 8.14 Illustration of Meaning of Regression Coefficients for Regression Model (8.49)
with Indicator Variable $X_2$ and Interaction Term: Insurance Innovation Example.
[Plot of number of months elapsed ($Y$) against size of firm ($X_1$), showing the stock firms
response function $E\{Y\} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)X_1$ and the mutual firms response function
$E\{Y\} = \beta_0 + \beta_1 X_1$, with the intercept $\beta_0 + \beta_2$ marked on the vertical axis.]


For stock firms, $X_2 = 1$ and hence $X_1 X_2 = X_1$. Response function (8.50) therefore becomes
for stock firms:

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2(1) + \beta_3 X_1$$

or:

$$E\{Y\} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_1 \qquad \text{Stock firms} \tag{8.50b}$$

This response function is also shown in Figure 8.14. Note that the response function for
stock firms has $Y$ intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$.

We see that $\beta_2$ here indicates how much greater (smaller) is the $Y$ intercept of the response
function for the class coded 1 than that for the class coded 0. Similarly, $\beta_3$ indicates how
much greater (smaller) is the slope of the response function for the class coded 1 than that
for the class coded 0. Because both the intercept and the slope differ for the two classes in
regression model (8.49), it is no longer true that $\beta_2$ indicates how much higher (lower) one
response function is than the other for any given level of $X_1$. Figure 8.14 shows that the
effect of type of firm with regression model (8.49) depends on $X_1$, the size of the firm. For
smaller firms, according to Figure 8.14, mutual firms tend to innovate more quickly, but for
larger firms stock firms tend to innovate more quickly. Thus, when interaction effects are
present, the effect of the qualitative predictor variable can be studied only by comparing the
regression functions within the scope of the model for the different classes of the qualitative
variable.

Figure 8.15 illustrates another possible interaction pattern for the insurance innovation
example. Here, mutual firms tend to introduce the innovation more quickly than stock firms
for all sizes of firms in the scope of the model, but the differential effect is much smaller
for large firms than for small ones.

FIGURE 8.15 Another Illustration of Regression Model (8.49) with Indicator Variable $X_2$
and Interaction Term: Insurance Innovation Example.
[Plot of number of months elapsed against size of firm, with separate response lines for
stock firms and mutual firms.]

The interactions portrayed in Figures 8.14 and 8.15 can no longer be viewed as reinforcing
or interfering types of interactions because one of the predictor variables here is qualitative.
When one of the predictor variables is qualitative and the other quantitative, nonparallel
response functions that do not intersect within the scope of the model (as in Figure 8.15) are
sometimes said to represent an ordinal interaction. When the response functions intersect
within the scope of the model (as in Figure 8.14), the interaction is then said to be a disordinal
interaction.
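
The distinction can be checked numerically. Under model (8.49) the two response lines differ by
$\beta_2 + \beta_3 X_1$, so they cross at $X_1 = -\beta_2/\beta_3$ when $\beta_3 \neq 0$. The Python sketch below is an
illustration only; the function names, coefficient values, and scope limits are hypothetical.

```python
# Sketch: classify the interaction in model (8.49) as ordinal or disordinal.
# Coefficients and the scope of X1 below are hypothetical.

def crossing_point(b2, b3):
    """X1 at which the two response lines of model (8.49) intersect.

    The lines differ by b2 + b3 * X1, which is zero at X1 = -b2 / b3.
    Returns None when b3 == 0 (parallel lines, no interaction).
    """
    if b3 == 0:
        return None
    return -b2 / b3

def classify_interaction(b2, b3, x_min, x_max):
    """Return 'none', 'ordinal', or 'disordinal' for the scope [x_min, x_max]."""
    x_star = crossing_point(b2, b3)
    if x_star is None:
        return "none"
    return "disordinal" if x_min <= x_star <= x_max else "ordinal"

# Lines that cross inside the scope, then lines that cross outside it.
print(classify_interaction(b2=-10.0, b3=0.05, x_min=0.0, x_max=300.0))
print(classify_interaction(b2=-10.0, b3=0.02, x_min=0.0, x_max=300.0))
```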


Example

Since the economist was concerned that interaction effects between size and type of firm
may be present, the initial regression model fitted was model (8.49):

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$$

The values for the interaction term $X_1 X_2$ for the insurance innovation example are shown
in Table 8.2, column 5, on page 317. Note that this column contains 0 for mutual companies
and $X_{i1}$ for stock companies.

Again, the regression fit is routine. Basic results from a computer run regressing $Y$ on
$X_1$, $X_2$, and $X_1 X_2$ are shown in Table 8.4. To test for the presence of interaction effects:

$$H_0: \beta_3 = 0$$

$$H_a: \beta_3 \neq 0$$

the economist used the $t^*$ statistic from Table 8.4a:

$$t^* = \frac{b_3}{s\{b_3\}} = \frac{-.0004171}{.01833} = -.02$$


TABLE 8.4 Regression Results for Fit of Regression Model (8.49) with Interaction Term:
Insurance Innovation Example.

(a) Regression Coefficients

Regression     Estimated Regression    Estimated Standard
Coefficient    Coefficient             Deviation              t*
$\beta_0$          33.83837                2.44065                13.86
$\beta_1$          -.10153                 .01305                 -7.78
$\beta_2$          8.13125                 3.65405                2.23
$\beta_3$          -.0004171               .01833                 -.02

(b) Analysis of Variance

Source of
Variation      SS          df    MS
Regression     1,504.42    3     501.47
Error          176.38      16    11.02
Total          1,680.80    19

For level of significance .05, we require $t(.975; 16) = 2.120$. Since $|t^*| = .02 < 2.120$,
we conclude $H_0$, that $\beta_3 = 0$. The conclusion of no interaction effects is supported by the
two-sided $P$-value for the test, which is very high, namely, .98. It was because of this result
that the economist adopted regression model (8.33) with no interaction term, which we
discussed earlier.
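
The decision rule can be retraced with a few lines of arithmetic. This Python sketch only
reuses the figures quoted from Table 8.4a and the critical value given in the text; the variable
names are illustrative.

```python
# Sketch: t test for the interaction coefficient, using Table 8.4a.
b3 = -0.0004171      # estimated interaction coefficient
s_b3 = 0.01833       # its estimated standard deviation
t_crit = 2.120       # t(.975; 16), as quoted in the text

t_star = b3 / s_b3
conclude_h0 = abs(t_star) <= t_crit   # two-sided decision rule at level .05

print(round(t_star, 2), conclude_h0)
```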


Comment 


Fitting regression model (8.49) yields the same response functions as would fitting separate regressions 
for stock firms and mutual firms. An advantage of using model (8.49) with an indicator variable is 
that one regression run will yield both fitted regressions. 

Another advantage is that tests for comparing the regression functions for the different classes of
the qualitative variable can be clearly seen to involve tests of regression coefficients in a general linear
model. For instance, Figure 8.14 for the insurance innovation example shows that a test of whether
the two regression functions have the same slope involves:

$$H_0: \beta_3 = 0$$

$$H_a: \beta_3 \neq 0$$

Similarly, Figure 8.14 shows that a test of whether the two regression functions are identical involves:

$$H_0: \beta_2 = \beta_3 = 0$$

$$H_a: \text{not both } \beta_2 = 0 \text{ and } \beta_3 = 0$$


8.6 More Complex Models


We now briefly consider more complex models involving quantitative and qualitative
predictor variables.


More than One Qualitative Predictor Variable 


Regression models can readily be constructed for cases where two or more of the predictor
variables are qualitative. Consider the regression of advertising expenditures ($Y$) on sales
($X_1$), type of firm (incorporated, not incorporated), and quality of sales management (high,
low). We may define:

$$X_2 = \begin{cases} 1 & \text{if firm incorporated} \\ 0 & \text{otherwise} \end{cases}$$

$$X_3 = \begin{cases} 1 & \text{if quality of sales management high} \\ 0 & \text{otherwise} \end{cases} \tag{8.51}$$


First-Order Model. A first-order regression model for the above example is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i \tag{8.52}$$

This model implies that the response function of advertising expenditures on sales is linear,
with the same slope for all "type of firm and quality of sales management" combinations,
and $\beta_2$ and $\beta_3$ indicate the additive differential effects of type of firm and quality of sales
management on the height of the regression line for any given levels of $X_1$ and the other
predictor variable.


First-Order Model with Certain Interactions Added. A first-order regression model
to which are added interaction effects between each pair of the predictor variables for the
advertising example is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i1} X_{i2} + \beta_5 X_{i1} X_{i3} + \beta_6 X_{i2} X_{i3} + \varepsilon_i \tag{8.53}$$

Note the implications of this model:

Type of Firm        Quality of Sales Management    Response Function
Incorporated        High                           $E\{Y\} = (\beta_0 + \beta_2 + \beta_3 + \beta_6) + (\beta_1 + \beta_4 + \beta_5) X_1$
Not incorporated    High                           $E\{Y\} = (\beta_0 + \beta_3) + (\beta_1 + \beta_5) X_1$
Incorporated        Low                            $E\{Y\} = (\beta_0 + \beta_2) + (\beta_1 + \beta_4) X_1$
Not incorporated    Low                            $E\{Y\} = \beta_0 + \beta_1 X_1$

Not only are all response functions different for the various "type of firm and quality of sales
management" combinations, but the differential effects of one qualitative variable on the
intercept depend on the particular class of the other qualitative variable. For instance, when
we move from "not incorporated, low quality" to "incorporated, low quality," the intercept
changes by $\beta_2$. But if we move from "not incorporated, high quality" to "incorporated,
high quality," the intercept changes by $\beta_2 + \beta_6$.
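
The four response functions can be generated mechanically from the coefficient vector of
model (8.53). The Python sketch below is an illustration under assumed values: the function
name and the numbers assigned to $\beta_0, \ldots, \beta_6$ are hypothetical.

```python
# Sketch: intercept and slope implied by model (8.53) for each combination of
# X2 (1 = incorporated) and X3 (1 = high quality of sales management).
# The coefficient values are hypothetical.

def response_line(beta, x2, x3):
    """Return (intercept, slope in X1) of E{Y} for given indicator values."""
    b0, b1, b2, b3, b4, b5, b6 = beta
    intercept = b0 + b2 * x2 + b3 * x3 + b6 * x2 * x3
    slope = b1 + b4 * x2 + b5 * x3
    return intercept, slope

beta = (10.0, 0.5, 2.0, 3.0, 0.1, 0.2, 1.5)   # hypothetical beta_0 .. beta_6

for x2, firm in ((1, "Incorporated"), (0, "Not incorporated")):
    for x3, quality in ((1, "High"), (0, "Low")):
        print(firm, quality, response_line(beta, x2, x3))

# Effect of incorporation on the intercept at each quality level.
low = response_line(beta, 1, 0)[0] - response_line(beta, 0, 0)[0]    # beta_2
high = response_line(beta, 1, 1)[0] - response_line(beta, 0, 1)[0]   # beta_2 + beta_6
print(low, high)
```

The two differences printed at the end correspond to the intercept changes $\beta_2$ and
$\beta_2 + \beta_6$ described above.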




Qualitative Predictor Variables Only 
Regression models containing only qualitative predictor variables can also be constructed. 
With reference to our advertising example, we could regress advertising expenditures only 
on type of firm and quality of sales management. The first-order regression model then 
would be: 


$$Y_i = \beta_0 + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i \tag{8.54}$$

where $X_{i2}$ and $X_{i3}$ are defined in (8.51).


Comments

1. Models in which all explanatory variables are qualitative are called analysis of variance 
models. 

2. Models containing some quantitative and some qualitative explanatory variables, where the 
chief explanatory variables of interest are qualitative and the quantitative variables are introduced 
primarily to reduce the variance of the error terms, are called analysis of covariance models. 


8.7 Comparison of Two or More Regression Functions 


Frequently we encounter regressions for two or more populations and wish to study their 
similarities and differences. We present three examples. 


1. A company operates two production lines for making soap bars. For each line, the 
relation between the speed of the line and the amount of scrap for the day was studied. 
A scatter plot of the data for the two production lines suggests that the regression relation 
between production line speed and amount of scrap is linear but not the same for the two 
production lines. The slopes appear to be about the same, but the heights of the regression 
lines seem to differ. A formal test is desired to determine whether or not the two regres- 
sion lines are identical. If it is found that the two regression lines are not the same, an 
investigation is to be made of why the difference in scrap yield exists. 


2. An economist is studying the relation between amount of savings and level of income
for middle-income families from urban and rural areas, based on independent samples from 
the two populations. Each of the two relations can be modeled by linear regression. The 
economist wishes to compare whether, at given income levels, urban and rural families 
tend to save the same amount—i.e., whether the two regression lines are the same. If they 
are not, the economist wishes to explore whether at least the amounts of savings out of an 
additional dollar of income are the same for the two groups—i.e., whether the slopes of the 
two regression lines are the same. 


3. Two instruments were constructed for a company to identical specifications to measure
pressure in an industrial process. A study was then made for each instrument of the relation 
between its gauge readings and actual pressures as determined by an almost exact but slow 
and costly method. If the two regression lines are the same, a single calibration schedule 
can be developed for the two instruments; otherwise, two different calibration schedules 
will be required. 


When it is reasonable to assume that the error term variances in the regression models
for the different populations are equal, we can use indicator variables to test the equality
of the different regression functions. If the error variances are not equal, transformations of
the response variable may equalize them at least approximately.

We have already seen how regression models with indicator variables that contain
interaction terms permit testing of the equality of regression functions for the different classes
of a qualitative variable. This methodology can be used directly for testing the equality of
regression functions for different populations. We simply consider the different populations
as classes of a predictor variable, define indicator variables for the different populations, and
develop a regression model containing appropriate interaction terms. Since no new principles
arise in the testing of the equality of regression functions for different populations, we
immediately proceed with two of the earlier examples to illustrate the approach.


Soap Production Lines Example 


The data on amount of scrap ($Y$) and line speed ($X_1$) for the soap production lines example
are presented in Table 8.5. The variable $X_2$ is a code for the production line. A symbolic
scatter plot of the data, using different symbols for the two production lines, is shown in
Figure 8.16.

TABLE 8.5 Data: Soap Production Lines Example (all data are coded).

        Production Line 1                        Production Line 2
Case    Amount of    Line                Case    Amount of    Line
        Scrap        Speed                       Scrap        Speed
i       Y_i          X_i1     X_i2       i       Y_i          X_i1     X_i2
 1      218          100      1          16      140          105      0
 2      248          125      1          17      277          215      0
 3      360          220      1          18      384          270      0
 4      351          205      1          19      341          255      0
 5      470          300      1          20      215          175      0
 6      394          255      1          21      180          135      0
 7      332          225      1          22      260          200      0
 8      321          175      1          23      361          275      0
 9      410          270      1          24      252          155      0
10      260          170      1          25      422          320      0
11      241          155      1          26      273          190      0
12      331          190      1          27      410          295      0
13      275          140      1
14      425          290      1
15      367          265      1

FIGURE 8.16 Symbolic Scatter Plot: Soap Production Lines Example.
[Plot of amount of scrap (coded) against line speed (coded), with different symbols for the
two production lines.]

Tentative Model. On the basis of the symbolic scatter plot in Figure 8.16, the analyst
decided to tentatively fit regression model (8.49). This model assumes that the regression
relation between amount of scrap and line speed is linear for both production lines and that
the variances of the error terms are the same, but permits the two regression lines to have
different slopes and intercepts:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i \tag{8.55}$$

where:

$$X_{i1} = \text{line speed}$$

$$X_{i2} = \begin{cases} 1 & \text{if production line 1} \\ 0 & \text{if production line 2} \end{cases}$$

$$i = 1, 2, \ldots, 27$$


Note that for purposes of this model, the 15 cases for production line 1 and the 12 cases for 
production line 2 are combined into one group of 27 cases. 


Diagnostics. A fit of regression model (8.55) to the data in Table 8.5 led to the results
presented in Table 8.6 and the following fitted regression function:

$$\hat{Y} = 7.57 + 1.322 X_1 + 90.39 X_2 - .1767 X_1 X_2$$
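
The fit can be reproduced from the Table 8.5 data. The Python sketch below is an illustration,
not the computer run used in the text: it solves the normal equations with a small hand-written
Gaussian elimination so that it needs only the standard library (in practice a regression routine
from a statistics package would be used). The helper names `solve` and `ols` are hypothetical.
It also fits each production line separately, to show that the combined model yields the same
two lines.

```python
# Sketch: least squares fit of model (8.55) to the Table 8.5 data,
# using only the standard library (normal equations + Gaussian elimination).

# (amount of scrap, line speed) for each production line, from Table 8.5.
LINE1 = [(218, 100), (248, 125), (360, 220), (351, 205), (470, 300),
         (394, 255), (332, 225), (321, 175), (410, 270), (260, 170),
         (241, 155), (331, 190), (275, 140), (425, 290), (367, 265)]
LINE2 = [(140, 105), (277, 215), (384, 270), (341, 255), (215, 175),
         (180, 135), (260, 200), (361, 275), (252, 155), (422, 320),
         (273, 190), (410, 295)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least squares coefficients from the normal equations X'X b = X'Y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Combined fit: columns are 1, X1, X2, X1*X2, with X2 = 1 for production line 1.
rows, y = [], []
for x2, data in ((1, LINE1), (0, LINE2)):
    for scrap, speed in data:
        rows.append([1.0, float(speed), float(x2), float(speed * x2)])
        y.append(float(scrap))
b = ols(rows, y)

# Separate fits for each production line, for comparison.
b_line1 = ols([[1.0, float(s)] for _, s in LINE1], [float(v) for v, _ in LINE1])
b_line2 = ols([[1.0, float(s)] for _, s in LINE2], [float(v) for v, _ in LINE2])

print([round(v, 4) for v in b])
print([round(v, 3) for v in b_line1], [round(v, 3) for v in b_line2])
```

The line 2 fit corresponds to $(b_0, b_1)$ of the combined model, and the line 1 fit to
$(b_0 + b_2, b_1 + b_3)$.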


Plots of the residuals against $\hat{Y}$ are shown in Figure 8.17 for each production line. Two plots
are used in order to facilitate the diagnosis of possible differences between the two production
lines. Both plots in Figure 8.17 are reasonably consistent with regression model (8.55).
The splits between positive and negative residuals of 10 to 5 for production line 1 and 4 to
8 for production line 2 can be accounted for by randomness of the outcomes. Plots of the
residuals against $X_2$ and a normal probability plot of the residuals (not shown) also support
the appropriateness of the fitted model. For the latter plot, the coefficient of correlation
between the ordered residuals and their expected values under normality is .990. This is
sufficiently high according to Table B.6 to support the assumption of normality of the error
terms.

TABLE 8.6 Regression Results for Fit of Regression Model (8.55): Soap Production Lines
Example.

(a) Regression Coefficients

Regression     Estimated Regression    Estimated Standard
Coefficient    Coefficient             Deviation
$\beta_0$          7.57                    20.87
$\beta_1$          1.322                   .09262
$\beta_2$          90.39                   28.35
$\beta_3$          -.1767                  .1288

(b) Analysis of Variance

Source of Variation          SS
Regression                   169,165
  $X_1$                        149,661
  $X_2 \mid X_1$                  18,694
  $X_1 X_2 \mid X_1, X_2$          810
Error                        9,904
Total                        179,069

FIGURE 8.17 Residual Plots against $\hat{Y}$: Soap Production Lines Example.
[Panel (a) Production Line 1; vertical axis: residual.]

Finally, the analyst desired to make a formal test of the equality of the variances of
the error terms for the two production lines, using the Brown-Forsythe test described in
Section 3.6. Separate linear regression models were fitted to the data for the two production
lines, the residuals were obtained, and the absolute deviations $d_{i1}$ and $d_{i2}$ in (3.8) of the
residuals around the median residual for each production line were obtained.
The results were as follows:

Production Line 1                               Production Line 2
$\hat{Y} = 97.965 + 1.145 X_1$                       $\hat{Y} = 7.574 + 1.322 X_1$
$\bar{d}_1 = 16.132$                                 $\bar{d}_2 = 12.648$
$\sum (d_{i1} - \bar{d}_1)^2 = 2,952.20$               $\sum (d_{i2} - \bar{d}_2)^2 = 2,045.82$

The pooled variance $s^2$ in (3.9a) therefore is:

$$s^2 = \frac{2,952.20 + 2,045.82}{27 - 2} = 199.921$$

Hence, the pooled standard deviation is $s = 14.139$, and the test statistic in (3.9) is:

$$t^*_{BF} = \frac{16.132 - 12.648}{14.139 \sqrt{\dfrac{1}{15} + \dfrac{1}{12}}} = .636$$

For $\alpha = .05$, we require $t(.975; 25) = 2.060$. Since $|t^*_{BF}| = .636 \leq 2.060$, we conclude
that the error term variances for the two production lines do not differ. The two-sided
$P$-value for this test is .53.
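
The Brown-Forsythe arithmetic can be retraced from the summary values alone. The Python
sketch below is an illustration that reuses only the figures quoted in the text; the variable
names are hypothetical.

```python
import math

# Sketch: Brown-Forsythe statistic from the summary values quoted in the text.
n1, n2 = 15, 12                    # cases for production lines 1 and 2
d_bar1, d_bar2 = 16.132, 12.648    # mean absolute deviations around the medians
ss1, ss2 = 2952.20, 2045.82        # sums of squared deviations of the d values

s2 = (ss1 + ss2) / (n1 + n2 - 2)   # pooled variance
s = math.sqrt(s2)                  # pooled standard deviation
t_bf = (d_bar1 - d_bar2) / (s * math.sqrt(1.0 / n1 + 1.0 / n2))

t_crit = 2.060                     # t(.975; 25), as quoted in the text
print(round(s2, 3), round(s, 3), round(t_bf, 3), abs(t_bf) <= t_crit)
```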

At this point, the analyst was satisfied about the aptness of regression model (8.55)
with normal error terms and was ready to proceed with comparing the regression relation
between amount of scrap and line speed for the two production lines.


Inferences about Two Regression Lines. Identity of the regression functions for the two
production lines is tested by considering the alternatives:

$$\begin{aligned} H_0&: \beta_2 = \beta_3 = 0 \\ H_a&: \text{not both } \beta_2 = 0 \text{ and } \beta_3 = 0 \end{aligned} \tag{8.56}$$

The appropriate test statistic is given by (7.27):

$$F^* = \frac{SSR(X_2, X_1 X_2 \mid X_1)}{2} \div \frac{SSE(X_1, X_2, X_1 X_2)}{n - 4} \tag{8.56a}$$

where $n$ represents the combined sample size for both populations. Using the regression
results in Table 8.6, we find:

$$SSR(X_2, X_1 X_2 \mid X_1) = SSR(X_2 \mid X_1) + SSR(X_1 X_2 \mid X_1, X_2) = 18,694 + 810 = 19,504$$

$$F^* = \frac{19,504}{2} \div \frac{9,904}{23} = 22.65$$

To control $\alpha$ at level .01, we require $F(.99; 2, 23) = 5.67$. Since $F^* = 22.65 > 5.67$, we
conclude $H_a$, that the regression functions for the two production lines are not identical.

Next, the analyst examined whether the slopes of the regression lines are the same. The
alternatives here are:

$$\begin{aligned} H_0&: \beta_3 = 0 \\ H_a&: \beta_3 \neq 0 \end{aligned} \tag{8.57}$$

and the appropriate test statistic is either the $t^*$ statistic (7.25) or the partial $F$ test
statistic (7.24):

$$F^* = \frac{SSR(X_1 X_2 \mid X_1, X_2)}{1} \div \frac{SSE(X_1, X_2, X_1 X_2)}{n - 4} \tag{8.57a}$$

Using the regression results in Table 8.6 and the partial $F$ test statistic, we obtain:

$$F^* = \frac{810}{1} \div \frac{9,904}{23} = 1.88$$

For $\alpha = .01$, we require $F(.99; 1, 23) = 7.88$. Since $F^* = 1.88 \leq 7.88$, we conclude $H_0$,
that the slopes of the regression functions for the two production lines are the same.

Using the Bonferroni inequality (4.2), the analyst can therefore conclude at family
significance level .02 that a given increase in line speed leads to the same amount of increase
in expected scrap in each of the two production lines, but that the expected amount of scrap
for any given line speed differs by a constant amount for the two production lines.
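
Both $F^*$ statistics follow directly from the sums of squares in Table 8.6. The Python sketch
below is an illustration that uses only those table values and the critical values quoted in
the text; the variable names are hypothetical.

```python
# Sketch: the two F statistics, computed from the sums of squares in Table 8.6.
ssr_x2_given_x1 = 18694.0          # SSR(X2 | X1)
ssr_x1x2_given_x1_x2 = 810.0       # SSR(X1X2 | X1, X2)
sse_full = 9904.0                  # SSE(X1, X2, X1X2)
n, p = 27, 4                       # cases and parameters in model (8.55)

mse = sse_full / (n - p)

# Identity of the two lines, H0: beta_2 = beta_3 = 0.
f_identity = ((ssr_x2_given_x1 + ssr_x1x2_given_x1_x2) / 2) / mse

# Equality of the slopes, H0: beta_3 = 0.
f_slopes = (ssr_x1x2_given_x1_x2 / 1) / mse

print(round(f_identity, 2), round(f_slopes, 2))
print(f_identity > 5.67, f_slopes > 7.88)   # critical values quoted in the text
```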

We can estimate this constant difference in the regression lines by obtaining a confidence
interval for $\beta_2$. For a 95 percent confidence interval, we require $t(.975; 23) = 2.069$. Using
the results in Table 8.6, we obtain the confidence limits $90.39 \pm 2.069(28.35)$. Hence, the
confidence interval for $\beta_2$ is:

$$31.7 \leq \beta_2 \leq 149.0$$

We thus conclude, with 95 percent confidence, that the mean amount of scrap for production
line 1, at any given line speed, exceeds that for production line 2 by somewhere between
32 and 149.
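
The confidence limits can be verified with a short calculation. The Python sketch below is an
illustration using only the Table 8.6 values and the $t$ multiple quoted in the text.

```python
# Sketch: 95 percent confidence interval for beta_2 from Table 8.6.
b2, s_b2 = 90.39, 28.35
t_mult = 2.069                     # t(.975; 23), as quoted in the text

half_width = t_mult * s_b2
lower, upper = b2 - half_width, b2 + half_width
print(round(lower, 1), round(upper, 1))
```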




Instrument Calibration Study Example 


The engineer making the calibration study believed that the regression functions relating
gauge reading ($Y$) to actual pressure ($X_1$) for both instruments are second-order
polynomials:

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2$$

but that they might differ for the two instruments. Hence, the model employed (using a
centered variable for $X_1$ to reduce multicollinearity problems; see Section 8.1) was:

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i1}^2 + \beta_3 X_{i2} + \beta_4 x_{i1} X_{i2} + \beta_5 x_{i1}^2 X_{i2} + \varepsilon_i \tag{8.58}$$

where:

$$x_{i1} = X_{i1} - \bar{X}_1 = \text{centered actual pressure}$$

$$X_{i2} = \begin{cases} 1 & \text{if instrument B} \\ 0 & \text{otherwise} \end{cases}$$

Note that for instrument A, where $X_2 = 0$, the response function is:

$$E\{Y\} = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 \qquad \text{Instrument A} \tag{8.59a}$$

and for instrument B, where $X_2 = 1$, the response function is:

$$E\{Y\} = (\beta_0 + \beta_3) + (\beta_1 + \beta_4) x_1 + (\beta_2 + \beta_5) x_1^2 \qquad \text{Instrument B} \tag{8.59b}$$

Hence, the test for equality of the two response functions involves the alternatives:

$$\begin{aligned} H_0&: \beta_3 = \beta_4 = \beta_5 = 0 \\ H_a&: \text{not all } \beta_k \text{ in } H_0 \text{ equal zero} \end{aligned} \tag{8.60}$$

and the appropriate test statistic is (7.27):

$$F^* = \frac{SSR(X_2, x_1 X_2, x_1^2 X_2 \mid x_1, x_1^2)}{3} \div \frac{SSE(x_1, x_1^2, X_2, x_1 X_2, x_1^2 X_2)}{n - 6} \tag{8.60a}$$

where $n$ represents the combined sample size for both populations.
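
To make the structure of model (8.58) concrete, the Python sketch below builds the row of
predictor values for each case: an intercept, the centered pressure and its square, the
instrument indicator, and the two cross-product terms. It is an illustration only; the function
name, the pressure readings, and the instrument labels are hypothetical, since the text gives no
data for this example.

```python
# Sketch: design-matrix rows for model (8.58).
# The pressure readings and instrument labels are hypothetical.

def design_rows(pressures, instruments):
    """Rows (1, x, x^2, X2, x*X2, x^2*X2), with x the centered pressure."""
    mean_pressure = sum(pressures) / len(pressures)
    rows = []
    for pressure, instrument in zip(pressures, instruments):
        x = pressure - mean_pressure             # centered actual pressure
        x2 = 1.0 if instrument == "B" else 0.0   # indicator for instrument B
        rows.append([1.0, x, x * x, x2, x * x2, x * x * x2])
    return rows

pressures = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
instruments = ["A", "A", "A", "B", "B", "B"]

for row in design_rows(pressures, instruments):
    print(row)
```

For instrument A the last three entries of every row are zero, which is why $\beta_3$, $\beta_4$, and
$\beta_5$ carry the entire difference between the two response functions.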


Comments

1. The approach just described is completely general. If three or more populations are involved,
additional indicator variables are simply added to the model.

2. The use of indicator variables for testing whether two or more regression functions are the
same is equivalent to the general linear test approach where fitting the full model involves fitting
separate regressions to the data from each population, and fitting the reduced model involves fitting
one regression to the combined data.


Cited Reference

8.1. Draper, N. R., and H. Smith. Applied Regression Analysis. 3rd ed. New York: John Wiley &
Sons, 1998.


Problems

8.1. Prepare a contour plot for the quadratic response surface $E\{Y\} = 140 + 4x_1^2 - 2x_2^2 + 5x_1x_2$.
Describe the shape of the response surface.

8.2. Prepare a contour plot for the quadratic response surface $E\{Y\} = 124 - 3x_1^2 - 2x_2^2 - 6x_1x_2$.
Describe the shape of the response surface.

8.3. A junior investment analyst used a polynomial regression model of relatively high order in a
research seminar on municipal bonds and obtained an $R^2$ of .991 in the regression of net interest
yield of bond ($Y$) on industrial diversity index of municipality ($X$) for seven bond issues. A
classmate, unimpressed, said: "You overfitted. Your curve follows the random effects in the
data."

a. Comment on the criticism.

b. Might $R_a^2$ defined in (6.42) be more appropriate than $R^2$ as a descriptive measure here?

*8.4. Refer to Muscle mass Problem 1.27. Second-order regression model (8.2) with independent
normal error terms is expected to be appropriate.

a. Fit regression model (8.2). Plot the fitted regression function and the data. Does the quadratic
regression function appear to be a good fit here? Find $R^2$.

b. Test whether or not there is a regression relation; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion.

c. Estimate the mean muscle mass for women aged 48 years; use a 95 percent confidence
interval. Interpret your interval.

d. Predict the muscle mass for a woman whose age is 48 years; use a 95 percent prediction
interval. Interpret your interval.

e. Test whether the quadratic term can be dropped from the regression model; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion.

f. Express the fitted regression function obtained in part (a) in terms of the original variable $X$.

g. Calculate the coefficient of simple correlation between $X$ and $X^2$ and between $x$ and $x^2$. Is
the use of a centered variable helpful here?

*8.5. Refer to Muscle mass Problems 1.27 and 8.4.

a. Obtain the residuals from the fit in 8.4a and plot them against $\hat{Y}$ and against $x$ on separate
graphs. Also prepare a normal probability plot. Interpret your plots.

b. Test formally for lack of fit of the quadratic regression function; use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What assumptions did you make implicitly in
this test?


c. Fit third-order model (8.6) and test whether or not $\beta_{111} = 0$; use $\alpha = .05$. State the alternatives,
decision rule, and conclusion. Is your conclusion consistent with your finding in part (b)?

8.6. Steroid level. An endocrinologist was interested in exploring the relationship between the level
of a steroid ($Y$) and age ($X$) in healthy female subjects whose ages ranged from 8 to 25 years.
She collected a sample of 27 healthy females in this age range. The data are given below.

i:      1      2      3     ...    25     26     27
X_i:    23     19     25    ...    13     14     18
Y_i:    27.1   22.1   21.9  ...    12.8   20.8   20.6

a. Fit regression model (8.2). Plot the fitted regression function and the data. Does the quadratic
regression function appear to be a good fit here? Find $R^2$.

b. Test whether or not there is a regression relation; use $\alpha = .01$. State the alternatives, decision
rule, and conclusion. What is the $P$-value of the test?

c. Obtain joint interval estimates for the mean steroid level of females aged 10, 15, and 20,
respectively. Use the most efficient simultaneous estimation procedure and a 99 percent
family confidence coefficient. Interpret your intervals.

d. Predict the steroid levels of females aged 15 using a 99 percent prediction interval. Interpret
your interval.

e. Test whether the quadratic term can be dropped from the model; use $\alpha = .01$. State the
alternatives, decision rule, and conclusion.

f. Express the fitted regression function obtained in part (a) in terms of the original variable $X$.

8.7. Refer to Steroid level Problem 8.6.

a. Obtain the residuals and plot them against the fitted values and against $x$ on separate graphs.
Also prepare a normal probability plot. What do your plots show?

b. Test formally for lack of fit. Control the risk of a Type I error at .01. State the alternatives,
decision rule, and conclusion. What assumptions did you make implicitly in this test?

8.8. Refer to Commercial properties Problems 6.18 and 7.7. The vacancy rate predictor ($X_3$) does
not appear to be needed when property age ($X_1$), operating expenses and taxes ($X_2$), and total
square footage ($X_4$) are included in the model as predictors of rental rates ($Y$).

a. The age of the property ($X_1$) appears to exhibit some curvature when plotted against the
rental rates ($Y$). Fit a polynomial regression model with centered property age ($x_1$),
the square of centered property age ($x_1^2$), operating expenses and taxes ($X_2$), and total
square footage ($X_4$). Plot the $Y$ observations against the fitted values. Does the response
function provide a good fit?

b. Calculate $R_a^2$. What information does this measure provide?

c. Test whether or not the square of centered property age ($x_1^2$) can be dropped from the
model; use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the $P$-value
of the test?

d. Estimate the mean rental rate when $X_1 = 8$, $X_2 = 16$, and $X_4 = 250,000$; use a 95 percent
confidence interval. Interpret your interval.

e. Express the fitted response function obtained in part (a) in the original $X$ variables.

8.9. Consider the response function $E\{Y\} = 25 + 3X_1 + 4X_2 + 1.5X_1X_2$.

a. Prepare a conditional effects plot of the response function against $X_1$ when $X_2 = 3$ and
when $X_2 = 6$. How is the interaction effect of $X_1$ and $X_2$ on $Y$ apparent from this graph?
Describe the nature of the interaction effect.


b. Plot a set of contour curves for the response surface. How is the interaction effect of $X_1$ and
$X_2$ on $Y$ apparent from this graph?

8.10. Consider the response function $E\{Y\} = 14 + 7X_1 + 5X_2 - 4X_1X_2$.

a. Prepare a conditional effects plot of the response function against $X_2$ when $X_1 = 1$ and when
$X_1 = 4$. How does the graph indicate that the effects of $X_1$ and $X_2$ on $Y$ are not additive?
What is the nature of the interaction effect?

b. Plot a set of contour curves for the response surface. How does the graph indicate that the
effects of $X_1$ and $X_2$ on $Y$ are not additive?

8.11. Refer to Brand preference Problem 6.5.

a. Fit regression model (8.22).

b. Test whether or not the interaction term can be dropped from the model; use $\alpha = .05$. State
the alternatives, decision rule, and conclusion.

8.12. A student who used a regression model that included indicator variables was upset when
receiving only the following output on the multiple regression printout: XTRANSPOSE X
SINGULAR. What is a likely source of the difficulty?

8.13. Refer to regression model (8.33). Portray graphically the response curves for this model if
$\beta_0 = 25.3$, $\beta_1 = .20$, and $\beta_2 = -12.1$.

8.14. In a regression study of factors affecting learning time for a certain task (measured in minutes),
gender of learner was included as a predictor variable ($X_2$) that was coded $X_2 = 1$ if male and
0 if female. It was found that $b_2 = 22.3$ and $s\{b_2\} = 3.8$. An observer questioned whether the
coding scheme for gender is fair because it results in a positive coefficient, leading to longer
learning times for males than females. Comment.

8.15. Refer to Copier maintenance Problem 1.20. The users of the copiers are either training
institutions that use a small model, or business firms that use a large, commercial model. An
analyst at Tri-City wishes to fit a regression model including both number of copiers serviced
($X_1$) and type of copier ($X_2$) as predictor variables and estimate the effect of copier model
(S: small, L: large) on number of minutes spent on the service call. Records show that the
models serviced in the 45 calls were:


Assume that regression model (8.33) is appropriate, and let $X_2 = 1$ if small model and 0 if large,
commercial model.

a. Explain the meaning of all regression coefficients in the model.

b. Fit the regression model and state the estimated regression function.

c. Estimate the effect of copier model on mean service time with a 95 percent confidence
interval. Interpret your interval estimate.

d. Why would the analyst wish to include $X_1$, number of copiers, in the regression model when
interest is in estimating the effect of type of copier model on service time?

e. Obtain the residuals and plot them against $X_1X_2$. Is there any indication that an interaction
term in the regression model would be helpful?

8.16. Refer to Grade point average Problem 1.19. An assistant to the director of admissions
conjectured that the predictive power of the model could be improved by adding information on
whether the student had chosen a major field of concentration at the time the application was
submitted. Assume that regression model (8.33) is appropriate, where $X_1$ is entrance test score


and $X_2 = 1$ if student had indicated a major field of concentration at the time of application
and 0 if the major field was undecided. Data for $X_2$ were as follows:

i:       1    2    3    ...    118    119    120
X_i2:    0    1    0    ...    1      1      0

a. Explain how each regression coefficient in model (8.33) is interpreted here.

b. Fit the regression model and state the estimated regression function.

c. Test whether the $X_2$ variable can be dropped from the regression model; use $\alpha = .01$. State
the alternatives, decision rule, and conclusion.

d. Obtain the residuals for regression model (8.33) and plot them against $X_1X_2$. Is there any
evidence in your plot that it would be helpful to include an interaction term in the model?

8.17. Refer to regression models (8.33) and (8.49). Would the conclusion that $\beta_2 = 0$ have the same
implication for each of these models? Explain.

8.18. Refer to regression model (8.49). Portray graphically the response curves for this model if
$\beta_0 = 25$, $\beta_1 = .30$, $\beta_2 = -12.5$, and $\beta_3 = .05$. Describe the nature of the interaction effect.

*8.19. Refer to Copier maintenance Problems 1.20 and 8.15.

a. Fit regression model (8.49) and state the estimated regression function.

b. Test whether the interaction term can be dropped from the model; control the $\alpha$ risk at .10.
State the alternatives, decision rule, and conclusion. What is the $P$-value of the test? If the
interaction term cannot be dropped from the model, describe the nature of the interaction
effect.

8.20. Refer to Grade point average Problems 1.19 and 8.16.

a. Fit regression model (8.49) and state the estimated regression function.

b. Test whether the interaction term can be dropped from the model; use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. If the interaction term cannot be dropped from
the model, describe the nature of the interaction effect.

8.21. In a regression analysis of on-the-job head injuries of warehouse laborers caused by falling
objects, $Y$ is a measure of severity of the injury, $X_1$ is an index reflecting both the weight of
the object and the distance it fell, and $X_2$ and $X_3$ are indicator variables for nature of head
protection worn at the time of the accident, coded as follows:

Type of Protection    X_2    X_3
Hard hat              1      0
Bump cap              0      1
None                  0      0

The response function to be used in the study is $E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$.

a. Develop the response function for each type of protection category.

b. For each of the following questions, specify the alternatives $H_0$ and $H_a$ for the appropriate
test: (1) With $X_1$ fixed, does wearing a bump cap reduce the expected severity of injury as
compared with wearing no protection? (2) With $X_1$ fixed, is the expected severity of injury
the same when wearing a hard hat as when wearing a bump cap?

8.22. Refer to tool wear regression model (8.36). Suppose the indicator variables had been defined as

follows: X5 = | if tool model M2 and 0 otherwise, Хз = 1 if tool model M3 and 0 otherwise, 

Ху = Viftool model МА and 0 otherwise. Indicate the meaning of each of the following: DA 


(2) В; — Вз. (3) В. 


8.23. A marketing research trainee in the national office of a chain of shoe stores used the following
response function to study seasonal (winter, spring, summer, fall) effects on sales of a certain
line of shoes: $E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$. The Xs are indicator variables defined as
follows: $X_1 = 1$ if winter and 0 otherwise, $X_2 = 1$ if spring and 0 otherwise, $X_3 = 1$ if fall and 0
otherwise. After fitting the model, the trainee tested the regression coefficients $\beta_k$ ($k = 0, \ldots, 3$)
and came to the following set of conclusions at an .05 family level of significance: $\beta_0 \neq 0$,
$\beta_1 = 0$, $\beta_2 \neq 0$, $\beta_3 \neq 0$. In the report the trainee then wrote: “Results of regression analysis
show that climatic and other seasonal factors have no influence in determining sales of this
shoe line in the winter. Seasonal influences do exist in the other seasons.” Do you agree with
this interpretation of the test results? Discuss.

8.24. Assessed valuations. A tax consultant studied the current relation between selling price and
assessed valuation of one-family residential dwellings in a large tax district by obtaining data
for a random sample of 16 recent “arm’s-length” sales transactions of one-family dwellings
located on corner lots and for a random sample of 48 recent sales of one-family dwellings not
located on corner lots. In the data that follow, both selling price ($Y$) and assessed valuation
($X_1$) are expressed in thousand dollars, whereas lot location ($X_2$) is coded 1 for corner lots
and 0 for non-corner lots.

i:       1      2      3    ...     62     63     64
Xi1:   76.4   74.3   69.6   ...   79.4   74.7   71.5
Xi2:     0      0      0    ...     0      0      1
Yi:    78.8   73.8   64.6   ...   97.6   84.4   70.5

Assume that the error variances in the two populations are equal and that regression model (8.49)
is appropriate.

a. Plot the sample data for the two populations as a symbolic scatter plot. Does the regression
relation appear to be the same for the two populations?

b. Test for identity of the regression functions for dwellings on corner lots and dwellings in
other locations; control the risk of Type I error at .05. State the alternatives, decision rule,
and conclusion.

c. Plot the estimated regression functions for the two populations and describe the nature of
the differences between them.

8.25. Refer to Grocery retailer Problems 6.9 and 7.4.

a. Fit regression model (8.58) using the number of cases shipped ($X_1$) and the binary variable
($X_3$) as predictors.

b. Test whether or not the interaction terms and the quadratic term can be dropped from the
model; use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?

8.26. In time series analysis, the X variable representing time usually is defined to take on values
1, 2, etc., for the successive time periods. Does this represent an allocated code when the time
periods are actually 1989, 1990, etc.?

8.27. An analyst wishes to include number of older siblings in family as a predictor variable in a re-
gression analysis of factors affecting maturation in eighth graders. The number of older siblings
in the sample observations ranges from 0 to 4. Discuss whether this variable should be placed
in the model as an ordinary quantitative variable or by means of four 0, 1 indicator variables.

8.28. Refer to regression model (8.31) for the insurance innovation study. Suppose $\beta_0$ were dropped
from the model to eliminate the linear dependence in the X matrix so that the model becomes
$Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$. What is the meaning here of each of the regression coeffi-
cients $\beta_1$, $\beta_2$, and $\beta_3$?


Exercises

8.29. Consider the second-order regression model with one predictor variable in (8.2) and the fol-
lowing two sets of X values:

Set 1:   1.0   1.5   1.1   1.3   1.9    .8   1.2   1.4
Set 2:    12     1   123    17   415    71   283    38

For each set, calculate the coefficient of correlation between $X$ and $X^2$, then between $x$ and $x^2$.
Also calculate the coefficients of correlation between $X$ and $X^3$ and between $x$ and $x^3$. What
generalizations are suggested by your results?

8.30. (Calculus needed.) Refer to second-order response function (8.3). Explain precisely the meaning
of the linear effect coefficient $\beta_1$ and the quadratic effect coefficient $\beta_{11}$.

8.31. a. Derive the expressions for $b_0'$, $b_1'$, and $b_{11}'$ in (8.12).

b. Using (5.46), obtain the variance-covariance matrix for the regression coefficients pertaining
to the original X variable in terms of the variance-covariance matrix for the regression
coefficients pertaining to the transformed x variable.

8.32. How are the normal equations (8.4) simplified if the X values are equally spaced, such as the
time series representation $X_1 = 1, X_2 = 2, \ldots, X_n = n$?

8.33. Refer to the instrument calibration study example in Section 8.7. Suppose that three instruments
(A, B, C) had been developed to identical specifications, that the regression functions relating
gauge reading ($Y$) to actual pressure ($X_1$) are second-order polynomials for each instrument,
that the error variances are the same, and that the polynomial coefficients may differ from one
instrument to the next. Let $X_3$ denote a second indicator variable, where $X_3 = 1$ if instrument
C and 0 otherwise.

a. Expand regression model (8.58) to cover this situation.

b. State the alternatives, define the test statistic, and give the decision rule for each of the
following tests when the level of significance is .01: (1) test whether the second-order re-
gression functions for the three instruments are identical, (2) test whether all three regression
functions have the same intercept, (3) test whether both the linear and quadratic effects are
the same in all three regression functions.

8.34. In a regression study, three types of banks were involved, namely, commercial, mutual savings,
and savings and loan. Consider the following system of indicator variables for type of bank:

Type of Bank        X2    X3
Commercial           1     0
Mutual savings       0     1
Savings and loan    -1    -1

a. Develop a first-order linear regression model for relating last year's profit or loss ($Y$) to size
of bank ($X_1$) and type of bank ($X_2$, $X_3$).

b. State the response functions for the three types of banks.

c. Interpret each of the following quantities: (1) $\beta_2$, (2) $\beta_3$, (3) $-\beta_2 - \beta_3$.

8.35. Refer to regression model (8.54) and exclude variable $X_3$.

a. Obtain the $\mathbf{X}'\mathbf{X}$ matrix for this special case of a single qualitative predictor variable, for
$i = 1, \ldots, n$ when $n_1$ firms are not incorporated.

b. Using (6.25), find $\mathbf{b}$.

c. Using (6.35) and (6.36), find SSE and SSR.


Projects

8.36. Refer to the CDI data set in Appendix C.2. It is desired to fit second-order regression model (8.2)
for relating number of active physicians ($Y$) to total population ($X$).

a. Fit the second-order regression model. Plot the residuals against the fitted values. How well
does the second-order model appear to fit the data?

b. Obtain $R^2$ for the second-order regression model. Also obtain the coefficient of simple
determination for the first-order regression model. Has the addition of the quadratic term in
the regression model substantially increased the coefficient of determination?

c. Test whether the quadratic term can be dropped from the regression model; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion.

8.37. Refer to the CDI data set in Appendix C.2. A regression model relating serious crime rate
($Y$, total serious crimes divided by total population) to population density ($X_1$, total population
divided by land area) and unemployment rate ($X_3$) is to be constructed.

a. Fit second-order regression model (8.8). Plot the residuals against the fitted values. How
well does the second-order model appear to fit the data? What is $R^2$?

b. Test whether or not all quadratic and interaction terms can be dropped from the regression
model; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

c. Instead of the predictor variable population density, total population ($X_1$) and land area
($X_2$) are to be employed as separate predictor variables, in addition to unemployment rate
($X_3$). The regression model should contain linear and quadratic terms for total population,
and linear terms only for land area and unemployment rate. (No interaction terms are to be
included in this model.) Fit this regression model and obtain $R^2$. Is this coefficient of multiple
determination substantially different from the one for the regression model in part (a)?

8.38. Refer to the SENIC data set in Appendix C.1. Second-order regression model (8.2) is to be
fitted for relating number of nurses ($Y$) to available facilities and services ($X$).

a. Fit the second-order regression model. Plot the residuals against the fitted values. How well
does the second-order model appear to fit the data?

b. Obtain $R^2$ for the second-order regression model. Also obtain the coefficient of simple
determination for the first-order regression model. Has the addition of the quadratic term in
the regression model substantially increased the coefficient of determination?

c. Test whether the quadratic term can be dropped from the regression model; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion.

8.39. Refer to the CDI data set in Appendix C.2. The number of active physicians ($Y$) is to be
regressed against total population ($X_1$), total personal income ($X_2$), and geographic region
($X_3$, $X_4$, $X_5$).

a. Fit a first-order regression model. Let $X_3 = 1$ if NE and 0 otherwise, $X_4 = 1$ if NC and 0
otherwise, and $X_5 = 1$ if S and 0 otherwise.

b. Examine whether the effect for the northeastern region on number of active physicians
differs from the effect for the north central region by constructing an appropriate 90 percent
confidence interval. Interpret your interval estimate.

c. Test whether any geographic effects are present; use $\alpha = .10$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

8.40. Refer to the SENIC data set in Appendix C.1. Infection risk ($Y$) is to be regressed against length
of stay ($X_1$), age ($X_2$), routine chest X-ray ratio ($X_3$), and medical school affiliation ($X_4$).

a. Fit a first-order regression model. Let $X_4 = 1$ if hospital has medical school affiliation and
0 if not.


b. Estimate the effect of medical school affiliation on infection risk using a 98 percent confi-
dence interval. Interpret your interval estimate.

c. It has been suggested that the effect of medical school affiliation on infection risk may interact
with the effects of age and routine chest X-ray ratio. Add appropriate interaction terms to
the regression model, fit the revised regression model, and test whether the interaction terms
are helpful; use $\alpha = .10$. State the alternatives, decision rule, and conclusion.

8.41. Refer to the SENIC data set in Appendix C.1. Length of stay ($Y$) is to be regressed on age
($X_1$), routine culturing ratio ($X_2$), average daily census ($X_3$), available facilities and services
($X_4$), and region ($X_5$, $X_6$, $X_7$).

a. Fit a first-order regression model. Let $X_5 = 1$ if NE and 0 otherwise, $X_6 = 1$ if NC and 0
otherwise, and $X_7 = 1$ if S and 0 otherwise.

b. Test whether the routine culturing ratio can be dropped from the model; use a level of
significance of .05. State the alternatives, decision rule, and conclusion.

c. Examine whether the effect on length of stay for hospitals located in the western region differs
from that for hospitals located in the other three regions by constructing an appropriate
confidence interval for each pairwise comparison. Use the Bonferroni procedure with a
95 percent family confidence coefficient. Summarize your findings.

8.42. Refer to Market share data set in Appendix C.3. Company executives want to be able to
predict market share of their product ($Y$) based on merchandise price ($X_1$), the gross Nielsen
rating points ($X_2$, an index of the amount of advertising exposure that the product received),
the presence or absence of a wholesale pricing discount ($X_3 = 1$ if discount present; otherwise
$X_3 = 0$); the presence or absence of a package promotion during the period ($X_4 = 1$ if promotion
present; otherwise $X_4 = 0$); and year ($X_5$). Code year as a nominal level variable and use 2000
as the referent year.

a. Fit a first-order regression model. Plot the residuals against the fitted values. How well does
the first-order model appear to fit the data?

b. Re-fit the model in part (a), after adding all second-order terms involving only the quantitative
predictors. Test whether or not all quadratic and interaction terms can be dropped from the
regression model; use $\alpha = .05$. State the alternatives, decision rule, and conclusion.

c. In part (a), test whether advertising index ($X_2$) and year ($X_5$) can be dropped from the model;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion.


Case Study

8.43. Refer to University admissions data set in Appendix C.4. The director of admissions at a state
university wished to determine how accurately students’ grade-point averages at the end of their
freshman year ($Y$) can be predicted from the entrance examination (ACT) test score ($X_2$); the
high school class rank ($X_1$, a percentile where 99 indicates student is at or near the top of his
or her class and 1 indicates student is at or near the bottom of the class); and the academic year
($X_3$). The academic year variable covers the years 1996 through 2000. Develop a prediction
model for the director of admissions. Justify your choice of model. Assess your model's ability
to predict and discuss its use as a tool for admissions decisions.


Chapter 9


Building the Regression
Model I: Model Selection
and Validation


In earlier chapters, we considered how to fit simple and multiple regression models and how 
to make inferences from these models. In this chapter, we first present an overview of the 
model-building and model-validation process. Then we consider in more detail some special 
issues in the selection of the predictor variables for exploratory observational studies. We 
conclude the chapter with a detailed description of methods for validating regression models. 


9.1 Overview of Model-Building Process 


At the risk of oversimplifying, we present in Figure 9.1 a strategy for the building of a 
regression model. This strategy involves three or, sometimes, four phases: 


1. Data collection and preparation 

2. Reduction of explanatory or predictor variables (for exploratory observational studies) 
3. Model refinement and selection 

4. Model validation 


We consider each of these phases in turn. 


Data Collection


The data collection requirements for building a regression model vary with the nature of 
the study. It is useful to distinguish four types of studies. 


Controlled Experiments. In a controlled experiment, the experimenter controls the
levels of the explanatory variables and assigns a treatment, consisting of a combination
of levels of the explanatory variables, to each experimental unit and observes the response.
For example, an experimenter studied the effects of the size of a graphic presentation and
the time allowed for analysis on the accuracy with which the analysis of the presentation is
carried out. Here, the response variable is a measure of the accuracy of the analysis, and the
explanatory variables are the size of the graphic presentation and the time allowed. Junior
executives were used as the experimental units. A treatment consisted of a particular
combination of size of presentation and length of time allowed. In controlled experiments, the
explanatory variables are often called factors or control variables.

FIGURE 9.1 Strategy for Building a Regression Model. [Flowchart, by phase. Data collection
and preparation: collect data; preliminary checks on data quality; diagnostics for relationships
and strong interactions; remedial measures, if needed. Reduction of number of explanatory
variables (for exploratory observational studies): determine several potentially useful subsets
of explanatory variables; include known essential variables. Model refinement and selection:
investigate curvature and interaction effects more fully; remedial measures, if needed; select
tentative model. Model validation: final regression model.]

The data collection requirements for controlled experiments are straightforward, though
not necessarily simple. Observations for each experimental unit are needed on the response 
variable and on the level of each of the control variables used for that experimental unit. 
There may be difficult measurement and scaling problems for the response variable that are 
unique to the area of application. 


Controlled Experiments with Covariates. Statistical design of experiments uses sup- 
plemental information, such as characteristics of the experimental units, in designing the 
experiment so as to reduce the variance of the experimental error terms in the regression 
model. Sometimes, however, it is not possible to incorporate this supplemental information
into the design of the experiment. Instead, it may be possible for the experimenter to incor- 
porate this information into the regression model and thereby reduce the error variance by 
including uncontrolled variables or covariates in the model. 

In our previous example involving the accuracy of analysis of graphic presentations, 
the experimenter suspected that gender and number of years of education could affect the 
accuracy responses in important ways. Because of time constraints, the experimenter was 
able to use only a completely randomized design, which does not incorporate any supple- 
mental information into the design. The experimenter therefore also collected data on two 
uncontrolled variables (gender and number of years of education of the junior executives) 
in case that use of these covariates in the regression model would make the analysis of 
the effects of the explanatory variables (size of graphic presentation, time allowed) on the 
accuracy response more precise. 


Confirmatory Observational Studies. These studies, based on observational, not experimental,
data, are intended to test (i.e., to confirm or not to confirm) hypotheses derived from
previous studies or from hunches. For these studies, data are collected for explanatory vari- 
ables that previous studies have shown to affect the response variable, as well as for the new 
variable or variables involved in the hypothesis. In this context, the explanatory variable(s) 
involved in the hypothesis are sometimes called the primary variables, and the explanatory 
variables that are included to reflect existing knowledge are called the control variables 
(known risk factors in epidemiology). The control variables here are not controlled as in 
an experimental study, but they are used to account for known influences on the response 
variable. For example, in an observational study of the effect of vitamin E supplements 
on the occurrence of a certain type of cancer, known risk factors, such as age, gender, and 
race, would be included as control variables and the amount of vitamin E supplements 
taken daily would be the primary explanatory variable. The response variable would be the 
occurrence of the particular type of cancer during the period under consideration. (The use 
of qualitative response variables in a regression model will be considered in Chapter 14.) 
Data collection for confirmatory observational studies involves obtaining observations on 
the response variable, the control variables, and the primary explanatory variable(s). Here, as 
in controlled experiments, there may be important and complex problems of measurement, 
such as how to obtain reliable data on the amount of vitamin supplements taken daily. 


Exploratory Observational Studies. In the social, behavioral, and health sciences, management,
and other fields, it is often not possible to conduct controlled experiments.
Furthermore, adequate knowledge for conducting confirmatory observational studies may
be lacking. As a result, many studies in these fields are exploratory observational studies
where investigators search for explanatory variables that might be related to the response
variable. To complicate matters further, any available theoretical models may involve
explanatory variables that are not directly measurable, such as a family's future earnings over
the next 10 years. Under these conditions, investigators are often forced to prospect for
explanatory variables that could conceivably be related to the response variable under study.
Obviously, such a set of potentially useful explanatory variables can be large. For example,
a company’s sales of portable dishwashers in a district may be affected by population
size, per capita income, percent of population in urban areas, percent of population under 
50 years of age, percent of families with children at home, etc., etc.! 

After a lengthy list of potentially useful explanatory variables has been compiled, some 
of these variables can be quickly screened out. An explanatory variable (1) may not be 
fundamental to the problem, (2) may be subject to large measurement errors, and/or (3) may 
effectively duplicate another explanatory variable in the list. Explanatory variables that 
cannot be measured may either be deleted or replaced by proxy variables that are highly 
correlated with them. 

The number of cases to be collected for an exploratory observational regression study 
depends on the size of the pool of potentially useful explanatory variables available at this 
stage. More cases are required when the pool is large than when it is small. A general rule 
of thumb states that there should be at least 6 to 10 cases for every variable in the pool.
The actual data collection for the pool of potentially useful explanatory variables and for 
the response variable again may involve important issues of measurement, just as for the 
other types of studies. 
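
As a rough illustration of the rule of thumb above, the following sketch computes the range of
sample sizes it suggests for a given pool size. The function name cases_needed and the pool of
12 variables are hypothetical, introduced only for this example; the rule itself is a guideline,
not a guarantee of adequate precision.

```python
def cases_needed(pool_size, low=6, high=10):
    """Return the (lower, upper) number of cases suggested by the
    6-to-10-cases-per-variable rule of thumb for a pool of candidate
    explanatory variables."""
    if pool_size < 1:
        raise ValueError("pool_size must be a positive integer")
    return low * pool_size, high * pool_size


# Hypothetical pool of 12 candidate explanatory variables.
lower, upper = cases_needed(12)
print(f"Rule of thumb: collect at least {lower} to {upper} cases.")
```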


Data Preparation 


Once the data have been collected, edit checks should be performed and plots prepared 
to identify gross data errors as well as extreme outliers. Difficulties with data errors are 
especially prevalent in large data sets and should be corrected or resolved before the model 
building begins. Whenever possible, the investigator should carefully monitor and control 
the data collection process to reduce the likelihood of data errors. 


Preliminary Model Investigation 


Once the data have been properly edited, the formal modeling process can begin. A variety
of diagnostics should be employed to identify (1) the functional forms in which the
explanatory variables should enter the regression model and (2) important interactions that
should be included in the model. Scatter plots and residual plots are useful for determining
relationships and their strengths. Selected explanatory variables can be fitted in regression
functions to explore relationships, possible strong interactions, and the need for
transformations. Whenever possible, of course, one should also rely on the investigator's prior
knowledge and expertise to suggest appropriate transformations and interactions to inves- 
tigate. This is particularly important when the number of potentially useful explanatory 
variables is large. In this case, it may be very difficult to investigate all possible pair- 
wise interactions, and prior knowledge should be used to identify the important ones. The 
diagnostic procedures explained in previous chapters and in Chapter 10 should be used as 
resources in this phase of model building. 




Reduction of Explanatory Variables 
Controlled Experiments. The reduction of explanatory variables in the model-building 
phase is usually not an important issue for controlled experiments. The experimenter has 
chosen the explanatory variables for investigation, and a regression model is to be developed
that will enable the investigator to study the effects of these variables on the response
variable. After the model has been developed, including the use of appropriate functional
forms for the variables and the inclusion of important interaction terms, the inferential
procedures considered in previous chapters will be used to determine whether the explanatory
variables have effects on the response variable and, if so, the nature and magnitude of the effects.


Controlled Experiments with Covariates. In studies of controlled experiments with 
covariates, some reduction of the covariates may take place because investigators often 
cannot be sure in advance that the selected covariates will be helpful in reducing the error
variance. For instance, the investigator in our graphic presentation example may wish to 
examine at this stage of the model-building process whether gender and number of years 
of education are related to the accuracy response, as had been anticipated. If not, the 
investigator would wish to drop them as not being helpful in reducing the model error 
variance and, therefore, in the analysis of the effects of the explanatory variables on the 
response variable. The number of covariates considered in controlled experiments is usually 
small, so no special problems are encountered in determining whether some or all of the 
covariates should be dropped from the regression model. 


Confirmatory Observational Studies. Generally, no reduction of explanatory variables 
should take place in confirmatory observational studies. The control variables were chosen 
on the basis of prior knowledge and should be retained for comparison with earlier studies 
even if some of the control variables turn out not to lead to any error variance reduction 
in the study at hand. The primary variables are the ones whose influence on the response 
variable is to be examined and therefore need to be present in the model. 


Exploratory Observational Studies. In exploratory observational studies, the number of 
explanatory variables that remain after the initial screening typically is still large. Further, 
many of these variables frequently will be highly intercorrelated. Hence, the investigator 
usually will wish to reduce the number of explanatory variables to be used in the final 
model. There are several reasons for this. A regression model with numerous explanatory 
variables may be difficult to maintain. Further, regression models with a limited number of 
explanatory variables are easier to work with and understand. Finally, the presence of many 
highly intercorrelated explanatory variables may substantially increase the sampling vari- 
ation of the regression coefficients, detract from the model’s descriptive abilities, increase 
the problem of roundoff errors (as noted in Chapter 7), and not improve, or even worsen, 
the model's predictive ability. An actual worsening of the model’s predictive ability can
occur when explanatory variables are kept in the regression model that are not related to 
the response variable, given the other explanatory variables in the model. In that case, the 
variances of the fitted values $\sigma^2\{\hat{Y}_i\}$ tend to become larger with the inclusion of the useless
additional explanatory variables. 
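
The last point can be checked numerically. For the general linear regression model, the variance
of a fitted value is $\sigma^2\{\hat{Y}_i\} = \sigma^2 h_{ii}$, where $h_{ii}$ is the ith diagonal element of the hat
matrix, and $h_{ii}$ cannot decrease when columns are added to the X matrix. If the added variables are
unrelated to the response, $\sigma^2$ is not reduced, so the variance of each fitted value can only stay
the same or grow. The sketch below uses simulated predictors; the sample size, the seed, and the
three "useless" variables are assumptions made purely for illustration.

```python
import numpy as np


def leverage(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X', computed through a
    QR decomposition (X is assumed to have full column rank)."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)


rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)              # a predictor that belongs in the model
useless = rng.normal(size=(n, 3))    # hypothetical predictors unrelated to Y

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, useless])

h_small = leverage(X_small)
h_big = leverage(X_big)

# With sigma^2 unchanged, the ratio of fitted-value variances is h_big / h_small.
print("smallest variance ratio:", np.min(h_big / h_small))
print("average variance ratio: ", np.mean(h_big / h_small))
```

Every ratio is at least 1, and the leverages sum to the number of parameters in each model, so
the average variance of the fitted values rises in proportion to the number of columns in X.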

Hence, once the investigator has tentatively decided upon the functional form of the 
regression relations (whether given variables are to appear in linear form, quadratic form, 
etc.) and whether any interaction terms are to be included, the next step in many exploratory
observational studies is to identify a few “good” subsets of X variables for further
intensive study. These subsets should include not only the potential explanatory variables in
first-order form but also any needed quadratic and other curvature terms and any necessary 
interaction terms. 

The identification of "good" subsets of potentially useful explanatory variables to be 
included in the final regression model and the determination of appropriate functional 
and interaction relations for these variables usually constitute some of the most difficult 
problems in regression analysis. Since the uses of regression models vary, no one subset of 
explanatory variables may always be “best.” For instance, a descriptive use of a regression 
model typically will emphasize precise estimation of the regression coefficients, whereas 
a predictive use will focus on the prediction errors. Often, different subsets of the pool of 
potential explanatory variables will best serve these varying purposes. Even for a given
purpose, it is often found that several subsets are about equally “good” according to a given 
criterion, and the choice among these “good” subsets needs to be made on the basis of 
additional considerations. 

The choice of a few appropriate subsets of explanatory variables for final consideration 
in exploratory observational studies needs to be done with great care. Elimination of key 
explanatory variables can seriously damage the explanatory power of the model and lead 
to biased estimates of regression coefficients, mean responses, and predictions of new 
observations, as well as biased estimates of the error variance. The bias in these estimates is 
related to the fact that with observational data, the error terms in an underfitted regression 
model may reflect nonrandom effects of the explanatory variables not incorporated in the 
regression model. Important omitted explanatory variables are sometimes called latent
explanatory variables.

On the other hand, if too many explanatory variables are included in the subset, then this 
overfitted model will often result in variances of estimated parameters that are larger than 
those for simpler models. 

Another danger with observational data is that important explanatory variables may be 
observed only over narrow ranges. As a result, such important explanatory variables may 
be omitted just because they occur in the sample within a narrow range of values and 
therefore turn out to be statistically nonsignificant. 

Another consideration in identifying subsets of explanatory variables is that these subsets 
need to be small enough so that maintenance costs are manageable and analysis is facilitated, 
yet large enough so that adequate description, control, or prediction is possible. 

A variety of computerized approaches have been developed to assist the investigator 
in reducing the number of potential explanatory variables in an exploratory observational 
study when these variables are correlated among themselves. We present two of these 
approaches in this chapter. The first, which is practical for pools of explanatory variables 
that are small or moderate in size, considers all possible subsets of explanatory variables 
that can be developed from the pool of potential explanatory variables and identifies those 
subsets that are "good" according to a criterion specified by the investigator. The second 
approach employs automatic search procedures to arrive at a single subset of the explanatory 
variables. This approach is recommended primarily for reductions involving large pools of 
explanatory variables. 

Even though computerized approaches can be very helpful in identifying appropriate
subsets for detailed, final consideration, the process of developing a useful regression model
must be pragmatic and needs to utilize large doses of subjective judgment. Explanatory
variables that are considered essential should be included in the regression model before
any computerized assistance is sought. Further, computerized approaches that identify only
a single subset of explanatory variables as “best” need to be supplemented so that additional
subsets are also considered before the final regression model is decided upon.


Comments
1. All too often, unwary investigators will screen a set of explanatory variables by fitting the
regression model containing the entire set of potential X variables and then simply dropping those
for which the t* statistic (7.25):

$$t_k^* = \frac{b_k}{s\{b_k\}}$$

has a small absolute value. As we know from Chapter 7, this procedure can lead to the dropping
of important intercorrelated explanatory variables. Clearly, a good search procedure must be able
to handle important intercorrelated explanatory variables in such a way that not all of them will be
dropped.


2. Controlled experiments can usually avoid many of the problems in exploratory observational
studies. For example, the effects of latent predictor variables are minimized by using randomization.
In addition, adequate ranges of the explanatory variables can be selected and correlations among the
explanatory variables can be eliminated by appropriate choices of their levels.


Model Refinement and Selection 


At this stage in the model-building process, the tentative regression model, or the several 
"good" regression models in the case of exploratory observational studies, need to be 
checked in detail for curvature and interaction effects. Residual plots are helpful in deciding 
whether one model is to be preferred over another. In addition, the diagnostic checks to 
be described in Chapter 10 are useful for identifying influential outlying observations, 
multicollinearity, etc. 

The selection of the ultimate regression model often depends greatly upon these diag- 
nostic results. For example, one fitted model may be very much influenced by a single case, 
whereas another is not. Again, one fitted model may show correlations among the error 
terms, whereas another does not. 

When repeat observations are available, formal tests for lack of fit can be made. In
any case, a variety of residual plots and analyses can be employed to identify any lack of
fit, outliers, and influential observations. For instance, residual plots against cross-product
and/or power terms not included in the regression model can be useful in identifying ways
in which the model fit can be improved further.

When an automatic selection procedure is utilized for an exploratory observational study 
and only a single model is identified as “best,” other models should also be explored. One 
procedure is to use the number of explanatory variables in the model identified as "best" as 
an estimate of the number of explanatory variables needed in the regression model. Then
the investigator explores and identifies other candidate models with approximately the same 
number of explanatory variables identified by the automatic procedure. 

Eventually, after thorough checking and various remedial actions, such as transforma- 
tions, the investigator narrows the number of competing models to one or just a few. At this 
point, it is good statistical practice to assess the validity of the remaining candidates through 
model validation studies. These methods can be used to help decide upon a final regression 
model, and to determine how well the model will perform in practice. 


Model Validation 


Model validity refers to the stability and reasonableness of the regression coefficients, the
plausibility and usability of the regression function, and the ability to generalize inferences
drawn from the regression analysis. Validation is a useful and necessary part of the
model-building process. Several methods of assessing model validity will be described in
Section 9.6.


9.2 Surgical Unit Example 


With the completion of this overview of the model-building process for a regression study, 
we next present an example that will be used to illustrate all stages of this process as they 
are taken up in this and the following two chapters. A hospital surgical unit was interested 
in predicting survival in patients undergoing a particular type of liver operation. A random 
selection of 108 patients was available for analysis. From each patient record, the following 
information was extracted from the preoperation evaluation: 


X1           blood clotting score
X2           prognostic index
X3           enzyme function test score
X4           liver function test score
X5           age, in years
X6           indicator variable for gender (0 = male, 1 = female)
X7 and X8    indicator variables for history of alcohol use:

    Alcohol Use    X7    X8
    None            0     0
    Moderate        1     0
    Severe          0     1


These constitute the pool of potential explanatory or predictor variables for a predictive 
regression model. The response variable is survival time, which was ascertained in a follow- 
up study. A portion of the data on the potential predictor variables and the response variable is 
presented in Table 9.1. These data have already been screened and properly edited for errors.


TABLE 9.1 Potential Predictor Variables and Response Variable: Surgical Unit Example.

         Blood-                                               Alc.   Alc.
Case     Clotting  Prognostic  Enzyme  Liver                  Use:   Use:   Survival
Number   Score     Index       Test    Test    Age   Gender   Mod.   Heavy  Time
i        Xi1       Xi2         Xi3     Xi4     Xi5   Xi6      Xi7    Xi8    Yi        Y'i = ln Yi
1        6.7       62          81      2.59    50    0        1      0      695       6.544
2        5.1       59          66      1.70    39    0        0      0      403       5.999
3        7.4       57          83      2.16    55    0        0      0      710       6.565
...      ...       ...         ...     ...     ...   ...      ...    ...    ...       ...
52       6.4       85          40      1.21    58    0        0      1      579       6.361
53       6.4       59          85      2.33    63    0        1      0      550       6.310
54       8.8       78          72      3.20    56    0        0      0      651       6.478
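The last column of Table 9.1 is the natural logarithm of the survival time. As a minimal sketch, assuming only the six cases printed in the table, the transformation can be reproduced as follows; the function name `log_transform` is hypothetical and used only for illustration. The computed values agree with the tabulated ones to within about 0.001, a difference consistent with rounding.

```python
import math

# Survival times Y_i and tabulated Y'_i = ln Y_i for the six cases
# printed in Table 9.1 (cases 1, 2, 3, 52, 53, and 54).
survival_time = [695, 403, 710, 579, 550, 651]
tabulated_log = [6.544, 5.999, 6.565, 6.361, 6.310, 6.478]


def log_transform(values):
    """Return the natural logarithm of each response value."""
    return [math.log(y) for y in values]


transformed = log_transform(survival_time)
for y, y_log, y_tab in zip(survival_time, transformed, tabulated_log):
    print(f"Y = {y:4d}   ln Y = {y_log:.4f}   Table 9.1 = {y_tab:.3f}")
```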


FIGURE 9.2 Some Preliminary Residual Plots: Surgical Unit Example.

To illustrate the model-building procedures discussed in this and the next section, we will 
use only the first four explanatory variables. By limiting the number of potential explanatory 
variables, we can explain the procedures without overwhelming the reader with masses of 
computer printouts. We will also use only the first 54 of the 108 patients. 

Since the pool of predictor variables is small, a reasonably full exploration of relationships
and of possible strong interaction effects is possible at this stage of data preparation.
Stem-and-leaf plots were prepared for each of the predictor variables (not shown). These
highlighted several cases as outlying with respect to the explanatory variables. The investigator
was thereby alerted to examine later the influence of these cases. A scatter plot matrix
and the correlation matrix were also obtained (not shown).

A first-order regression model based on all predictor variables was fitted to serve as a
starting point. A plot of residuals against predicted values for this fitted model is shown
in Figure 9.2a. The plot suggests that both curvature and nonconstant error variance are
apparent. In addition, some departure from normality is suggested by the normal probability
plot of residuals in Figure 9.2b.

To make the distribution of the error terms more nearly normal and to see if the same
transformation would also reduce the apparent curvature, the investigator examined the
logarithmic transformation Y' = ln Y. Data for the transformed response variable are also
given in Table 9.1. Figure 9.2c shows a plot of residuals against fitted values when Y' is
regressed on all four predictor variables in a first-order model; also the normal probability
plot of residuals for the transformed data (Figure 9.2d) shows that the distribution of the
error terms is more nearly normal.

[Figure 9.2 panels: (a) Residual Plot for Y; (b) Normal Plot for Y; (c) Residual Plot for ln Y;
(d) Normal Plot for ln Y. Horizontal axes: predicted value in (a) and (c), expected value in
(b) and (d).]

FIGURE 9.3 JMP Scatter Plot Matrix and Correlation Matrix when Response Variable Is Y': Surgical Unit Example.

Multivariate Correlations

             LnSurvival  Bloodclot  Progindex   Enzyme    Liver
LnSurvival       1.0000     0.2462     0.4699   0.6539   0.6493
Bloodclot        0.2462     1.0000     0.0901  -0.1496   0.5024
Progindex        0.4699     0.0901     1.0000  -0.0236   0.3690
Enzyme           0.6539    -0.1496    -0.0236   1.0000   0.4164
Liver            0.6493     0.5024     0.3690   0.4164   1.0000

[Scatterplot Matrix: pairwise scatter plots of LnSurvival, Bloodclot, Progindex, Enzyme, and Liver.]

The investigator also obtained a scatter plot matrix and the correlation matrix with the
transformed Y variable; these are presented in Figure 9.3. In addition, various scatter and
residual plots were obtained (not shown here). All of these plots indicate that each of the
predictor variables is linearly associated with Y', with X3 and X4 showing the highest
degrees of association and X1 the lowest. The scatter plot matrix and the correlation matrix
further show intercorrelations among the potential predictor variables. In particular, X4 has
moderately high pairwise correlations with X1, X2, and X3.

On the basis of these analyses, the investigator decided to use, at this stage of the
model-building process, Y' = ln Y as the response variable, to represent the predictor
variables in linear terms, and not to include any interaction terms. The next stage in the
model-building process is to examine whether all of the potential predictor variables are needed
or whether a subset of them is adequate. A number of useful measures have been developed
to assess the adequacy of the various subsets. We now turn to a discussion of these
measures.


9.3 Criteria for Model Selection 


From any set of p - 1 predictors, $2^{p-1}$ alternative models can be constructed. This calculation
is based on the fact that each predictor can be either included or excluded from the
model. For example, the $2^4 = 16$ different possible subset models that can be formed from
the pool of four X variables in the surgical unit example are listed in Table 9.2. First, there
is the regression model with no X variables, i.e., the model $Y_i = \beta_0 + \varepsilon_i$. Then there are
the regression models with one X variable (X1, X2, X3, X4), with two X variables (X1 and
X2, X1 and X3, X1 and X4, X2 and X3, X2 and X4, X3 and X4), and so on.
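The counting argument above can be sketched in a few lines. This is an illustration only: the helper name `all_subsets` is hypothetical, and the predictor labels are simply the four X variables of the surgical unit example.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every subset of the predictor pool, smallest subsets first."""
    for size in range(len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


pool = ["X1", "X2", "X3", "X4"]
subsets = list(all_subsets(pool))
print(len(subsets))  # each predictor is either in or out, so 2**4 models

for subset in subsets:
    p = len(subset) + 1  # number of parameters, counting the intercept
    print(p, ", ".join(subset) if subset else "(no X variables)")
```

Because the number of subsets doubles with each added predictor, the same helper applied to a pool of 10 predictors yields 1,024 models.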


TABLE 9.2 $SSE_p$, $R_p^2$, $R_{a,p}^2$, $C_p$, $AIC_p$, $SBC_p$, and $PRESS_p$ Values for All Possible Regression
Models: Surgical Unit Example.

X Variables       (1)   (2)      (3)     (4)      (5)       (6)        (7)        (8)
in Model           p    SSE_p    R2_p    R2_a,p   C_p       AIC_p      SBC_p      PRESS_p
None               1    12.808   0.000   0.000    151.498    -75.703    -73.714   13.296
X1                 2    12.031   0.061   0.043    141.164    -77.079    -73.101   13.512
X2                 2     9.979   0.221   0.206    108.556    -87.178    -83.200   10.744
X3                 2     7.332   0.428   0.417     66.489   -103.827    -99.849    8.327
X4                 2     7.409   0.422   0.410     67.715   -103.262    -99.284    8.025
X1, X2             3     9.443   0.263   0.234    102.031    -88.162    -82.195   11.062
X1, X3             3     5.781   0.549   0.531     43.852   -114.658   -108.691    6.988
X1, X4             3     7.299   0.430   0.408     67.972   -102.067    -96.100    8.472
X2, X3             3     4.312   0.663   0.650     20.520   -130.483   -124.516    5.065
X2, X4             3     6.622   0.483   0.463     57.215   -107.324   -101.357    7.476
X3, X4             3     5.130   0.599   0.584     33.504   -121.113   -115.146    6.121
X1, X2, X3         4     3.109   0.757   0.743      3.391   -146.161   -138.205    3.914
X1, X2, X4         4     6.570   0.487   0.456     58.392   -105.748    -97.792    7.903
X1, X3, X4         4     4.968   0.612   0.589     32.932   -120.844   -112.888    6.207
X2, X3, X4         4     3.614   0.718   0.701     11.424   -138.023   -130.067    4.597
X1, X2, X3, X4     5     3.084   0.759   0.740      5.000   -144.590   -134.645    4.069


In most circumstances, it will be impossible for an analyst to make a detailed examination 
of all possible regression models. For instance, when there are 10 potential X variables in 
the pool, there would be $2^{10} = 1,024$ possible regression models. With the availability of
high-speed computers and efficient algorithms, running all possible regression models for 
10 potential X variables is not time consuming. Still, the sheer volume of 1,024 alternative 
models to examine carefully would be an overwhelming task for a data analyst. 

Model selection procedures, also known as subset selection or variables selection procedures,
have been developed to identify a small group of regression models that are “good”
according to a specified criterion. A detailed examination can then be made of a limited
number of the more promising or “candidate” models, leading to the selection of the final
regression model to be employed. This limited number might consist of three to six “good”
subsets according to the criteria specified, so the investigator can then carefully study these
regression models for choosing the final model.

While many criteria for comparing the regression models have been developed, we will
focus on six: $R_p^2$, $R_{a,p}^2$, $C_p$, $AIC_p$, $SBC_p$, and $PRESS_p$. Before doing so, we will need to
develop some notation. We shall denote the number of potential X variables in the pool by
P - 1. We assume throughout this chapter that all regression models contain an intercept
term $\beta_0$. Hence, the regression function containing all potential X variables contains P
parameters, and the function with no X variables contains one parameter ($\beta_0$).

The number of X variables in a subset will be denoted by p - 1, as always, so that there
are p parameters in the regression function for this subset of X variables. Thus, we have:


$$1 \le p \le P \tag{9.1}$$


We will assume that the number of observations exceeds the maximum number of 
potential parameters: 


$$n > P \tag{9.2}$$


and, indeed, it is highly desirable that n be substantially larger than P, as we noted earlier, 
so that sound results can be obtained. 


$R_p^2$ or $SSE_p$ Criterion


The $R_p^2$ criterion calls for the use of the coefficient of multiple determination $R^2$, defined
in (6.40), in order to identify several “good” subsets of X variables, in other words, subsets
for which $R^2$ is high. We show the number of parameters in the regression model as a
subscript of $R^2$. Thus $R_p^2$ indicates that there are p parameters, or p - 1 X variables, in the
regression function on which $R_p^2$ is based.

The $R_p^2$ criterion is equivalent to using the error sum of squares $SSE_p$ as the criterion
(we again show the number of parameters in the regression model as a subscript). With the
$SSE_p$ criterion, subsets for which $SSE_p$ is small are considered “good.” The equivalence of
the $R_p^2$ and $SSE_p$ criteria follows from (6.40):

$$R_p^2 = 1 - \frac{SSE_p}{SSTO} \tag{9.3}$$

Since the denominator SSTO is constant for all possible regression models, $R_p^2$ varies
inversely with $SSE_p$.


The $R_p^2$ criterion is not intended to identify the subsets that maximize this criterion.
We know that $R_p^2$ can never decrease as additional X variables are included in the model.
Hence, $R_p^2$ will be a maximum when all P - 1 potential X variables are included in the
regression model. The intent in using the $R_p^2$ criterion is to find the point where adding more
X variables is not worthwhile because it leads to a very small increase in $R_p^2$. Often, this
point is reached when only a limited number of X variables is included in the regression
model. Clearly, the determination of where diminishing returns set in is a judgmental one.


Example

Table 9.2 for the surgical unit example shows in columns 1 and 2 the number of parameters
in the regression function and the error sum of squares for each possible regression model.
In column 3 are given the $R_p^2$ values. The results were obtained from a series of computer
runs. For instance, when X4 is the only X variable in the regression model, we obtain:

$$R_2^2 = 1 - \frac{SSE(X_4)}{SSTO} = 1 - \frac{7.409}{12.808} = .422$$

Note that $SSTO = SSE_1 = 12.808$.

Figure 9.4a contains a plot of the $R_p^2$ values against p, the number of parameters in the
regression model. The maximum $R_p^2$ value for the possible subsets each consisting of p - 1
predictor variables, denoted by max($R_p^2$), appears at the top of the graph for each p. These
points are connected by solid lines to show the impact of adding additional X variables.
Figure 9.4a makes it clear that little increase in max($R_p^2$) takes place after three X variables
are included in the model. Hence, consideration of the subsets (X1, X2, X3) for which
$R_4^2 = .757$ (as shown in column 3 of Table 9.2) and (X2, X3, X4) for which $R_4^2 = .718$
appears to be reasonable according to the $R_p^2$ criterion.

Note that variables X3 and X4 correlate most highly with the response variable, yet this
pair does not appear together in the max($R_p^2$) model for p = 4. This suggests that X1, X2,
and X3 contain much of the information presented by X4. Note also that the coefficient
of multiple determination associated with subset (X2, X3, X4), $R_4^2 = .718$, is somewhat
smaller than $R_4^2 = .757$ for subset (X1, X2, X3).
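The diminishing-returns reading of the $R_p^2$ criterion can be sketched numerically from the $SSE_p$ column of Table 9.2. This is a minimal illustration, not part of the original analysis: the pairing of table rows with subsets assumes the rows are listed in the order none; X1 to X4; the six pairs in the order given in the text; the four triples; then the full model. The helper names are hypothetical.

```python
# SSE_p values transcribed from Table 9.2, keyed by the X variables in each
# subset model. The row-to-subset pairing is an assumption (see lead-in).
SSE = {
    (): 12.808,
    ("X1",): 12.031, ("X2",): 9.979, ("X3",): 7.332, ("X4",): 7.409,
    ("X1", "X2"): 9.443, ("X1", "X3"): 5.781, ("X1", "X4"): 7.299,
    ("X2", "X3"): 4.312, ("X2", "X4"): 6.622, ("X3", "X4"): 5.130,
    ("X1", "X2", "X3"): 3.109, ("X1", "X2", "X4"): 6.570,
    ("X1", "X3", "X4"): 4.968, ("X2", "X3", "X4"): 3.614,
    ("X1", "X2", "X3", "X4"): 3.084,
}
SSTO = SSE[()]  # intercept-only model: SSE_1 = SSTO


def r_squared(subset):
    """R^2_p = 1 - SSE_p / SSTO, as in (9.3)."""
    return 1.0 - SSE[subset] / SSTO


def best_subset_by_size():
    """Return {p: subset with the smallest SSE_p (largest R^2_p)}."""
    best = {}
    for subset in SSE:
        p = len(subset) + 1
        if p not in best or SSE[subset] < SSE[best[p]]:
            best[p] = subset
    return best


best = best_subset_by_size()
previous = None
for p in sorted(best):
    r2 = r_squared(best[p])
    gain = "" if previous is None else f"   gain = {r2 - previous:.3f}"
    label = ", ".join(best[p]) or "none"
    print(f"p = {p}   max R2 = {r2:.3f}   ({label}){gain}")
    previous = r2
```

The printed gains shrink sharply after p = 4, which is the judgment the text describes: the fourth predictor adds very little to max($R_p^2$).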


$R_{a,p}^2$ or $MSE_p$ Criterion


Since $R_p^2$ does not take account of the number of parameters in the regression model
and since max($R_p^2$) can never decrease as p increases, the adjusted coefficient of multiple
determination $R_{a,p}^2$ in (6.42) has been suggested as an alternative criterion:

$$R_{a,p}^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE_p}{SSTO} = 1 - \frac{MSE_p}{SSTO/(n-1)} \tag{9.4}$$

This coefficient takes the number of parameters in the regression model into account through
the degrees of freedom. It can be seen from (9.4) that $R_{a,p}^2$ increases if and only if $MSE_p$
decreases since SSTO/(n - 1) is fixed for the given Y observations. Hence, $R_{a,p}^2$ and $MSE_p$
provide equivalent information. We shall consider here the criterion $R_{a,p}^2$, again showing
the number of parameters in the regression model as a subscript of the criterion. The largest
$R_{a,p}^2$ for a given number of parameters in the model, max($R_{a,p}^2$), can, indeed, decrease as
p increases. This occurs when the increase in max($R_p^2$) becomes so small that it is not
sufficient to offset the loss of an additional degree of freedom. Users of the $R_{a,p}^2$ criterion
seek to find a few subsets for which $R_{a,p}^2$ is at the maximum or so close to the maximum
that adding more variables is not worthwhile.

FIGURE 9.4 Plot of Variables Selection Criteria: Surgical Unit Example.
[Six panels, each plotted against p: (a) $R_p^2$; (b) $R_{a,p}^2$; (c) $C_p$; (d) $AIC_p$; (e) $SBC_p$; (f) $PRESS_p$.]


Example

The $R_{a,p}^2$ values for all possible regression models for the surgical unit example are shown
in Table 9.2, column 4. For instance, we have for the regression model containing only X4:

$$R_{a,2}^2 = 1 - \left(\frac{n-1}{n-2}\right)\frac{SSE(X_4)}{SSTO} = 1 - \left(\frac{53}{52}\right)\frac{7.409}{12.808} = .410$$

Figure 9.4b contains the $R_{a,p}^2$ plot for the surgical unit example. We have again connected
the max($R_{a,p}^2$) values by solid lines. The story told by the $R_{a,p}^2$ plot in Figure 9.4b is very
similar to that told by the $R_p^2$ plot in Figure 9.4a. Consideration of the subsets (X1, X2,
X3) and (X2, X3, X4) appears to be reasonable according to the $R_{a,p}^2$ criterion. Notice that
$R_{a,4}^2 = .743$ is maximized for subset (X1, X2, X3), and that adding X4 to this subset, thus
using all four predictors, decreases the criterion slightly: $R_{a,5}^2 = .740$.
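The adjusted criterion in (9.4) is simple enough to check by hand or in code. The sketch below uses the $SSE_p$ values quoted in the text and Table 9.2 for four of the subsets; the function name `adjusted_r_squared` and the dictionary layout are illustrative assumptions.

```python
def adjusted_r_squared(sse_p, ssto, n, p):
    """R^2_{a,p} = 1 - ((n - 1) / (n - p)) * SSE_p / SSTO, as in (9.4)."""
    return 1.0 - (n - 1) / (n - p) * sse_p / ssto


n, ssto = 54, 12.808  # 54 cases; SSTO = SSE_1 from Table 9.2

# (p, SSE_p) for selected subsets, taken from Table 9.2.
models = {
    "X4": (2, 7.409),
    "X1, X2, X3": (4, 3.109),
    "X2, X3, X4": (4, 3.614),
    "X1, X2, X3, X4": (5, 3.084),
}

adjusted = {
    name: adjusted_r_squared(sse, ssto, n, p) for name, (p, sse) in models.items()
}
for name, value in adjusted.items():
    print(f"{name:15s} adjusted R2 = {value:.3f}")
```

Unlike $R_p^2$, the adjusted value for the full model comes out slightly below that for (X1, X2, X3), because the small drop in $SSE_p$ does not offset the lost degree of freedom.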


Mallows' $C_p$ Criterion
This criterion is concerned with the total mean squared error of the n fitted values for each
subset regression model. The mean squared error concept involves the total error in each
fitted value:

$$\hat{Y}_i - \mu_i \tag{9.5}$$

where $\mu_i$ is the true mean response when the levels of the predictor variables $X_k$ are those
for the ith case. This total error is made up of a bias component and a random error
component:
1. The bias component for the ith fitted value $\hat{Y}_i$, also called the model error component,
is:

$$E\{\hat{Y}_i\} - \mu_i \tag{9.5a}$$

where $E\{\hat{Y}_i\}$ is the expectation of the ith fitted value for the given regression model. If
the fitted model is not correct, $E\{\hat{Y}_i\}$ will differ from the true mean response $\mu_i$ and the
difference represents the bias of the fitted model.
2. The random error component for $\hat{Y}_i$ is:

$$\hat{Y}_i - E\{\hat{Y}_i\} \tag{9.5b}$$

This component represents the deviation of the fitted value $\hat{Y}_i$ for the given sample from the
expected value when the ith fitted value is obtained by fitting the same regression model to
all possible samples.


The mean squared error for $\hat{Y}_i$ is defined as the expected value of the square of the total
error in (9.5), in other words, the expected value of:

$$(\hat{Y}_i - \mu_i)^2 = [(E\{\hat{Y}_i\} - \mu_i) + (\hat{Y}_i - E\{\hat{Y}_i\})]^2$$

It can be shown that this expected value is:

$$E\{(\hat{Y}_i - \mu_i)^2\} = (E\{\hat{Y}_i\} - \mu_i)^2 + \sigma^2\{\hat{Y}_i\} \tag{9.6}$$

where $\sigma^2\{\hat{Y}_i\}$ is the variance of the fitted value $\hat{Y}_i$. We see from (9.6) that the mean squared
error for the fitted value $\hat{Y}_i$ is the sum of the squared bias and the variance of $\hat{Y}_i$.
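The step leading to (9.6) is stated without proof. A short sketch of the omitted algebra, using only the bias and random error components defined in (9.5a) and (9.5b), is given below; the three-term grouping is added here for illustration and is not taken from the text.

```latex
\begin{aligned}
(\hat{Y}_i - \mu_i)^2
  &= (E\{\hat{Y}_i\} - \mu_i)^2
   + 2\,(E\{\hat{Y}_i\} - \mu_i)(\hat{Y}_i - E\{\hat{Y}_i\})
   + (\hat{Y}_i - E\{\hat{Y}_i\})^2 \\
E\{(\hat{Y}_i - \mu_i)^2\}
  &= (E\{\hat{Y}_i\} - \mu_i)^2
   + 2\,(E\{\hat{Y}_i\} - \mu_i)\,E\{\hat{Y}_i - E\{\hat{Y}_i\}\}
   + E\{(\hat{Y}_i - E\{\hat{Y}_i\})^2\} \\
  &= (E\{\hat{Y}_i\} - \mu_i)^2 + 0 + \sigma^2\{\hat{Y}_i\}
\end{aligned}
```

The bias term is a constant, so it passes through the expectation unchanged; the cross-product term vanishes because the random error component has expectation zero; and the last term is the variance of the fitted value by definition.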

The total mean squared error for all n fitted values $\hat{Y}_i$ is the sum of the n individual mean
squared errors in (9.6):

$$\sum_{i=1}^{n}\left[(E\{\hat{Y}_i\} - \mu_i)^2 + \sigma^2\{\hat{Y}_i\}\right] = \sum_{i=1}^{n}(E\{\hat{Y}_i\} - \mu_i)^2 + \sum_{i=1}^{n}\sigma^2\{\hat{Y}_i\} \tag{9.7}$$

The criterion measure, denoted by $\Gamma_p$, is simply the total mean squared error in (9.7) divided
by $\sigma^2$, the true error variance:

$$\Gamma_p = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n}(E\{\hat{Y}_i\} - \mu_i)^2 + \sum_{i=1}^{n}\sigma^2\{\hat{Y}_i\}\right] \tag{9.8}$$

The model which includes all P - 1 potential X variables is assumed to have been
carefully chosen so that $MSE(X_1, \ldots, X_{P-1})$ is an unbiased estimator of $\sigma^2$. It can then be
shown that an estimator of $\Gamma_p$ is $C_p$:

$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p) \tag{9.9}$$

where $SSE_p$ is the error sum of squares for the fitted subset regression model with p
parameters (i.e., with p - 1 X variables).

When there is no bias in the regression model with p - 1 X variables so that $E\{\hat{Y}_i\} = \mu_i$,
the expected value of $C_p$ is approximately p:

$$E\{C_p\} \approx p \quad \text{when } E\{\hat{Y}_i\} = \mu_i \tag{9.10}$$


Thus, when the $C_p$ values for all possible regression models are plotted against p, those
models with little bias will tend to fall near the line $C_p = p$. Models with substantial bias will
tend to fall considerably above this line. $C_p$ values below the line $C_p = p$ are interpreted as
showing no bias, being below the line due to sampling error. The $C_p$ value for the regression
model containing all P - 1 X variables is, by definition, P. The $C_p$ measure assumes that
$MSE(X_1, \ldots, X_{P-1})$ is an unbiased estimator of $\sigma^2$, which is equivalent to assuming that
this model contains no bias.

In using the $C_p$ criterion, we seek to identify subsets of X variables for which (1) the
$C_p$ value is small and (2) the $C_p$ value is near p. Subsets with small $C_p$ values have a small
total mean squared error, and when the $C_p$ value is also near p, the bias of the regression
model is small. It may sometimes occur that the regression model based on a subset of X
variables with a small $C_p$ value involves substantial bias. In that case, one may at times
prefer a regression model based on a somewhat larger subset of X variables for which the
$C_p$ value is only slightly larger but which does not involve a substantial bias component.
Reference 9.1 contains extended discussions of applications of the $C_p$ criterion.


Example

Table 9.2, column 5, contains the $C_p$ values for all possible regression models for the surgical
unit example. For instance, when X4 is the only X variable in the regression model, the $C_p$
value is:

$$C_2 = \frac{SSE(X_4)}{SSE(X_1, X_2, X_3, X_4)/(n - 5)} - [n - 2(2)] = \frac{7.409}{3.084/49} - [54 - 2(2)] = 67.715$$

The $C_p$ values for all possible regression models are plotted in Figure 9.4c. We find that
$C_p$ is minimized for subset (X1, X2, X3). Notice that $C_p = 3.391 < p = 4$ for this model,
indicating little or no bias in the regression model.


Note that use of all potential X variables (X1, X2, X3, X4) results in a $C_p$ value of exactly
P, as expected; here, $C_5 = 5.00$. Also note that use of subset (X2, X3, X4) with $C_p$ value
$C_4 = 11.424$ would be poor because of the substantial bias with this model. Thus, the $C_p$
criterion suggests only one subset (X1, X2, X3) for the surgical unit example.
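The calculation in (9.9) can be sketched directly from the $SSE_p$ values in Table 9.2. The function name `mallows_cp` and its argument layout are illustrative assumptions. Because the tabulated sums of squares are rounded to three decimals, the results match the $C_p$ column only to about two decimals.

```python
def mallows_cp(sse_p, p, sse_full, n, P):
    """C_p = SSE_p / MSE(X1, ..., X_{P-1}) - (n - 2p), as in (9.9)."""
    mse_full = sse_full / (n - P)  # MSE of the model with all P - 1 predictors
    return sse_p / mse_full - (n - 2 * p)


n, P = 54, 5       # 54 cases; the full model has 4 predictors plus an intercept
sse_full = 3.084   # SSE for (X1, X2, X3, X4) from Table 9.2

# (p, SSE_p) for selected subsets, taken from Table 9.2.
models = {
    "X4": (2, 7.409),
    "X1, X2, X3": (4, 3.109),
    "X2, X3, X4": (4, 3.614),
    "X1, X2, X3, X4": (5, 3.084),
}

cp = {name: mallows_cp(sse, p, sse_full, n, P) for name, (p, sse) in models.items()}
for name, value in cp.items():
    p = models[name][0]
    print(f"{name:15s} p = {p}   C_p = {value:7.3f}   C_p - p = {value - p:7.3f}")
```

The difference $C_p - p$ printed in the last column is a quick way to see which subsets sit near the line $C_p = p$; for the full model it is zero by construction.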


Comments


1. Effective use of the $C_p$ criterion requires careful development of the pool of P - 1 potential X
variables, with the predictor variables expressed in appropriate form (linear, quadratic, transformed),
and important interactions included, so that $MSE(X_1, \ldots, X_{P-1})$ provides an unbiased estimate
of the error variance $\sigma^2$.

2. The $C_p$ criterion places major emphasis on the fit of the subset model for the n sample observations.
At times, a modification of the $C_p$ criterion that emphasizes new observations to be predicted may
be preferable.

3. To see why $C_p$ as defined in (9.9) is an estimator of $\Gamma_p$, we need to utilize two results that we shall
simply state. First, it can be shown that:

$$\sum_{i=1}^{n}\sigma^2\{\hat{Y}_i\} = p\sigma^2 \tag{9.11}$$

Thus, the total random error of the n fitted values $\hat{Y}_i$ increases as the number of variables in the
regression model increases.
Further, it can be shown that:

$$E\{SSE_p\} = \sum_{i=1}^{n}(E\{\hat{Y}_i\} - \mu_i)^2 + (n - p)\sigma^2 \tag{9.12}$$

Hence, $\Gamma_p$ in (9.8) can be expressed as follows:

$$\Gamma_p = \frac{1}{\sigma^2}\left[E\{SSE_p\} - (n - p)\sigma^2 + p\sigma^2\right] = \frac{E\{SSE_p\}}{\sigma^2} - (n - 2p)$$

Replacing $E\{SSE_p\}$ by the estimator $SSE_p$ and using $MSE(X_1, \ldots, X_{P-1})$ as an estimator of $\sigma^2$
yields $C_p$ in (9.9).

4. To show that the $C_p$ value for the regression model containing all P - 1 X variables is P, we
substitute in (9.9), as follows:

$$C_P = \frac{SSE(X_1, \ldots, X_{P-1})}{SSE(X_1, \ldots, X_{P-1})/(n - P)} - (n - 2P) = (n - P) - (n - 2P) = P$$

$AIC_p$ and $SBC_p$ Criteria
We have seen that both $R_{a,p}^2$ and $C_p$ are model selection criteria that penalize models
having large numbers of predictors. Two popular alternatives that also provide penalties
for adding predictors are Akaike's information criterion ($AIC_p$) and Schwarz' Bayesian
criterion ($SBC_p$). We search for models that have small values of $AIC_p$ or $SBC_p$, where
these criteria are given by:

$$AIC_p = n \ln SSE_p - n \ln n + 2p \tag{9.14}$$

$$SBC_p = n \ln SSE_p - n \ln n + [\ln n]p \tag{9.15}$$

Notice that for both of these measures, the first term is $n \ln SSE_p$, which decreases as p
increases. The second term is fixed (for a given sample size n), and the third term increases
with the number of parameters, p. Models with small $SSE_p$ will do well by these criteria
as long as the penalties, 2p for $AIC_p$ and $[\ln n]p$ for $SBC_p$, are not too large. If $n \ge 8$
the penalty for $SBC_p$ is larger than that for $AIC_p$; hence the $SBC_p$ criterion tends to favor
more parsimonious models.


Example

Table 9.2, columns 6 and 7, contains the $AIC_p$ and $SBC_p$ values for all possible regression
models for the surgical unit example. When X4 is the only X variable in the regression
model, the $AIC_p$ value is:

$$AIC_2 = n \ln SSE_2 - n \ln n + 2p = 54 \ln 7.409 - 54 \ln 54 + 2(2) = -103.262$$

Similarly, the $SBC_p$ value is:

$$SBC_2 = n \ln SSE_2 - n \ln n + [\ln n]p = 54 \ln 7.409 - 54 \ln 54 + [\ln 54](2) = -99.284$$

The $AIC_p$ and $SBC_p$ values for all possible regression models are plotted in Figures 9.4d
and e. We find that both of these criteria are minimized for subset (X1, X2, X3).
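Formulas (9.14) and (9.15) translate directly into code. The sketch below is illustrative only; the function names `aic` and `sbc` are assumptions, and the inputs are the rounded $SSE_p$ values from Table 9.2, so the results agree with the tabulated criteria to roughly two decimals rather than exactly.

```python
import math


def aic(sse_p, n, p):
    """AIC_p = n ln SSE_p - n ln n + 2p, as in (9.14)."""
    return n * math.log(sse_p) - n * math.log(n) + 2 * p


def sbc(sse_p, n, p):
    """SBC_p = n ln SSE_p - n ln n + [ln n] p, as in (9.15)."""
    return n * math.log(sse_p) - n * math.log(n) + math.log(n) * p


n = 54
# (p, SSE_p) for selected subsets, taken from Table 9.2.
models = {
    "X4": (2, 7.409),
    "X1, X2, X3": (4, 3.109),
    "X1, X2, X3, X4": (5, 3.084),
}
for name, (p, sse) in models.items():
    print(f"{name:15s} AIC = {aic(sse, n, p):9.3f}   SBC = {sbc(sse, n, p):9.3f}")

# The per-parameter penalty is 2 for AIC and ln n for SBC.
print(f"ln 7 = {math.log(7):.3f}, ln 8 = {math.log(8):.3f}")
```

The last line shows where the two penalties cross: ln n first exceeds 2 at n = 8, which is why SBC penalizes added parameters more heavily than AIC for all but the smallest samples.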


$PRESS_p$ Criterion


The $PRESS_p$ (prediction sum of squares) criterion is a measure of how well the use of the
fitted values for a subset model can predict the observed responses $Y_i$. The error sum of
squares, $SSE = \sum (Y_i - \hat{Y}_i)^2$, is also such a measure. The PRESS measure differs from SSE
in that each fitted value $\hat{Y}_i$ for the PRESS criterion is obtained by deleting the ith case from
the data set, estimating the regression function for the subset model from the remaining
n - 1 cases, and then using the fitted regression function to obtain the predicted value $\hat{Y}_{i(i)}$
for the ith case. We use the notation $\hat{Y}_{i(i)}$ now for the fitted value to indicate, by the first
subscript i, that it is a predicted value for the ith case and, by the second subscript (i), that
the ith case was omitted when the regression function was fitted.
The PRESS prediction error for the ith case then is:

$$Y_i - \hat{Y}_{i(i)} \tag{9.16}$$

and the $PRESS_p$ criterion is the sum of the squared prediction errors over all n cases:

$$PRESS_p = \sum_{i=1}^{n}(Y_i - \hat{Y}_{i(i)})^2 \tag{9.17}$$

Models with small $PRESS_p$ values are considered good candidate models. The reason is
that when the prediction errors $Y_i - \hat{Y}_{i(i)}$ are small, so are the squared prediction errors and
the sum of the squared prediction errors. Thus, models with small $PRESS_p$ values fit well
in the sense of having small prediction errors.


$PRESS_p$ values can be calculated without requiring n separate regression runs, each time
deleting one of the n cases. The relationship in (10.21) and (10.21a), to be explained in the
next chapter, enables one to calculate all $\hat{Y}_{i(i)}$ values from a single regression run.
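The two routes to $PRESS_p$ can be compared on a toy problem. The sketch below assumes that the single-run relationship referred to is the standard deleted-residual identity $Y_i - \hat{Y}_{i(i)} = e_i/(1 - h_{ii})$, where $h_{ii}$ is the leverage of case i; that identity is not stated in this section. For simplicity it fits a simple linear regression of ln survival time on the enzyme score using only the six cases printed in Table 9.1, so the numbers are illustrative and do not reproduce Table 9.2. All function names are hypothetical.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


def error_sum_of_squares(x, y):
    """SSE from one fit to all n cases."""
    b0, b1 = fit_line(x, y)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def press_by_deletion(x, y):
    """PRESS from n separate fits, each omitting one case, as in (9.17)."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


def press_single_fit(x, y):
    """PRESS from one fit, using e_i / (1 - h_ii) for each deleted residual."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        residual = yi - (b0 + b1 * xi)
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        total += (residual / (1.0 - leverage)) ** 2
    return total


enzyme = [81, 66, 83, 40, 85, 72]                         # X3, Table 9.1
log_survival = [6.544, 5.999, 6.565, 6.361, 6.310, 6.478]  # Y' = ln Y

print(f"SSE                = {error_sum_of_squares(enzyme, log_survival):.4f}")
print(f"PRESS (n refits)   = {press_by_deletion(enzyme, log_survival):.4f}")
print(f"PRESS (single fit) = {press_single_fit(enzyme, log_survival):.4f}")
```

The two PRESS computations agree up to floating-point error, and PRESS is never smaller than SSE because each residual is divided by a factor no greater than 1.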


Example

Table 9.2, column 8, contains the $PRESS_p$ values for all possible regression models for the
surgical unit example. The $PRESS_p$ values are plotted in Figure 9.4f. The message given
by the $PRESS_p$ values in Table 9.2 and plot in Figure 9.4f is very similar to that told by
the other criteria. We find that subsets (X1, X2, X3) and (X2, X3, X4) have small PRESS
values; in fact, the set of all X variables (X1, X2, X3, X4) involves a slightly larger PRESS
value than subset (X1, X2, X3). The subset (X2, X3, X4) involves a PRESS value of 4.597,
which is moderately larger than the PRESS value of 3.914 for subset (X1, X2, X3).


Comment 
PRESS values can also be useful for model validation, as will be explained in Section 9.6.


9.4 Automatic Search Procedures for Model Selection


As noted in the previous section, the number of possible models, $2^{P-1}$, grows rapidly with
the number of predictors. Evaluating all of the possible alternatives can be a daunting
endeavor. To simplify the task, a variety of automatic computer-search procedures have
been developed. In this section, we will review the two most common approaches, namely
"best" subsets regression and stepwise regression.

For the remainder of this chapter, we will employ the full set of eight predictors from 
the surgical unit data. Recall that these predictors are displayed in Table 9.1 on page 350 
and described there as well. 


"Best" Subsets Algorithms 




Time-saving algorithms have been developed in which the best subsets according to a 
specified criterion are identified without requiring the fitting of all of the possible subset 
regression models. In fact, these algorithms require the calculation of only a small fraction 
of all possible regression models. For instance, if the $C_p$ criterion is to be employed and the
five best subsets according to this criterion are to be identified, these algorithms search for
the five subsets of X variables with the smallest $C_p$ values using much less computational
effort than when all possible subsets are evaluated. These algorithms are called "best" 
subsets algorithms. Not only do these algorithms provide the best subsets according to the 
specified criterion, but they often also identify several “good” subsets for each possible 
number of X variables in the model to give the investigator additional helpful information 
in making the final selection of the subset of X variables to be employed in the regression 
model.

When the pool of potential X variables is very large, say greater than 30 or 40, even
the “best” subset algorithms may require excessive computer time. Under these conditions, 
one of the stepwise regression procedures, described later in this section, may need to be 
employed to assist in the selection of X variables. 
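
For a small pool of predictors, the idea of ranking subsets by a criterion can be sketched by brute force. The code below is only an illustration and is not one of the time-saving "best" subsets algorithms described above: it fits every subset, which is exactly the work those algorithms avoid. It assumes the criterion definitions given earlier in the chapter, $C_p = SSE_p/MSE(X_1, \ldots, X_{P-1}) - (n - 2p)$, $AIC_p = n \ln SSE_p - n \ln n + 2p$, and $SBC_p = n \ln SSE_p - n \ln n + (\ln n)p$. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def sse(X, y, cols):
    """Error sum of squares for the model with an intercept and the columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    return float(resid @ resid)


def all_subsets(X, y):
    """Evaluate C_p, AIC_p, and SBC_p for every subset of the columns of X."""
    n, k = X.shape
    mse_full = sse(X, y, range(k)) / (n - (k + 1))    # MSE of the model with all predictors
    rows = []
    for size in range(k + 1):
        for cols in combinations(range(k), size):
            p = size + 1                              # parameters, including the intercept
            s = sse(X, y, cols)
            rows.append({
                "cols": cols,
                "p": p,
                "SSE": s,
                "Cp": s / mse_full - (n - 2 * p),
                "AIC": n * np.log(s) - n * np.log(n) + 2 * p,
                "SBC": n * np.log(s) - n * np.log(n) + np.log(n) * p,
            })
    return rows


def best_by(rows, criterion, m=5):
    """Return the m subsets with the smallest value of the named criterion."""
    return sorted(rows, key=lambda r: r[criterion])[:m]
```

Calling `best_by(all_subsets(X, y), "Cp", m=5)` would list the five subsets with the smallest $C_p$ values. With $k$ predictors the loop fits $2^k$ models, so this approach is practical only for small pools.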


Example

For the eight predictors in the surgical unit example, we know there are $2^8 = 256$ possible
models. Plots of the six model selection criteria discussed in this chapter are displayed in
Figure 9.5. The best values of each criterion for each $p$ have been connected with solid
lines. These best values are also displayed in Table 9.3. The overall optimum criterion values
have been underlined in each column of the table. Notice that the choice of a "best" model
depends on the criterion. For example, a seven- or eight-parameter model is identified as
best by the $R^2_{a,p}$ criterion (both have $\max(R^2_{a,p}) = .823$), a six-parameter model is identified
by the $C_p$ criterion ($\min(C_p) = 5.541$), and a seven-parameter model is identified by the
$AIC_p$ criterion ($\min(AIC_p) = -163.834$). As is frequently the case, the $SBC_p$ criterion
identifies a more parsimonious model as best. In this case both the $SBC_p$ and $PRESS_p$ criteria
point to five-parameter models ($\min(SBC_p) = -153.406$ and $\min(PRESS_p) = 2.738$). As
previously emphasized, our objective at this stage is not to identify a single best model; we
hope to identify a small set of promising models for further study.

FIGURE 9.5 Plot of Variable Selection Criteria with All Eight Predictors: Surgical Unit Example.

Figure 9.6 contains, for the surgical unit example, MINITAB output for the “best” subsets
algorithm. Here, we specified that the best two subsets be identified for each number of
variables in the regression model. The MINITAB algorithm uses the $R^2_p$ criterion, but also
shows for each of the "best" subsets the $R^2_{a,p}$, $C_p$, and $\sqrt{MSE_p}$ (labeled S) values. The
right-most columns of the tabulation show the X variables in the subset. From the figure
it is seen that the best subset, according to the $R^2_{a,p}$ criterion, is either the seven-parameter
model based on all predictors except Liver ($X_4$) and Histmod (history of moderate alcohol
use, $X_7$), or the eight-parameter model based on all predictors except Liver ($X_4$). The $R^2_{a,p}$
criterion value for both of these models is .823.


TABLE 9.3 Selection Criterion Values: Surgical Unit Example.

     (1)       (2)       (3)           (4)       (5)        (6)        (7)
p    $SSE_p$   $R^2_p$   $R^2_{a,p}$   $C_p$     $AIC_p$    $SBC_p$    $PRESS_p$
1    12.808    0.000     0.000         240.452   -75.703    -73.714    13.296
2    7.332     0.428     0.417         117.409   -103.827   -99.849    8.025
3    4.312     0.663     0.650         50.472    -130.483   -124.516   5.065
4    2.843     0.778     0.765         18.914    -150.985   -143.029   3.469
5    2.179     0.830     0.816         5.751     -163.351   -153.406   2.738
6    2.082     0.837     0.821         5.541     -163.805   -151.871   2.739
7    2.005     0.843     0.823         5.787     -163.834   -149.911   2.772
8    1.972     0.846     0.823         7.029     -162.736   -146.824   2.809
9    1.971     0.846     0.819         9.000     -160.771   -142.870   2.931


FIGURE 9.6 MINITAB Output for “Best” Two Subsets for Each Subset Size: Surgical Unit Example.

Response is lnSurviv

Predictor columns: Bloodclo Proginde Enzyme Liver Age Gender Histmod Histheav

Vars   R-Sq   R-Sq(adj)     C-p         S   Predictors
   1   42.8        41.7   117.4   0.37549   X
   1   42.2        41.0   119.2   0.37746   X
   2   66.3        65.0    50.5   0.29079   XX
   2   59.9        58.4    69.1   0.31715   XX
   3   77.8        76.5    18.9   0.23845   XX X
   3   75.7        74.3    25.0   0.24934   XX
   4   83.0        81.6     5.8   0.21087   XXX X
   4   81.4        79.9    10.3   0.22023   XXX X
   5   83.7        82.1     5.5   0.20827   XXX X X
   5   83.6        81.9     6.0   0.20931   XXX X X
   6   84.3        82.3     5.8   0.20655   XXX XX X
   6   83.9        81.9     7.0   0.20934   XXX XXX
   7   84.6        82.3     7.0   0.20705   XXX XXXX
   7   84.4        82.0     7.7   0.20867   XXXXXX X
   8   84.6        81.9     9.0   0.20927   XXXXXXXX

The all-possible-regressions procedure leads to the identification of a small number of
subsets that are “good” according to a specified criterion. In the surgical unit example, two
of the four criteria ($SBC_p$ and $PRESS_p$) pointed to models with 4 predictors, while the
other criteria favored larger models. Consequently, one may wish at times to consider more
than one criterion in evaluating possible subsets of X variables.
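
A short calculation shows why $SBC_p$ favors a smaller model than $AIC_p$ in this example. It assumes the definitions given earlier in the chapter, $AIC_p = n \ln SSE_p - n \ln n + 2p$ and $SBC_p = n \ln SSE_p - n \ln n + (\ln n)p$, with $n = 54$ cases; the numerical values are those of Table 9.3.

$$
\begin{aligned}
SBC_p - AIC_p &= (\ln n - 2)\,p = (\ln 54 - 2)\,p \approx 1.989\,p \\
p = 5:&\quad -163.351 + 1.989(5) \approx -153.406 = SBC_5 \\
p = 7:&\quad -163.834 + 1.989(7) \approx -149.911 = SBC_7
\end{aligned}
$$

Moving from five to seven parameters lowers $AIC_p$ by only 0.483, while the additional $SBC_p$ penalty is about $2(1.989) \approx 3.98$. The larger penalty outweighs the gain in fit, so $SBC_p$ prefers the five-parameter model.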




Once the investigator has identified a few "good" subsets for intensive examination, a
final choice of the model variables must be made. This choice, as indicated by our model-
building strategy in Figure 9.1, is aided by residual analyses (and other diagnostics to be
covered in Chapter 10) and by the investigator's knowledge of the subject under study, and
is finally confirmed through model validation studies.


Stepwise Regression Methods 


In those occasional cases when the pool of potential X variables contains 30 to 40 or even
more variables, use of a "best" subsets algorithm may not be feasible. An automatic search
procedure that develops the “best” subset of X variables sequentially may then be helpful.
The forward stepwise regression procedure is probably the most widely used of the automatic
search methods. It was developed to economize on computational efforts, as compared with
the various all-possible-regressions procedures. Essentially, this search method develops a
sequence of regression models, at each step adding or deleting an X variable. The criterion
for adding or deleting an X variable can be stated equivalently in terms of error sum of
squares reduction, coefficient of partial correlation, $t^*$ statistic, or $F^*$ statistic.

An essential difference between stepwise procedures and the "best" subsets algorithm is
that stepwise search procedures end with the identification of a single regression model as
“best.” With the “best” subsets algorithm, on the other hand, several regression models can
be identified as “good” for final consideration. The identification of a single regression model
as "best" by the stepwise procedures is a major weakness of these procedures. Experience 
has shown that each of the stepwise search procedures can sometimes err by identifying a 
suboptimal regression model as “best.” In addition, the identification of a single regression 
model may hide the fact that several other regression models may also be “good.” Finally, 
the "goodness" of a regression model can only be established by a thorough examination 
using a variety of diagnostics. 

What then can we do on those occasions when the pool of potential X variables is very 
large and an automatic search procedure must be utilized? Basically, we should use the 
subset identified by the automatic search procedure as a starting point for searching for 
other “good” subsets. One possibility is to treat the number of X variables in the regression 
model identified by the automatic search procedure as being about the right subset size and 
then use the "best" subsets procedure for subsets of this and nearby sizes. 


Forward Stepwise Regression 


We shall describe the forward stepwise regression search algorithm in terms of the $t^*$
statistics (2.17) and their associated P-values for the usual tests of regression parameters.


1. The stepwise regression routine first fits a simple linear regression model for each of
the $P - 1$ potential X variables. For each simple linear regression model, the $t^*$ statistic (2.17)
for testing whether or not the slope is zero is obtained:

$$t_k^* = \frac{b_k}{s\{b_k\}} \tag{9.18}$$

The X variable with the largest $t^*$ value is the candidate for first addition. If this $t^*$ value
exceeds a predetermined level, or if the corresponding P-value is less than a predetermined
$\alpha$, the X variable is added. Otherwise, the program terminates with no X variable
considered sufficiently helpful to enter the regression model. Since the degrees of freedom
associated with MSE vary depending on the number of X variables in the model, and since
repeated tests on the same data are undertaken, fixed $t^*$ limits for adding or deleting a
variable have no precise probabilistic meaning. For this reason, software programs often
favor the use of predetermined $\alpha$-limits.

2. Assume $X_7$ is the variable entered at step 1. The stepwise regression routine now
fits all regression models with two X variables, where $X_7$ is one of the pair. For each
such regression model, the $t^*$ test statistic corresponding to the newly added predictor $X_k$
is obtained. This is the statistic for testing whether or not $\beta_k = 0$ when $X_7$ and $X_k$ are
the variables in the model. The X variable with the largest $t^*$ value (or equivalently, the
smallest P-value) is the candidate for addition at the second stage. If this $t^*$ value exceeds
a predetermined level (i.e., the P-value falls below a predetermined level), the second X
variable is added. Otherwise, the program terminates.

3. Suppose $X_3$ is added at the second stage. Now the stepwise regression routine examines
whether any of the other X variables already in the model should be dropped. For our
illustration, there is at this stage only one other X variable in the model, $X_7$, so that only
one $t^*$ test statistic is obtained:

$$t_7^* = \frac{b_7}{s\{b_7\}} \tag{9.19}$$

At later stages, there would be a number of these $t^*$ statistics, one for each of the variables
in the model besides the one last added. The variable for which this $t^*$ value is smallest (or
equivalently the variable for which the P-value is largest) is the candidate for deletion. If
this $t^*$ value falls below (or the P-value exceeds) a predetermined limit, the variable is
dropped from the model; otherwise, it is retained.

4. Suppose $X_7$ is retained so that both $X_3$ and $X_7$ are now in the model. The stepwise
regression routine now examines which X variable is the next candidate for addition, then
examines whether any of the variables already in the model should now be dropped, and
so on until no further X variables can either be added or deleted, at which point the search
terminates.


Note that the stepwise regression algorithm allows an X variable, brought into the model 
at an earlier stage, to be dropped subsequently if it is no longer helpful in conjunction with 
variables added at later stages. 
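
The four steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the routine used by MINITAB or any other package: it uses P-value limits (here called `alpha_enter` and `alpha_remove`, names chosen for this sketch), selects candidates by the size of $|t^*|$, and makes no provision for collinear predictors. It relies on NumPy and on `scipy.stats.t.sf` for the $t$ distribution tail area.

```python
import numpy as np
from scipy import stats


def slope_tests(X, y, cols):
    """Return {column: (t*, two-sided P-value)} for the slopes of the model on `cols`."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    xtx_inv = np.linalg.inv(A.T @ A)
    b = xtx_inv @ A.T @ y
    resid = y - A @ b
    df = n - A.shape[1]                       # error degrees of freedom
    mse = resid @ resid / df
    t_star = b / np.sqrt(mse * np.diag(xtx_inv))
    p_val = 2 * stats.t.sf(np.abs(t_star), df)
    return {j: (t_star[i + 1], p_val[i + 1]) for i, j in enumerate(cols)}


def forward_stepwise(X, y, alpha_enter=0.10, alpha_remove=0.15, max_steps=100):
    """Forward stepwise search; returns the selected column indices in entry order."""
    if alpha_enter > alpha_remove:
        raise ValueError("alpha_enter should not exceed alpha_remove (risk of cycling)")
    model = []
    for _ in range(max_steps):
        # Addition: the excluded variable with the largest |t*| is the candidate.
        trials = {j: slope_tests(X, y, model + [j])[j]
                  for j in range(X.shape[1]) if j not in model}
        if not trials:
            break
        new = max(trials, key=lambda j: abs(trials[j][0]))
        if trials[new][1] >= alpha_enter:
            break                             # no variable qualifies for entry: stop
        model.append(new)
        # Deletion: among the variables other than the one just added,
        # the one with the smallest |t*| is the candidate for removal.
        others = model[:-1]
        if others:
            tests = slope_tests(X, y, model)
            weakest = min(others, key=lambda j: abs(tests[j][0]))
            if tests[weakest][1] > alpha_remove:
                model.remove(weakest)
    return model
```

The `max_steps` cap is a safeguard added for this sketch. Dropping the deletion block turns the function into plain forward selection.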


Example

Figure 9.7 shows MINITAB computer printout for the forward stepwise regression procedure
for the surgical unit example. The maximum acceptable $\alpha$ limit for adding a variable
is 0.10 and the minimum acceptable $\alpha$ limit for removing a variable is 0.15, as shown at the
top of Figure 9.7.

We now follow through the steps.


1. At the start of the stepwise search, no X variable is in the model so that the model
to be fitted is $Y_i = \beta_0 + \varepsilon_i$. In step 1, the $t^*$ statistics (9.18) and corresponding P-values
are calculated for each potential X variable, and the predictor having the smallest P-value
(largest $t^*$ value) is chosen to enter the equation. We see that Enzyme ($X_3$) had the largest
test statistic:

$$t_3^* = \frac{b_3}{s\{b_3\}} = \frac{.015124}{.002427} = 6.23$$

The P-value for this test statistic is 0.000, which falls below the maximum acceptable
$\alpha$-to-enter value of 0.10; hence Enzyme ($X_3$) is added to the model.


FIGURE 9.7 MINITAB Forward Stepwise Regression Output: Surgical Unit Example.

Alpha-to-Enter: 0.1    Alpha-to-Remove: 0.15

Response is lnSurviv on 8 predictors, with N = 54

Step             1        2        3        4
Constant     5.264    4.351    4.291    3.852

Enzyme      0.0151   0.0154   0.0145   0.0155
T-Value       6.23     8.19     9.33    11.07
P-Value      0.000    0.000    0.000    0.000

ProgInde             0.0141   0.0149   0.0142
T-Value                5.98     7.68     8.20
P-Value               0.000    0.000    0.000

Histheav                       0.429    0.353
T-Value                         5.08     4.57
P-Value                        0.000    0.000

Bloodclo                                0.073
T-Value                                  3.86
P-Value                                 0.000

S            0.375    0.291    0.238    0.211
R-Sq         42.76    66.33    77.80    82.99
R-Sq(adj)    41.66    65.01    76.47    81.60
C-p          117.4     50.5     18.9      5.8


2. At this stage, step 1 has been completed. The current regression model contains
Enzyme ($X_3$), and the printout displays, near the top of the column labeled “Step 1,” the
regression coefficient for Enzyme (0.0151), the $t^*$ value for this coefficient (6.23), and the
corresponding P-value (0.000). At the bottom of column 1, a number of variables-selection
criteria, including $R^2$ (42.76), $R^2_a$ (41.66), and $C_p$ (117.4) are also provided.

Next, all regression models containing $X_3$ and another X variable are fitted, and the $t^*$
statistics calculated. They are now:

$$t_k^* = \sqrt{\frac{MSR(X_k \mid X_3)}{MSE(X_3, X_k)}}$$

Progindex ($X_2$) has the highest $t^*$ value, and its P-value (0.000) falls below 0.10, so that
$X_2$ now enters the model.




3. The column labeled Step 2 in Figure 9.7 summarizes the situation at this point. Enzyme
and Progindex ($X_3$ and $X_2$) are now in the model, and information about this model is
provided. At this point, a test whether Enzyme ($X_3$) should be dropped is undertaken, but
because the P-value (0.000) corresponding to $X_3$ is not above 0.15, this variable is retained.

4. Next, all regression models containing $X_2$, $X_3$, and one of the remaining potential X
variables are fitted. The appropriate $t^*$ statistics now are:

$$t_k^* = \sqrt{\frac{MSR(X_k \mid X_2, X_3)}{MSE(X_2, X_3, X_k)}}$$

The predictor labeled Histheavy ($X_8$) had the largest $t^*$ value (P-value = 0.000) and
was next added to the model.

5. The column labeled Step 3 in Figure 9.7 summarizes the situation at this point. $X_2$,
$X_3$, and $X_8$ are now in the model. Next, a test is undertaken to determine whether $X_2$ or
$X_3$ should be dropped. Since both of the corresponding P-values are less than 0.15, neither
predictor is dropped from the model.

6. At step 4 Bloodclot ($X_1$) is added, and no terms previously included were dropped.
The right-most column of Figure 9.7 summarizes the addition of variable $X_1$ into the model
containing variables $X_2$, $X_3$, and $X_8$. Next, a test is undertaken to determine whether either
$X_2$, $X_3$, or $X_8$ should be dropped. Since all P-values are less than 0.15 (all are 0.000), all
variables are retained.

7. Finally, the stepwise regression routine considers adding one of $X_4$, $X_5$, $X_6$, or $X_7$ to
the model containing $X_1$, $X_2$, $X_3$, and $X_8$. In each case, the P-values are greater than 0.10
(not shown); therefore, no additional variables can be added to the model and the search
process is terminated.


Thus, the stepwise search algorithm identifies ($X_1$, $X_2$, $X_3$, $X_8$) as the “best” subset of
X variables. This model also happens to be the model identified by both the $SBC_p$ and
$PRESS_p$ criteria in our previous analyses based on an assessment of "best" subset selection.


Comments 


1. The choice of $\alpha$-to-enter and $\alpha$-to-remove values essentially represents a balancing of opposing
tendencies. Simulation studies have shown that for large pools of uncorrelated predictor variables that
have been generated to be uncorrelated with the response variable, use of large or moderately large
$\alpha$-to-enter values as the entry criterion results in a procedure that is too liberal; that is, it allows
too many predictor variables into the model. On the other hand, models produced by an automatic
selection procedure with small $\alpha$-to-enter values are often underspecified, resulting in $\sigma^2$ being badly
overestimated and the procedure being too conservative (see, for example, References 9.2 and 9.3).

2. The maximum acceptable $\alpha$-to-enter value should never be larger than the minimum acceptable
$\alpha$-to-remove value; otherwise, cycling is possible where a variable is continually entered and removed.

3. The order in which variables enter the regression model does not reflect their importance. At
times, a variable may enter the model, only to be dropped at a later stage because it can be predicted
well from the other predictors that have been subsequently added.


Other Stepwise Procedures 


Other stepwise procedures are available to find a "best" subset of predictor variables. We 
mention two of these. 




Forward Selection. The forward selection search procedure is a simplified version of 
forward stepwise regression, omitting the test whether a variable once entered into the 
model should be dropped. 


Backward Elimination. The backward elimination search procedure is the opposite of 
forward selection. It begins with the model containing all potential X variables and identifies 
the one with the largest P-value. If the maximum P-value is greater than a predetermined 
limit, that X variable is dropped. The model with the remaining $P - 2$ X variables is
then fitted, and the next candidate for dropping is identified. This process continues until
no further X variables can be dropped. A stepwise modification can also be adapted that
allows variables eliminated earlier to be added later; this modification is called the backward
stepwise regression procedure.
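
Backward elimination can be sketched in the same way. The code below is an illustration only, not the routine of any particular package: the limit is expressed as a P-value threshold called `alpha_remove` (a name chosen for this sketch), the default of 0.15 is an arbitrary choice, and the stepwise modification that allows re-entry is not included.

```python
import numpy as np
from scipy import stats


def slope_pvalues(X, y, cols):
    """Two-sided P-values of the t* statistics for the slopes of the model on `cols`."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    xtx_inv = np.linalg.inv(A.T @ A)
    b = xtx_inv @ A.T @ y
    resid = y - A @ b
    df = n - A.shape[1]                       # error degrees of freedom
    mse = resid @ resid / df
    t_star = b / np.sqrt(mse * np.diag(xtx_inv))
    p_val = 2 * stats.t.sf(np.abs(t_star), df)
    return {j: p_val[i + 1] for i, j in enumerate(cols)}


def backward_elimination(X, y, alpha_remove=0.15):
    """Start from all X variables and drop the largest P-value until none exceeds the limit."""
    model = list(range(X.shape[1]))
    while model:
        pvals = slope_pvalues(X, y, model)
        worst = max(model, key=lambda j: pvals[j])
        if pvals[worst] <= alpha_remove:
            break                             # every remaining variable is retained
        model.remove(worst)
    return model
```

Because the first fit uses every variable in the pool, the sketch requires more cases than parameters, which is one practical limit on backward procedures for very large pools.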


Comment 


For small and moderate numbers of variables in the pool of potential X variables, some statisticians
argue for backward stepwise search over forward stepwise search (see Reference 9.4). A potential
disadvantage of the forward stepwise approach is that the MSE (and hence $s\{b_k\}$) will tend to be
inflated during the initial steps, because important predictors have been omitted. This in turn leads
to $t^*$ test statistics (9.18) that are too small. For the backward stepwise procedure, MSE values tend
to be more nearly unbiased because important predictors are retained at each step. An argument in
favor of the backward stepwise procedure can also be made in situations where it is useful as a first
step to look at each X variable in the regression function adjusted for all the other X variables in
the pool.


9.5 Some Final Comments on Automatic Model Selection Procedures


Our discussion of the major automatic selection procedures for identifying the "best" subset 
of X variables has focused on the main conceptual issues and not on options, variations, 
and refinements available with particular computer packages. It is essential that the specific 
features of the package employed be fully understood so that intelligent use of the package 
can be made. In some packages, there is an option for regression models through the origin. 
Some packages permit variables to be brought into the model and tested in pairs or other 
groupings instead of singly, to save computing time or for other reasons. Some packages,
once a “best” regression model is identified, will fit all the possible regression models with
the same number of variables and will develop information for each model so that a final
choice can be made by the user. Some stepwise programs have options for forcing variables
into the regression model; such variables are not removed even if their P-values become
too large.

The diversity of these options and special features serves to emphasize a point made 
earlier: there is no unique way of searching for “good” subsets of X variables, and subjective 
elements must play an important role in the search process. 

We have considered a number of important issues related to exploratory model building, 
but there are many others. (A good discussion of many of these issues may be found in
Reference 9.5.) Most important for good model building is the recognition that no automatic search
procedure will always find the “best” model, and that, indeed, there may exist several “good”
regression models whose appropriateness for the purpose at hand needs to be investigated.




Judgment needs to play an important role in model building for exploratory studies.
Some explanatory variables may be known to be more fundamental than others and therefore
should be retained in the regression model if the primary purpose is to develop a good
explanatory model. When a qualitative predictor variable is represented in the pool of potential
X variables by a number of indicator variables (e.g., geographic region is represented by
several indicator variables), it is often appropriate to keep these indicator variables together
as a group to represent the qualitative variable, even if a subset containing only some of
the indicator variables is "better" according to the criterion employed. Similarly, if second-order
terms $X_k^2$ or interaction terms $X_k X_{k'}$ need to be present in a regression model, one
would ordinarily wish to have the first-order terms in the model as representing the main
effects.

The selection of a subset regression model for exploratory observational studies has 
been the subject of much recent research. Reference 9.5 provides information about many 
of these studies. New methods of identifying the “best” subset have been proposed, including 
methods based on deleting one case at a time and on bootstrapping. With the first method, the 
criterion is evaluated for identified subsets n times, each time with one case omitted, in order 
to select the “best” subset. With bootstrapping, repeated samples of cases are selected with 
replacement from the data set (alternatively, repeated samples of residuals from the model 
fitted to all X variables are selected with replacement to obtain observed Y values), and the 
criterion is evaluated for identified subsets in order to select the “best” subset. Research 
by Breiman and Spector (Ref. 9.7) has evaluated these methods from the standpoint of the 
closeness of the selected model to the true model and has found the two methods promising, 
the bootstrap method requiring larger data sets. 

An important issue in exploratory model building that we have not yet considered is 
the bias in estimated regression coefficients and in estimated mean responses, as well as in 
their estimated standard deviations, that may result when the coefficients and error mean 
square for the finally selected regression model are estimated from the same data that were 
used for selecting the model. Sometimes, these biases may be substantial (see, for example,
References 9.5 and 9.6). In the next section, we will show how one can examine whether the 
estimated regression coefficients and error mean square are biased to a substantial extent. 


9.6 Model Validation


The final step in the model-building process is the validation of the selected regression 
models. Model validation usually involves checking a candidate model against independent 
data. Three basic ways of validating a regression model are: 


1. Collection of new data to check the model and its predictive ability. 

2. Comparison of results with theoretical expectations, earlier empirical results, and
simulation results.

3. Use of a holdout sample to check the model and its predictive ability.


When a regression model is used in a controlled experiment, a repetition of the experiment 
and its analysis serves to validate the findings in the initial study if similar results for the 
regression coefficients, predictive ability, and the like are obtained. Similarly, findings in 
confirmatory observational studies are validated by a repetition of the study with other data. 




As we noted in Section 9.1, there are generally no extensive problems in the selection of
predictor variables in controlled experiments and confirmatory observational studies. In
contrast, exploratory observational studies frequently involve large pools of explanatory
variables and the selection of a subset of these for the final regression model. For these
studies, validation of the regression model involves also the appropriateness of the variables 
selected, as well as the magnitudes of the regression coefficients, the predictive ability of 
the model, and the like. Our discussion of validation will focus primarily on issues that arise 
in validating regression models for exploratory observational studies. A good discussion 
of the need for replicating any study to establish the generalizability of the findings may 
be found in Reference 9.8. References 9.9 and 9.10 provide helpful presentations of issues 
arising in the validation of regression models. 


Collection of New Data to Check Model 


The best means of model validation is through the collection of new data. The purpose 
of collecting new data is to be able to examine whether the regression model developed 
from the earlier data is still applicable for the new data. If so, one has assurance about the 
applicability of the model to data beyond those on which the model is based. 


Methods of Checking Validity. There are a variety of methods of examining the validity
of the regression model against the new data. One validation method is to reestimate the 
model form chosen earlier using the new data. The estimated regression coefficients and 
various characteristics of the fitted model are then compared for consistency to those of the 
regression model based on the earlier data. If the results are consistent, they provide strong 
support that the chosen regression model is applicable under broader circumstances than 
those related to the original data. 

A second validation method is designed to calibrate the predictive capability of the 
selected regression model. When a regression model is developed from given data, it is 
inevitable that the selected model is chosen, at least in large part, because it fits well the 
data at hand. For a different set of random outcomes, one may likely have arrived at a 
different model in terms of the predictor variables selected and/or their functional forms 
and interaction terms present in the model. A result of this model development process is 
that the error mean square MSE will tend to understate the inherent variability in making 
future predictions from the selected model. 

A means of measuring the actual predictive capability of the selected regression model
is to use this model to predict each case in the new data set and then to calculate the mean
of the squared prediction errors, to be denoted by MSPR, which stands for mean squared
prediction error:

$$MSPR = \frac{\sum_{i=1}^{n^*} (Y_i - \hat{Y}_i)^2}{n^*} \tag{9.20}$$

where:

$Y_i$ is the value of the response variable in the $i$th validation case
$\hat{Y}_i$ is the predicted value for the $i$th validation case based on the model-building data set
$n^*$ is the number of cases in the validation data set




If the mean squared prediction error MSPR is fairly close to MSE based on the regression
fit to the model-building data set, then the error mean square MSE for the selected regression
model is not seriously biased and gives an appropriate indication of the predictive ability of
the model. If the mean squared prediction error is much larger than MSE, one should rely
on the mean squared prediction error as an indicator of how well the selected regression
model will predict in the future.
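
The comparison of MSPR with MSE can be sketched as follows. The function names are hypothetical, and the sketch assumes a model with an intercept fitted by ordinary least squares: `fit_ols` is applied to the model-building data, and `mspr` evaluates (9.20) on the new cases using the coefficients from that fit.

```python
import numpy as np


def fit_ols(X, y):
    """Fit the model on the model-building data; return coefficients and MSE."""
    A = np.column_stack([np.ones(len(y)), X])     # add the intercept column
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    mse = float(resid @ resid) / (len(y) - A.shape[1])
    return b, mse


def mspr(b, X_new, y_new):
    """Mean squared prediction error (9.20) for the n* validation cases."""
    A = np.column_stack([np.ones(len(y_new)), X_new])
    err = y_new - A @ b                           # prediction errors for the new cases
    return float(err @ err) / len(y_new)
```

Note that MSE divides by the error degrees of freedom $n - p$, whereas MSPR divides by the number of validation cases $n^*$, since no parameters are estimated from the validation data.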


Difficulties in Replicating a Study. Difficulties often arise when new data are collected 
to validate a regression model, especially with observational studies. Even with controlled 
experiments, however, there may be difficulties in replicating an earlier study in identical 
fashion. For instance, the laboratory equipment for the new study to be conducted in a 
different laboratory may differ from that used in the initial study, resulting in somewhat 
different calibrations for the response measurements.

The difficulties in replicating a study are particularly acute in the social sciences where 
controlled experiments often are not feasible. Repetition of an observational study usually 
involves different conditions, the differences being related to changes in setting and/or time. 
For instance, a study investigating the relation between amount of delegation of authority 
by executives in a firm to the age of the executive was repeated in another firm which 
has a somewhat different management philosophy. As another example, a study relating 
consumer purchases of a product to special promotional incentives was repeated in another 
year when the business climate differed substantially from that during the initial study. 

It may be thought that an inability to reproduce a study identically makes the replication 
study useless for validation purposes. This is not the case. No single study is fully useful 
until we know how much the results of the study can be generalized. If a replication study for
which the conditions of the setting differ only slightly from those of the initial study yields 
substantially different regression results, then we learn that the results of the initial study 
cannot be readily generalized. On the other hand, if the conditions differ substantially and 
the regression results are still similar, we find that the regression results can be generalized to 
apply under substantially varying conditions. Still another possibility is that the regression 
results for the replication study differ substantially from those of the initial study, the 
differences being related to changes in the setting. This information may be useful for 
enriching the regression model by including new explanatory variables that make the model 
more widely applicable. 


Comment 

When the new data are collected under controlled conditions in an experiment, it is desirable to include 
data points of major interest to check out the model predictions. If the model is to be used for making 
predictions over the entire range of the X observations, a possibility is to include data points that are 
uniformly distributed over the X space.


Comparison with Theory, Empirical Evidence, or Simulation Results

In some cases, theory, simulation results, or previous empirical results may be helpful in
determining whether the selected model is reasonable. Comparisons of regression coefficients
and predictions with theoretical expectations, previous empirical results, or simulation
results should be made. Unfortunately, there is often little theory that can be used to validate
regression models.


Data Splitting 

By far the preferred method to validate a regression model is through the collection of new
data. Often, however, this is neither practical nor feasible. An alternative when the data set
is large enough is to split the data into two sets. The first set, called the model-building set or
the training sample, is used to develop the model. The second data set, called the validation
or prediction set, is used to evaluate the reasonableness and predictive ability of the selected
model. This validation procedure is often called cross-validation. Data splitting in effect is
an attempt to simulate replication of the study.

The validation data set is used for validation in the same way as when new data are
collected. The regression coefficients can be reestimated for the selected model and then
compared for consistency with the coefficients obtained from the model-building data set.
Also, predictions can be made for the data in the validation data set from the regression
model developed from the model-building data set to calibrate the predictive ability of this
regression model for the new data. When the calibration data set is large enough, one can
also study how the “good” models considered in the model selection phase fare with the
new data.

Data sets are often split equally into model-building and validation data sets. It is important,
however, that the model-building data set be sufficiently large so that a reliable model
can be developed. Recall in this connection that the number of cases should be at least 6 to 
10 times the number of variables in the pool of predictor variables. Thus, when 10 variables 
are in the pool, the model-building data set should contain at least 60 to 100 cases. If the 
entire data set is not large enough under these circumstances for making an equal split, the 
validation data set will need to be smaller than the model-building data set. 
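
The sample-size rule of thumb above is simple arithmetic, and a minimal sketch may make it concrete. The function names and the choice to test an equal split against the lower bound (6 cases per variable) are assumptions made for illustration; they are not part of the text.

```python
def model_building_size_range(pool_size, low=6, high=10):
    """Return the (lower, upper) rule-of-thumb case counts for the model-building set.

    pool_size is the number of variables in the pool of predictor variables.
    The multipliers 6 and 10 come from the rule of thumb quoted in the text.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return low * pool_size, high * pool_size


def can_split_equally(n_cases, pool_size, factor=6):
    """Hypothetical helper: does half of the data still meet the lower bound?"""
    return n_cases // 2 >= factor * pool_size


print(model_building_size_range(10))   # the 10-variable case discussed in the text
print(can_split_equally(100, 10))      # 50 cases per half, below the lower bound of 60
```

When the second check fails, the text's advice applies: keep the model-building set at the required size and let the validation set be the smaller of the two.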

Splits of the data can be made at random. Another possibility is to match cases in pairs 
and place one of each pair into one of the two split data sets. When data are collected 
sequentially in time, it is often useful to pick a point in time to divide the data. Generally, 
the earlier data are selected for the model-building set and the later data for the validation 
set. When seasonal or cyclical effects are present in the data (e.g., sales data), the split 
should be made at a point where the cycles are balanced. 
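
The two splitting schemes just described (a random split, and a split at a point in time) can be sketched as follows. This is an illustration under stated assumptions: NumPy is used for convenience, the function names are hypothetical, and the seed, the 108-case total, and the cutoff value are arbitrary choices for the demonstration.

```python
import numpy as np


def random_split(n_cases, n_build, seed=0):
    """Randomly assign case indices 0..n_cases-1 to model-building and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_cases)
    return np.sort(order[:n_build]), np.sort(order[n_build:])


def time_split(times, cutoff):
    """Earlier cases (time <= cutoff) build the model; later cases validate it."""
    times = np.asarray(times)
    build = np.flatnonzero(times <= cutoff)
    valid = np.flatnonzero(times > cutoff)
    return build, valid


# Equal random split of 108 cases into two sets of 54.
build_idx, valid_idx = random_split(108, 54)
print(len(build_idx), len(valid_idx))

# Time-ordered split: periods 1-4 for model building, 5-6 for validation.
early_idx, late_idx = time_split([1, 2, 3, 4, 5, 6], cutoff=4)
print(early_idx, late_idx)
```

With seasonal data, the cutoff would be chosen so that each set contains whole cycles, as the text recommends; the sketch does not check for that.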

Use of time or some other characteristic of the data to split the data set provides the 
opportunity to test the generalizability of the model since conditions may differ for the two 
data sets. Data in the validation set may have been created under different causal conditions 
than those of the model-building set. In some cases, data in the validation set may represent 
extrapolations with respect to the data in the model-building set (e.g., sales data collected 
over time may contain a strong trend component). Such differential conditions may lead to
a lack of validity of the model based on the model-building data set and indicate a need to 
broaden the regression model so that it is applicable under a broader scope of conditions. 

A possible drawback of data splitting is that the variances of the estimated regression 
coefficients developed from the model-building data set will usually be larger than those 
that would have been obtained from the fit to the entire data set. If the model-building data 
set is reasonably large, however, these variances generally will not be that much larger than 
those for the entire data set. In any case, once the model has been validated, it is customary 
practice to use the entire data set for estimating the final regression model. 


Example


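In the surgical unit example, three models were favored by the various model-selection
criteria. The SBC_p and PRESS_p criteria favored the four-predictor model:

$$Y'_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_8 X_{i8} + \varepsilon_i \qquad \text{Model 1} \tag{9.21}$$

C_p was minimized by the five-predictor model:

$$Y'_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_6 X_{i6} + \beta_8 X_{i8} + \varepsilon_i \qquad \text{Model 2} \tag{9.22}$$

while the R^2_{a,p} and AIC_p criteria were optimized by the six-predictor model:

$$Y'_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_5 X_{i5} + \beta_6 X_{i6} + \beta_8 X_{i8} + \varepsilon_i \qquad \text{Model 3} \tag{9.23}$$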


We wish to assess the validity of these three models, both internally and externally. 

Some evidence of the internal validity of these fitted models can be obtained through
an examination of the various model-selection criteria. Table 9.4 summarizes the fits of
the three candidate models to the original (training) data set in columns (1), (3), and (5).
We first consider the SSE_p, PRESS_p, and C_p criterion values. Recall that the PRESS_p value
is always larger than SSE_p because the regression fit for the ith case when this case is
deleted in fitting can never be as good as that when the ith case is included. A PRESS_p


TABLE 9.4 Regression Results for Candidate Models (9.21), (9.22), and (9.23) Based on Model-Building and
Validation Data Sets—Surgical Unit Example.

                (1)         (2)         (3)         (4)         (5)         (6)
              Model 1     Model 1     Model 2     Model 2     Model 3     Model 3
              Training    Validation  Training    Validation  Training    Validation
Statistic     Data Set    Data Set    Data Set    Data Set    Data Set    Data Set

p             5           5           6           6           7           7
b_0           3.8524      3.6350      3.8671      3.6143      4.0540      3.4699
s{b_0}        0.1927      0.2894      0.1906      0.2907      0.2348      0.3468
b_1           0.0733      0.0958      0.0712      0.0999      0.0715      0.0987
s{b_1}        0.0190      0.0319      0.0188      0.0323      0.0186      0.0325
b_2           0.0142      0.0164      0.0139      0.0159      0.0138      0.0162
s{b_2}        0.0017      0.0023      0.0017      0.0024      0.0017      0.0024
b_3           0.0155      0.0156      0.0151      0.0154      0.0151      0.0156
s{b_3}        0.0014      0.0020      0.0014      0.0020      0.0014      0.0021
b_5           -           -           -           -           -0.0035     0.0025
s{b_5}        -           -           -           -           0.0026      0.0033
b_6           -           -           0.0869      0.0731      ...         0.0727
s{b_6}        -           -           0.0582      0.0792      ...         0.0795
b_8           ...         0.1860      0.3627      0.1886      ...         0.1931
s{b_8}        ...         0.0964      0.0765      0.0966      ...         0.0972
SSE_p         2.1788      3.7951      2.0820      3.7288      ...         3.6822
PRESS_p       2.7378      4.5219      2.7827      4.6536      ...         4.8981
C_p           5.7508      6.2094      5.5406      7.3331      5.7874      8.7166
MSE_p         0.0445      0.0775      0.0434      0.0777      0.0427      0.0783
MSPR          0.0773      -           0.0764      -           0.0794      -
R^2_{a,p}     ...         0.6824      0.8205      0.6815      ...         0.6787

(-: term not in the model or statistic not applicable; ...: entry not legible in this copy.)



TABLE 9.5 Potential Predictor Variables and Response Variable—Surgical Unit Example.

Case     Blood-                                                   Alc.    Alc.
Number   Clotting   Prognostic   Enzyme   Liver                   Use:    Use:     Survival
         Score      Index        Test     Test    Age    Gender   Mod.    Heavy    Time
i        X_i1       X_i2         X_i3     X_i4    X_i5   X_i6     X_i7    X_i8     Y_i       Y'_i = ln Y_i

55       7.1        23           78       1.93    45     0        1       0        302       5.710
56       4.9        66           91       3.05    34     1        0       0        767       6.642
57       6.4        90           35       1.06    39     1        0       1        487       6.188
...      ...        ...          ...      ...     ...    ...      ...     ...      ...       ...
106      6.9        90           33       2.78    48     1        0       0        655       6.485
107      7.9        45           55       2.46    43     0        1       0        377       5.932
108      4.5        68           60       2.07    59     0        0       0        642       6.465


value reasonably close to SSE_p supports the validity of the fitted regression model and of
MSE_p as an indicator of the predictive capability of this model. In this case, all three of the
candidate models have PRESS_p values that are reasonably close to SSE_p. For example, for
Model 1, PRESS_p = 2.7378 and SSE_p = 2.1788. Recall also that if C_p ≈ p, this suggests that
there is little or no bias in the regression model. This is the case for the three models under
consideration. The C_5, C_6, and C_7 values for the three models are, respectively, 5.7508,
5.5406, and 5.7874.
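
The comparison of PRESS_p with SSE_p described above can be sketched in code. The example below is a hedged illustration, not the computation used for Table 9.4: it relies on the standard identity that the deleted residual equals e_i / (1 - h_ii), where h_ii is the leverage of case i, and it runs on synthetic data. The function name, the random seed, and the simulated coefficients are all assumptions.

```python
import numpy as np


def sse_and_press(X, y):
    """Return (SSE, PRESS) for a least squares fit with an intercept.

    X is an (n, p-1) array of predictors without the column of ones;
    y is the length-n response vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xd = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept

    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # least squares coefficients
    e = y - Xd @ b                               # ordinary residuals

    # Leverages h_ii are the squared row norms of Q in the reduced QR
    # factorization of the design matrix (assumes full column rank).
    Q, _ = np.linalg.qr(Xd)
    h = np.sum(Q**2, axis=1)

    d = e / (1.0 - h)                            # deleted residuals
    return float(e @ e), float(d @ d)


# Synthetic illustration only; these are not the surgical unit data.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 3))
y_demo = 1.0 + X_demo @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.3, size=30)
sse_demo, press_demo = sse_and_press(X_demo, y_demo)
print(f"SSE = {sse_demo:.4f}, PRESS = {press_demo:.4f}, ratio = {press_demo / sse_demo:.3f}")
```

For Model 1 in the text, the corresponding ratio is 2.7378 / 2.1788, or about 1.26.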

To validate the selected regression model externally, 54 additional cases had been held
out for a validation data set. A portion of the data for these cases is shown in Table 9.5. The
correlation matrix for these new data (not shown) is quite similar to the one in Figure 9.3 for
the model-building data set. The estimated regression coefficients, their estimated standard
deviations, and various model-selection criteria when regression models (9.21), (9.22), and
(9.23) are fitted to the validation data set are shown in Table 9.4, columns 2, 4, and 6.
Note the excellent agreement between the two sets of estimated regression coefficients, and
the two sets of regression coefficient standard errors. For example, for Model 1 fit to the
training data, b_1 = .0733; when fit to the validation data, we obtain b_1 = .0958. In view
of the magnitude of the corresponding standard errors (.0190 and .0319), these values are
reasonably close.
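
One informal way to read "reasonably close" is to scale the difference between the two estimates by a combined standard error. The sketch below is an assumption-laden yardstick rather than a procedure given in the text: it treats the training and validation estimates as independent (they come from separate sets of cases), so their variances add. The function name is hypothetical.

```python
import math


def coefficient_gap(b_train, s_train, b_valid, s_valid):
    """Difference between two independent estimates, its standard error, and their ratio."""
    diff = b_valid - b_train
    se_diff = math.sqrt(s_train**2 + s_valid**2)   # assumes independent estimates
    return diff, se_diff, diff / se_diff


# b_1 for Model 1: .0733 (s = .0190) on training data, .0958 (s = .0319) on validation data.
diff, se_diff, ratio = coefficient_gap(0.0733, 0.0190, 0.0958, 0.0319)
print(f"difference = {diff:.4f}, combined s.e. = {se_diff:.4f}, ratio = {ratio:.2f}")
```

The two estimates of b_1 differ by well under one combined standard error, which is consistent with the conclusion stated above.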

A review of Table 9.4 shows that most of the estimated coefficients agree quite closely.
However, it is noteworthy that b_5 in Model 3—the coefficient of age—is negative for the
training data (b_5 = −0.0035), and positive for the validation data (b_5 = 0.0025). This is
certainly a cause for concern, and it raises doubts about the validity of Model 3.

To calibrate the predictive ability of the regression models fitted from the training data
set, the mean squared prediction errors MSPR in (9.20) were calculated for the 54 cases in
the validation data set in Table 9.5 for each of the three candidate models; they are .0773,
.0764, and .0794, respectively. The mean squared prediction error generally will be larger
than MSE_p based on the training data set because entirely new data are involved in the
validation data set. In this case, the relevant MSE_p values for the three models are .0445,
.0434, and .0427. The fact that MSPR here does not differ too greatly from MSE_p implies
that the error mean square MSE_p based on the training data set is a reasonably valid indicator
of the predictive ability of the fitted regression model. The closeness of the three MSPR
values suggests that the three candidate models perform comparably in terms of predictive
accuracy.
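
The MSPR calculation can be sketched as follows, taking MSPR as the average squared prediction error over the validation cases, with predictions from the model fitted to the training data. The helper names and the synthetic data are assumptions for illustration; the MSPR and MSE_p values in the final lines are the ones reported in the text.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares fit with intercept; returns (coefficients, MSE)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    e = y - Xd @ b
    mse = float(e @ e) / (len(y) - Xd.shape[1])   # SSE / (n - p)
    return b, mse


def mspr(b, X_valid, y_valid):
    """Mean squared prediction error of training coefficients b on validation cases."""
    Xd = np.column_stack([np.ones(len(y_valid)), X_valid])
    err = y_valid - Xd @ b
    return float(np.mean(err**2))


# Synthetic illustration: 108 cases split into 54 training and 54 validation cases.
rng = np.random.default_rng(7)
X_all = rng.normal(size=(108, 4))
y_all = 2.0 + X_all @ np.array([0.4, 0.3, -0.2, 0.1]) + rng.normal(scale=0.25, size=108)
b_fit, mse_train = fit_ols(X_all[:54], y_all[:54])
mspr_valid = mspr(b_fit, X_all[54:], y_all[54:])
print(f"training MSE = {mse_train:.4f}, validation MSPR = {mspr_valid:.4f}")

# Ratios MSPR / MSE_p from the values reported in the text.
reported = {"Model 1": (0.0773, 0.0445), "Model 2": (0.0764, 0.0434), "Model 3": (0.0794, 0.0427)}
ratios = {name: m / mse for name, (m, mse) in reported.items()}
print({name: round(r, 2) for name, r in ratios.items()})
```

The reported ratios are of similar size across the three models, in line with the observation that they perform comparably.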

As a consequence of the concerns noted earlier about Model 3, this model was eliminated
from further consideration. The final selection was based on the principle of parsimony. 
While Models 1 and 2 performed comparably in the validation study, Model 1 achieves this 
level of performance with one fewer parameter. For this reason, Model 1 was ultimately 
chosen by the investigator as the final model. 


Comments 


1. Algorithms are available to split data so that the two data sets have similar statistical properties.
The reader is referred to Reference 9.11 for a discussion of this and other issues associated with
validation of regression models.


2. Refinements of data splitting have been proposed. With the double cross-validation procedure,
for example, the model is built for each half of the split data and then tested on the other half of
the data. Thus, two measures of consistency and predictive ability are obtained from the two fitted
models. For smaller data sets, a procedure called K-fold cross-validation is often used. With this
procedure, the data are first split into K roughly equal parts. For k = 1, 2, ..., K, we use the kth part
as the validation set, fit the model using the other K − 1 parts, and obtain the predicted sum of squares
for error. The K estimates of prediction error are then combined to produce a K-fold cross-validation
estimate. Note that when K = n, the K-fold cross-validation estimate is identical to the PRESS_p
statistic.

3. For small data sets where data splitting is impractical, the PRESS criterion in (9.17), considered 
earlier for use in subset selection, can be employed as a form of data splitting to assess the precision 
of model predictions. Recall that with this procedure, each data point is predicted from the least 
squares fitted regression function developed from the remaining n — 1 data points. A fairly close 
agreement between PRESS and SSE suggests that MSE may be a reasonably valid indicator of the 
selected model's predictive capability. Variations of PRESS for validation have also been proposed, 
whereby m cases are held out for validation and the remaining n − m cases are used to fit the
model. Reference 9.11 discusses these procedures, as well as issues dealing with optimal splitting of 
data sets. 

4. When regression models built on observational data do not predict well outside the range of 
the X observations in the data set, the usual reason is the existence of multicollinearity among the 
X variables. Chapter 11 introduces possible solutions for this difficulty including ridge regression or 
other biased estimation techniques. 

5. If a data set for an exploratory observational study is very large, it can be divided into three parts.
The first part is used for model training, the second part for cross-validation and model selection, and 
the third part for testing and calibrating the final model (Reference 9.10). This approach avoids any 
bias resulting from estimating the regression parameters from the same data set used for developing 
the model. A disadvantage of this procedure is that the parameter estimates are derived from a smaller 
data set and hence are more imprecise than if the original data set were divided into two parts for 
model building and validation. Consequently, the division of a data set into three parts is used in 
practice only when the available data set is very large.
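
The K-fold procedure described in Comment 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the K fold-wise prediction sums of squares are combined by simple addition, the folds are formed at random, and the data are synthetic. The function name is hypothetical. With K = n, the result should reproduce the PRESS statistic, as the comment notes.

```python
import numpy as np


def kfold_prediction_sse(X, y, K, seed=0):
    """Sum of squared prediction errors over K held-out parts (least squares with intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])

    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)   # K roughly equal parts

    total = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)          # the other K - 1 parts
        b, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
        err = y[held_out] - Xd[held_out] @ b                  # errors on the kth part
        total += float(err @ err)
    return total


# Synthetic illustration only.
rng = np.random.default_rng(3)
X_cv = rng.normal(size=(30, 3))
y_cv = 1.0 + X_cv @ np.array([0.6, -0.4, 0.2]) + rng.normal(scale=0.5, size=30)

print("5-fold :", round(kfold_prediction_sse(X_cv, y_cv, K=5), 4))
print("n-fold :", round(kfold_prediction_sse(X_cv, y_cv, K=len(y_cv)), 4))
```

Dividing the total by n gives a per-case figure on the same scale as MSPR; whether to report the sum or the average is a presentation choice that the text leaves open.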


Cited References

9.1. Daniel, C., and F. S. Wood. Fitting Equations to Data: Computer Analysis of Multifactor Data.
2nd ed. New York: John Wiley & Sons, 1999.

9.2. Freedman, D. A. “A Note on Screening Regression Equations,” The American Statistician 37
(1983), pp. 152-55.

9.3. Pope, P. T., and J. T. Webster. “The Use of an F-Statistic in Stepwise Regression Procedures,”
Technometrics 14 (1972), pp. 327-40.

9.4. Mantel, N. “Why Stepdown Procedures in Variable Selection,” Technometrics 12 (1970),
pp. 621-25.

9.5. Miller, A. J. Subset Selection in Regression. 2nd ed. London: Chapman and Hall, 2002.

9.6. Faraway, J. J. “On the Cost of Data Analysis,” Journal of Computational and Graphical Statistics
1 (1992), pp. 213-29.

9.7. Breiman, L., and P. Spector. “Submodel Selection and Evaluation in Regression. The X-Random
Case,” International Statistical Review 60 (1992), pp. 291-319.

9.8. Lindsay, R. M., and A. S. C. Ehrenberg. “The Design of Replicated Studies,” The American
Statistician 47 (1993), pp. 217-28.

9.9. Snee, R. D. “Validation of Regression Models: Methods and Examples,” Technometrics 19
(1977), pp. 415-28.

9.10. Hastie, T., Tibshirani, R., and J. Friedman. The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. New York: Springer-Verlag, 2001.

9.11. Stone, M. “Cross-validatory Choice and Assessment of Statistical Prediction,” Journal of the
Royal Statistical Society B 36 (1974), pp. 111-47.


Problems 


9.1. A speaker stated: “In well-designed experiments involving quantitative explanatory variables,
a procedure for reducing the number of explanatory variables after the data are obtained is not
necessary.” Discuss.

9.2. The dean of a graduate school wishes to predict the grade point average in graduate work for
recent applicants. List a dozen variables that might be useful explanatory variables here.

9.3. Two researchers, investigating factors affecting summer attendance at privately operated
beaches on Lake Ontario, collected information on attendance and 11 explanatory variables for
42 beaches. Two summers were studied, of relatively hot and relatively cool weather, respectively.
A “best” subsets algorithm now is to be used to reduce the number of explanatory
variables for the final regression model.

a. Should the variables reduction be done for both summers combined, or should it be done
separately for each summer? Explain the problems involved and how you might handle
them.

b. Will the “best” subsets selection procedure choose those explanatory variables that are most
important in a causal sense for determining beach attendance?

9.4. In forward stepwise regression, what advantage is there in using a relatively small α-to-enter
value for adding variables? What advantage is there in using a larger α-to-enter value?

9.5. In forward stepwise regression, why should the α-to-enter value for adding variables never
exceed the α-to-remove value for deleting variables?

9.6. Prepare a flowchart of each of the following selection methods: (1) forward stepwise regression,
(2) forward selection, (3) backward elimination.

9.7. An engineer has stated: “Reduction of the number of explanatory variables should always be
done using the objective forward stepwise regression procedure.” Discuss.

9.8. An attendee at a regression modeling short course stated: “I rarely see validation of regression
models mentioned in published papers, so it must really not be an important component of
model building.” Comment.

*9.9. Refer to Patient satisfaction Problem 6.15. The hospital administrator wishes to determine the
best subset of predictor variables for predicting patient satisfaction.


a. Indicate which subset of predictor variables you would recommend as best for predicting
patient satisfaction according to each of the following criteria: (1) R^2_{a,p}, (2) AIC_p, (3) C_p,
(4) PRESS_p. Support your recommendations with appropriate graphs.

b. Do the four criteria in part (a) identify the same best subset? Does this always happen?

c. Would forward stepwise regression have any advantages here as a screening procedure over
the all-possible-regressions procedure?

*9.10. Job proficiency. A personnel officer in a governmental agency administered four newly
developed aptitude tests to each of 25 applicants for entry-level clerical positions in the agency.
For purpose of the study, all 25 applicants were accepted for positions irrespective of their test
scores. After a probationary period, each applicant was rated for proficiency on the job. The
scores on the four tests (X_1, X_2, X_3, X_4) and the job proficiency score (Y) for the 25 employees
were as follows:

                                          Job Proficiency
Subject          Test Score               Score
i         X_i1    X_i2    X_i3    X_i4    Y_i

1         86      110     100     87      88
2         62      97      99      100     80
3         110     107     103     103     96
...       ...     ...     ...     ...     ...
23        104     73      93      80      78
24        94      121     115     104     115
25        91      129     97      83      83

a. Prepare separate stem-and-leaf plots of the test scores for each of the four newly developed
aptitude tests. Are there any noteworthy features in these plots? Comment.

b. Obtain the scatter plot matrix. Also obtain the correlation matrix of the X variables. What do
the scatter plots suggest about the nature of the functional relationship between the response
variable Y and each of the predictor variables? Are any serious multicollinearity problems
evident? Explain.

c. Fit the multiple regression function containing all four predictor variables as first-order
terms. Does it appear that all predictor variables should be retained?

*9.11. Refer to Job proficiency Problem 9.10.

a. Using only first-order terms for the predictor variables in the pool of potential X variables,
find the four best subset regression models according to the R^2_{a,p} criterion.

b. Since there is relatively little difference in R^2_{a,p} for the four best subset models, what other
criteria would you use to help in the selection of the best model? Discuss.

9.12. Refer to Market share data set in Appendix C.3 and Problem 8.42.

a. Using only first-order terms for predictor variables, find the three best subset regression
models according to the SBC_p criterion.

b. Is your finding here in agreement with what you found in Problem 8.42 (b) and (c)?

9.13. Lung pressure. Increased arterial blood pressure in the lungs frequently leads to the
development of heart failure in patients with chronic obstructive pulmonary disease (COPD). The
standard method for determining arterial lung pressure is invasive, technically difficult, and
involves some risk to the patient. Radionuclide imaging is a noninvasive, less risky method for
estimating arterial pressure in the lungs. To investigate the predictive ability of this method, a
cardiologist collected data on 19 mild-to-moderate COPD patients. The data that follow
include the invasive measure of systolic pulmonary arterial pressure (Y) and three
potential noninvasive predictor variables. Two were obtained by using radionuclide imaging:
emptying rate of blood into the pumping chamber of the heart (X_1) and ejection rate of blood
pumped out of the heart into the lungs (X_2). The third predictor variable measures a blood
gas (X_3).

a. Prepare separate dot plots for each of the three predictor variables. Are there any noteworthy
features in these plots? Comment.

b. Obtain the scatter plot matrix. Also obtain the correlation matrix of the X variables. What do
the scatter plots suggest about the nature of the functional relationship between Y and each
of the predictor variables? Are any serious multicollinearity problems evident? Explain.

c. Fit the multiple regression function containing the three predictor variables as first-order
terms. Does it appear that all predictor variables should be retained?

Subject
i         X_i1    X_i2    X_i3    Y_i

1         45      36      45      49
2         30      28      40      55
3         11      16      42      85
...       ...     ...     ...     ...
17        27      51      44      29
18        37      32      54      40
19        34      40      36      31

Adapted from A. T. Marmor et al., “Improved Radionuclide
Method for Assessment of Pulmonary Artery Pressure
in COPD,” Chest 89 (1986), pp. 64-69.


9.14. Refer to Lung pressure Problem 9.13. 


a. Using first-order and second-order terms for each of the three predictor variables (centered
around the mean) in the pool of potential X variables (including cross products of the
first-order terms), find the three best hierarchical subset regression models according to the
R^2_{a,p} criterion.

b. Is there much difference in R^2_{a,p} for the three best subset models?


9.15. Kidney function. Creatinine clearance (Y) is an important measure of kidney function, but is
difficult to obtain in a clinical office setting because it requires 24-hour urine collection. To
determine whether this measure can be predicted from some data that are easily available, a
kidney specialist obtained the data that follow for 33 male subjects. The predictor variables are
serum creatinine concentration (X_1), age (X_2), and weight (X_3).

Subject
i         X_i1    X_i2    X_i3    Y_i

1         .71     38      71      132
2         1.48    78      69      53
3         2.21    69      85      50
...       ...     ...     ...     ...
31        1.53    70      75      52
32        1.58    63      62      73
33        1.37    68      52      57

Adapted from W. J. Shih and S. Weisberg, “Assessing Influence
in Multiple Linear Regression with Incomplete Data,”
Technometrics 28 (1986), pp. 234-40.


a. Prepare separate dot plots for each of the three predictor variables. Are there any noteworthy
features in these plots? Comment.

b. Obtain the scatter plot matrix. Also obtain the correlation matrix of the X variables. What do
the scatter plots suggest about the nature of the functional relationship between the response
variable Y and each predictor variable? Discuss. Are any serious multicollinearity problems
evident? Explain.

c. Fit the multiple regression function containing the three predictor variables as first-order
terms. Does it appear that all predictor variables should be retained?

9.16. Refer to Kidney function Problem 9.15.

a. Using first-order and second-order terms for each of the three predictor variables (centered
around the mean) in the pool of potential X variables (including cross products of the
first-order terms), find the three best hierarchical subset regression models according to the C_p
criterion.

b. Is there much difference in C_p for the three best subset models?

*9.17. Refer to Patient satisfaction Problems 6.15 and 9.9. The hospital administrator was interested
to learn how the forward stepwise selection procedure and some of its variations would perform
here.

a. Determine the subset of variables that is selected as best by the forward stepwise regression
procedure, using F limits of 3.0 and 2.9 to add or delete a variable, respectively. Show your
steps.

b. To what level of significance in any individual test is the F limit of 3.0 for adding a variable
approximately equivalent here?

c. Determine the subset of variables that is selected as best by the forward selection procedure,
using an F limit of 3.0 to add a variable. Show your steps.

d. Determine the subset of variables that is selected as best by the backward elimination
procedure, using an F limit of 2.9 to delete a variable. Show your steps.

e. Compare the results of the three selection procedures. How consistent are these results?
How do the results compare with those for all possible regressions in Problem 9.9?

*9.18. Refer to Job proficiency Problems 9.10 and 9.11.

a. Using forward stepwise regression, find the best subset of predictor variables to predict job
proficiency. Use α limits of .05 and .10 for adding or deleting a variable, respectively.

b. How does the best subset according to forward stepwise regression compare with the best
subset according to the R^2_{a,p} criterion obtained in Problem 9.11a?

9.19. Refer to Kidney function Problems 9.15 and 9.16.

a. Using the same pool of potential X variables as in Problem 9.16a, find the best subset of
variables according to forward stepwise regression with α limits of .10 and .15 to add or
delete a variable, respectively.

b. How does the best subset according to forward stepwise regression compare with the best
subset according to the R^2_{a,p} criterion obtained in Problem 9.16a?

9.20. Refer to Market share data set in Appendix C.3 and Problems 8.42 and 9.12.

a. Using forward stepwise regression, find the best subset of predictor variables to predict
market share of their product. Use α limits of .10 and .15 for adding or deleting a predictor,
respectively.

b. How does the best subset according to forward stepwise regression compare with the best
subset according to the SBC_p criterion used in Problem 9.12a?


*9.21. Refer to Job proficiency Problems 9.10 and 9.18. To assess internally the predictive ability of
the regression model identified in Problem 9.18, compute the PRESS statistic and compare it
to SSE. What does this comparison suggest about the validity of MSE as an indicator of the
predictive ability of the fitted model?

*9.22. Refer to Job proficiency Problems 9.10 and 9.18. To assess externally the validity of the
regression model identified in Problem 9.18, 25 additional applicants for entry-level clerical
positions in the agency were similarly tested and hired irrespective of their test scores. The data
follow.

                                          Job Proficiency
Subject          Test Score               Score
i         X_i1    X_i2    X_i3    X_i4    Y_i

26        65      109     88      84      58
27        85      90      104     98      92
28        93      73      91      82      71
...       ...     ...     ...     ...     ...
48        115     119     102     94      95
49        129     70      94      95      81
50        136     104     106     104     109

a. Obtain the correlation matrix of the X variables for the validation data set and compare it
with that obtained in Problem 9.10b for the model-building data set. Are the two correlation
matrices reasonably similar?

b. Fit the regression model identified in Problem 9.18a to the validation data set. Compare the
estimated regression coefficients and their estimated standard deviations to those obtained
in Problem 9.18a. Also compare the error mean squares and coefficients of multiple
determination. Do the estimates for the validation data set appear to be reasonably similar to
those obtained for the model-building data set?

c. Calculate the mean squared prediction error in (9.20) and compare it to MSE obtained from
the model-building data set. Is there evidence of a substantial bias problem in MSE here? Is
this conclusion consistent with your finding in Problem 9.21? Discuss.

d. Combine the model-building data set in Problem 9.10 with the validation data set and fit the
selected regression model to the combined data. Are the estimated standard deviations of
the estimated regression coefficients appreciably reduced now from those obtained for the
model-building data set?

9.23. Refer to Lung pressure Problems 9.13 and 9.14. The validity of the regression model identified
as best in Problem 9.14a is to be assessed internally.

a. Calculate the PRESS statistic and compare it to SSE. What does this comparison suggest
about the validity of MSE as an indicator of the predictive ability of the fitted model?

b. Case 8 alone accounts for approximately one-half of the entire PRESS statistic. Would you
recommend modification of the model because of the strong impact of this case? What are
some corrective action options that would lessen the effect of case 8? Discuss.


Exercise 


9.24. The true quadratic regression function is E{Y} = 15 + 20X + 3X^2. The fitted linear regression
function is Ŷ = 13 + 40X, for which E{b_0} = 10 and E{b_1} = 45. What are the bias and sampling
error components of the mean squared error for X_i = 10 and for X_i = 20?


Projects 


Chapter 9 Building the Regression Model I: Model Selection and Validation 381 


9.25. 


9.26. 


9.27. 


9.28. 


Refer to the SENIC data set in Appendix C.1. Length of stay (Y) is to be predicted, and the 
pool of potential predictor variables includes all other variables in the data set except medical 
school affiliation and region. It is believed that a model with log), Y as the response variable 
and the predictor variables in first-order terms with no interaction terms will be appropriate. 
Consider cases 57-113 to constitute the model-building data set to be used for the following 
analyses. 


а. Prepare separate dot plots for each of the predictor variables. Are there any noteworthy 
features in these plots? Comment. 

b. Obtain the scatter plot matrix. Also obtain the correlation matrix of the X variables. Is there 
evidence of strong linear pairwise associations among the predictor variables here? 

c. Obtain the three best subsets according to the C, criterion, Whichrof these subset models 
appears to have the smallest bias? 4 


Refer to the CDI data set in Appendix C.2. A public safety official wishes to predict the rate of 
serious crimes in а CDI (Y, total number of serious crimes per 100,000 population). Fhe pool 
of potential predictor variables includes all other variables in the data set except total population, 
total serious crimes, county, state, and region. It is believed that a model with predictor variables 
in first-order terms with no interaction terms will be appropriate. Consider the even-numbered 
cases to constitute the model-building data set to be used for the following analyses. 


a. Prepare separate stem-and-leaf plots for each of the predictor variables. Are there any 
noteworthy features in these plots? Comment. 

b. Obtain the scatter plot matrix. Also obtain the correlation matrix of the X variables. Is there 
evidence of strong linear pairwise associations among the predictor variables here? 

c, Using the SBC, criterion, obtain the three best subsets. 


Refer tothe SENIC data set in Appendix C.1 and Project 9.25. The regression model identified 
as best in Project 9.25 is to be validated by means of the validation data set consisting of 
cases 1—56. 


a. Fit the regression model identified in Project 9.25 as best to the validation data set. Compare
the estimated regression coefficients and their estimated standard deviations with those
obtained in Project 9.25. Also compare the error mean squares and coefficients of multiple
determination. Does the model fitted to the validation data set yield similar estimates as the
model fitted to the model-building data set?

b. Calculate the mean squared prediction error in (9.20) and compare it to MSE obtained from 
the model-building data set. Is there evidence of a substantial bias problem in MSE here? 

c. Combine the model-building and validation data sets and fit the selected regression model
to the combined data. Are the estimated regression coefficients and their estimated standard
deviations appreciably different from those for the model-building data set? Should you
expect any differences in the estimates? Explain.


Refer to the CDI data set in Appendix C.2 and Project 9.26. The regression model identified
as best in Project 9.26c is to be validated by means of the validation data set consisting of the
odd-numbered CDIs.


a. Fit the regression model identified in Project 9.26c as best to the validation data set.
Compare the estimated regression coefficients and their estimated standard deviations with those
obtained in Project 9.26c. Also compare the error mean squares and coefficients of multiple
determination. Does the model fitted to the validation data set yield similar estimates as
the model fitted to the model-building data set?


b. Calculate the mean squared prediction error in (9.20) and compare it to MSE obtained from 
the model-building data set. Is there evidence of a substantial bias problem in MSE here? 

c. Fit the selected regression model to the combined model-building and validation data sets.
Are the estimated regression coefficients and their estimated standard deviations appreciably
different from those for the model fitted to the model-building data set? Should you expect
any differences in the estimates? Explain.


Case Studies


9.29. Refer to the Website developer data set in Appendix C.6. Management is interested in
determining what variables have the greatest impact on production output in the release of new
customer websites. Data on 13 three-person website development teams consisting of a project
manager, a designer, and a developer are provided in the data set. Production data from January
2001 through August 2002 include four potential predictors: (1) the change in the website
development process, (2) the size of the backlog of orders, (3) the team effect, and (4) the number
of months experience of each team. Develop a best subset model for predicting production
output. Justify your choice of model. Assess your model's ability to predict and discuss its use
as a tool for management decisions.


9.30. Refer to the Prostate cancer data set in Appendix C.5. Serum prostate-specific antigen (PSA)
was determined in 97 men with advanced prostate cancer. PSA is a well-established screening
test for prostate cancer and the oncologists wanted to examine the correlation between level
of PSA and a number of clinical measures for men who were about to undergo radical
prostatectomy. The measures are cancer volume, prostate weight, patient age, the amount of benign
prostatic hyperplasia, seminal vesicle invasion, capsular penetration, and Gleason score. Select
a random sample of 65 observations to use as the model-building data set. Develop a best subset
model for predicting PSA. Justify your choice of model. Assess your model's ability to predict
and discuss its usefulness to the oncologists.


9.31. Refer to the Real estate sales data set in Appendix C.7. Residential sales that occurred during
the year 2002 were available from a city in the midwest. Data on 522 arms-length transactions
include sales price, style, finished square feet, number of bedrooms, pool, lot size, year built,
air conditioning, and whether or not the lot is adjacent to a highway. The city tax assessor was
interested in predicting sales price based on the demographic variable information given above.
Select a random sample of 300 observations to use in the model-building data set. Develop a
best subset model for predicting sales price. Justify your choice of model. Assess your model's
ability to predict and discuss its use as a tool for predicting sales price.


9.32. Refer to Prostate cancer Case Study 9.30. The regression model identified in Case Study 9.30
is to be validated by means of the validation data set consisting of those cases not selected for
the model-building data set.


a. Fit the regression model identified in Case Study 9.30 to the validation data set. Compare
the estimated regression coefficients and their estimated standard errors with those obtained
in Case Study 9.30. Also compare the error mean square and coefficients of multiple
determination. Does the model fitted to the validation data set yield similar estimates as the
model fitted to the model-building data set?

b. Calculate the mean squared prediction error (9.20) and compare it to MSE obtained from
the model-building data set. Is there evidence of a substantial bias problem in MSE here?


9.33. Refer to Real estate sales Case Study 9.31. The regression model identified in Case Study 9.31
is to be validated by means of the validation data set consisting of those cases not selected for
the model-building data set.



a. Fit the regression model identified in Case Study 9.31 to the validation data set. Compare 
the estimated regression coefficients and their estimated standard errors with those obtained 
in Case Study 9.31. Also compare the error mean square and coefficients of multiple de- 
termination. Does the model fitted to the validation data set yield similar estimates as the 
model fitted to the model-building data set? 


b. Calculate the mean squared prediction error (9.20) and compare it to MSE obtained from 
the model-building data set. Is there evidence of a substantial bias problem in MSE here? 


Chapter 10


Building the Regression
Model II: Diagnostics


In this chapter we take up a number of refined diagnostics for checking the adequacy of
a regression model. These include methods for detecting improper functional form for a
predictor variable, outliers, influential observations, and multicollinearity. We conclude the
chapter by illustrating the use of these diagnostic procedures in the surgical unit example.
In the following chapter, we take up some remedial measures that are useful when the
diagnostic procedures indicate model inadequacies.


10.1 Model Adequacy for a Predictor Variable—Added-Variable Plots


We discussed in Chapters 3 and 6 how a plot of residuals against a predictor variable in 
the regression model can be used to check whether a curvature effect for that variable is 
required in the model. We also described the plotting of residuals against predictor variables 
not yet in the regression model to determine whether it would be helpful to add one or more 
of these variables to the model. 

A limitation of these residual plots is that they may not properly show the nature of the
marginal effect of a predictor variable, given the other predictor variables in the model.
Added-variable plots, also called partial regression plots and adjusted variable plots, are
refined residual plots that provide graphic information about the marginal importance of a
predictor variable $X_k$, given the other predictor variables already in the model. In addition,
these plots can at times be useful for identifying the nature of the marginal relation for a
predictor variable in the regression model.

Added-variable plots consider the marginal role of a predictor variable $X_k$, given that the
other predictor variables under consideration are already in the model. In an added-variable
plot, both the response variable Y and the predictor variable $X_k$ under consideration are
regressed against the other predictor variables in the regression model and the residuals are
obtained for each. These residuals reflect the part of each variable that is not linearly
associated with the other predictor variables already in the regression model. The plot of these
residuals against each other (1) shows the marginal importance of this variable in reducing
the residual variability and (2) may provide information about the nature of the marginal
regression relation for the predictor variable $X_k$ under consideration for possible inclusion
in the regression model.

FIGURE 10.1 Prototype Added-Variable Plots. Panels (a), (b), and (c) each plot $e(Y|X_2)$
against $e(X_1|X_2)$.

To make these ideas more specific, we consider a first-order multiple regression model
with two predictor variables $X_1$ and $X_2$. The extension to more than two predictor variables is
direct. Suppose we are concerned about the nature of the regression effect for $X_1$, given that
$X_2$ is already in the model. We regress Y on $X_2$ and obtain the fitted values and residuals:

$$\hat{Y}_i(X_2) = b_0 + b_2 X_{i2} \tag{10.1a}$$

$$e_i(Y|X_2) = Y_i - \hat{Y}_i(X_2) \tag{10.1b}$$

The notation here indicates explicitly the response and predictor variables in the fitted
model. We also regress $X_1$ on $X_2$ and obtain:

$$\hat{X}_{i1}(X_2) = b_0^* + b_2^* X_{i2} \tag{10.2a}$$

$$e_i(X_1|X_2) = X_{i1} - \hat{X}_{i1}(X_2) \tag{10.2b}$$

The added-variable plot for predictor variable $X_1$ consists of a plot of the Y residuals $e(Y|X_2)$
against the $X_1$ residuals $e(X_1|X_2)$.

Figure 10.1 contains several prototype added-variable plots for our example, where $X_2$
is already in the regression model and $X_1$ is under consideration to be added. Figure 10.1a
shows a horizontal band, indicating that $X_1$ contains no additional information useful for
predicting Y beyond that contained in $X_2$, so that it is not helpful to add $X_1$ to the regression
model here.

Figure 10.1b shows a linear band with a nonzero slope. This plot indicates that a linear
term in $X_1$ may be a helpful addition to the regression model already containing $X_2$. It
can be shown that the slope of the least squares line through the origin fitted to the plotted
residuals is $b_1$, the regression coefficient of $X_1$ if this variable were added to the regression
model already containing $X_2$.

Figure 10.1c shows a curvilinear band, indicating that the addition of $X_1$ to the regression
model may be helpful and suggesting the possible nature of the curvature effect by the pattern
shown.
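The construction in (10.1) and (10.2), and the slope property noted for Figure 10.1b, can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data are simulated purely for illustration, and the helper name `ols_residuals` is hypothetical rather than taken from the text.

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals from the least squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Hypothetical data, simulated only to illustrate the construction
rng = np.random.default_rng(0)
n = 30
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)          # X1 mildly correlated with X2
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
X_reduced = np.column_stack([ones, x2])     # model containing X2 only

e_y = ols_residuals(y, X_reduced)           # e(Y | X2), as in (10.1b)
e_x1 = ols_residuals(x1, X_reduced)         # e(X1 | X2), as in (10.2b)

# Slope of the least squares line through the origin in the added-variable plot
slope_origin = (e_x1 @ e_y) / (e_x1 @ e_x1)

# Coefficient b1 when Y is regressed on X1 and X2 together
X_full = np.column_stack([ones, x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
b1 = b_full[1]

print(slope_origin, b1)                     # the two values agree up to rounding
```

Plotting `e_y` against `e_x1` (for example with a scatter plot) would give the added-variable plot itself; the sketch stops at the two residual vectors so that it depends on NumPy only.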

Added-variable plots, in addition to providing information about the possible nature of
the marginal relationship for a predictor variable, given the other predictor variables already
in the regression model, also provide information about the strength of this relationship. To
see how this additional information is provided, consider Figure 10.2. Figure 10.2a illustrates
an added-variable plot for $X_1$ when $X_2$ is already in the model, based on n = 3 cases. The
vertical deviations of the plotted points around the horizontal line $e(Y|X_2) = 0$ shown in
Figure 10.2a represent the Y residuals when $X_2$ alone is in the regression model. When
these deviations are squared and summed, we obtain the error sum of squares $SSE(X_2)$.
Figure 10.2b shows the same plotted points, but here the vertical deviations of these points
are around the least squares line through the origin with slope $b_1$. These deviations are the
residuals $e(Y|X_1, X_2)$ when both $X_1$ and $X_2$ are in the regression model. Hence, the sum
of the squares of these deviations is the error sum of squares $SSE(X_1, X_2)$.

FIGURE 10.2 Illustration of Deviations in an Added-Variable Plot.
(a) Deviations around Zero Line: $SSE(X_2) = \sum [e(Y|X_2)]^2$
(b) Deviations around Line with Slope $b_1$: $SSE(X_1, X_2) = \sum [e(Y|X_1, X_2)]^2$
Both panels plot $e(Y|X_2)$ against $e(X_1|X_2)$.

The difference between the two sums of squared deviations in Figures 10.2a and 10.2b
according to (7.1a) is the extra sum of squares $SSR(X_1|X_2)$. Hence, the difference in the
magnitudes of the two sets of deviations provides information about the marginal strength
of the linear relation of $X_1$ to the response variable, given that $X_2$ is in the model. If the
scatter of the points around the line through the origin with slope $b_1$ is much less than the
scatter around the horizontal line, inclusion of the variable $X_1$ in the regression model will
provide a substantial further reduction in the error sum of squares.
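The two sets of deviations described for Figure 10.2 can be computed directly. The sketch below, again on simulated data and with a hypothetical helper `fit_sse`, squares and sums the deviations around the zero line and around the line through the origin, and compares the latter with the error sum of squares from fitting both predictors. The ratio at the end is the coefficient of partial determination for $X_1$ given $X_2$.

```python
import numpy as np

def fit_sse(y, X):
    """Return the residuals and error sum of squares for the fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid, float(resid @ resid)

# Hypothetical data for illustration only
rng = np.random.default_rng(1)
n = 25
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
y = 5.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
ones = np.ones(n)

# Deviations around the horizontal zero line: residuals with X2 alone
e_y, sse_x2 = fit_sse(y, np.column_stack([ones, x2]))
e_x1, _ = fit_sse(x1, np.column_stack([ones, x2]))

# Deviations around the least squares line through the origin
b1 = (e_x1 @ e_y) / (e_x1 @ e_x1)
dev_line = e_y - b1 * e_x1
sse_line = float(dev_line @ dev_line)

# Error sum of squares when both X1 and X2 are in the model
_, sse_x1_x2 = fit_sse(y, np.column_stack([ones, x1, x2]))

ssr_x1_given_x2 = sse_x2 - sse_x1_x2        # extra sum of squares
r2_partial = ssr_x1_given_x2 / sse_x2       # coefficient of partial determination

print(sse_x2, sse_line, sse_x1_x2, r2_partial)
```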

Added-variable plots are also useful for uncovering outlying data points that may have a
strong influence in estimating the relationship of the predictor variable $X_k$ to the response
variable, given the other predictor variables already in the model.


Example 1

Table 10.1 shows a portion of the data on average annual income of managers during the past
two years ($X_1$), a score measuring each manager's risk aversion ($X_2$), and the amount of life
insurance carried (Y) for a sample of 18 managers in the 30–39 age group. Risk aversion
was measured by a standard questionnaire administered to each manager: the higher the
score, the greater the degree of risk aversion. Income and risk aversion are mildly correlated
here, the coefficient of correlation being $r_{12} = .254$.

A fit of the first-order regression model yields:

$$\hat{Y} = -205.72 + 6.2880X_1 + 4.738X_2 \tag{10.3}$$

The residuals for this fitted model are plotted against $X_1$ in Figure 10.3a. This residual
plot clearly suggests that a linear relation for $X_1$ is not appropriate in the model already
containing $X_2$. To obtain more information about the nature of this relationship, we shall
use an added-variable plot. We regress Y and $X_1$ each against $X_2$. When doing this, we
obtain:

$$\hat{Y}(X_2) = 50.70 + 15.54X_2 \tag{10.4a}$$

$$\hat{X}_1(X_2) = 40.779 + 1.718X_2 \tag{10.4b}$$

TABLE 10.1 Basic Data—Life Insurance Example.

           Average Annual Income    Risk Aversion    Amount of Life Insurance Carried
Manager    (thousand dollars)       Score            (thousand dollars)
i          X_i1                     X_i2             Y_i
1          45.010                   6                91
2          57.204                   4                162
3          26.852                   5                11
...        ...                      ...              ...
16         46.130                   4                91
17         30.366                   3                14
18         39.060                   5                63

FIGURE 10.3 Residual Plot and Added-Variable Plot—Life Insurance Example.
(a) Residual Plot against $X_1$
(b) Added-Variable Plot for $X_1$, with $e(X_1|X_2)$ on the horizontal axis


The residuals from these two fitted models are plotted against each other in the
added-variable plot in Figure 10.3b. This plot also contains the least squares line through the
origin, which has slope $b_1 = 6.2880$. The added-variable plot suggests that the curvilinear
relation between Y and $X_1$ when $X_2$ is already in the regression model is strongly positive,
and that a slight concave upward shape may be present. The suggested concavity of the
relationship is also evident from the vertical deviations around the line through the origin
with slope $b_1$. These deviations are positive at the left, negative in the middle, and positive
again at the right. Overall, the deviations from linearity appear to be modest in the range of
the predictor variables.

Note also that the scatter of the points around the least squares line through the origin with
slope $b_1 = 6.2880$ is much smaller than is the scatter around the horizontal line $e(Y|X_2) = 0$,
indicating that adding $X_1$ to the regression model with a linear relation will substantially
reduce the error sum of squares. In fact, the coefficient of partial determination for the linear
effect of $X_1$ is $R^2_{Y1|2} = .984$. Incorporating a curvilinear effect for $X_1$ will lead to only a
modest further reduction in the error sum of squares since the plotted points are already
quite close to the linear relation through the origin with slope $b_1$.

Finally, the added-variable plot in Figure 10.3b shows one outlying case, in the upper
right corner. The influence of this case needs to be investigated by procedures to be explained
later in this chapter.


Example 2

For the body fat example in Table 7.1 (page 257), we consider here the regression of body fat
(Y) only on triceps skinfold thickness ($X_1$) and thigh circumference ($X_2$). We omit the third
predictor variable ($X_3$, midarm circumference) to focus the discussion of added-variable
plots on its essentials. Recall that $X_1$ and $X_2$ are highly correlated ($r_{12} = .92$). The fitted
regression function was obtained in Table 7.2c (page 258):

$$\hat{Y} = -19.174 + .2224X_1 + .6594X_2$$


Figures 10.4a and 10.4c contain plots of the residuals against $X_1$ and $X_2$, respectively.
These plots do not indicate any lack of fit for the linear terms in the regression model or the
existence of unequal variances of the error terms.

Figures 10.4b and 10.4d contain the added-variable plots for $X_1$ and $X_2$, respectively,
when the other predictor variable is already in the regression model. Both plots also show
the line through the origin with slope equal to the regression coefficient for the predictor
variable if it were added to the fitted model. These two plots provide some useful additional
information. The scatter in Figure 10.4b follows the prototype in Figure 10.1a, suggesting
that $X_1$ is of little additional help in the model when $X_2$ is already present. This information
is not provided by the regular residual plot in Figure 10.4a. The fact that $X_1$ appears to be
of little marginal help when $X_2$ is already in the regression model is in accord with earlier
findings in Chapter 7. We saw there that the coefficient of partial determination is only
$R^2_{Y1|2} = .031$ and that the $t^*$ statistic for $b_1$ is only .73.

The added-variable plot for $X_2$ in Figure 10.4d follows the prototype in Figure 10.1b,
showing a linear scatter with positive slope. We also see in Figure 10.4d that there is
somewhat less variability around the line with slope $b_2$ than around the horizontal line
$e(Y|X_1) = 0$. This suggests that: (1) variable $X_2$ may be helpful in the regression model
even when $X_1$ is already in the model, and (2) a linear term in $X_2$ appears to be adequate
because no curvilinear relation is suggested by the scatter of points. Thus, the
added-variable plot for $X_2$ in Figure 10.4d complements the regular residual plot in Figure 10.4c
by indicating the potential usefulness of thigh circumference ($X_2$) in the regression model
when triceps skinfold thickness ($X_1$) is already in the model. This information is consistent
with the $t^*$ statistic for $b_2$ of 2.26 in Table 7.2c and the moderate coefficient of partial
determination of $R^2_{Y2|1} = .232$. Finally, the added-variable plot in Figure 10.4d reveals the
presence of one potentially influential case (case 3) in the lower left corner. The influence
of this case will be investigated in greater detail in Section 10.4.


FIGURE 10.4 Residual Plots and Added-Variable Plots—Body Fat Example with Two
Predictor Variables.
(a) Residual Plot against $X_1$
(b) Added-Variable Plot for $X_1$: $e(Y|X_2)$ against $e(X_1|X_2)$
(c) Residual Plot against $X_2$
(d) Added-Variable Plot for $X_2$: $e(Y|X_1)$ against $e(X_2|X_1)$


Comments 


1. An added-variable plot only suggests the nature of the functional relation in which a predictor
variable should be added to the regression model but does not provide an analytic expression of the
relation. Furthermore, the relation shown is for $X_k$ adjusted for the other predictor variables in the
regression model, not for $X_k$ directly. Hence, a variety of transformations or curvature effect terms
may need to be investigated and additional residual plots utilized to identify the best transformation
or curvature effect terms.

2. Added-variable plots need to be used with caution for identifying the nature of the marginal
effect of a predictor variable. These plots may not show the proper form of the marginal effect of a
predictor variable if the functional relations for some or all of the predictor variables already in the
regression model are misspecified. For example, if $X_2$ and $X_3$ are related in a curvilinear fashion to
the response variable but the regression model uses linear terms only, the added-variable plots for $X_2$
and $X_3$ may not show the proper relationships to the response variable, especially when the predictor
variables are correlated. Since added-variable plots for the several predictor variables are all concerned
with marginal effects only, they may therefore not be effective when the relations of the predictor
variables to the response variable are complex. Also, added-variable plots may not detect interaction effects
that are present. Finally, high multicollinearity among the predictor variables may cause the
added-variable plots to show an improper functional relation for the marginal effect of a predictor variable.

3. When several added-variable plots are required for a set of predictor variables, it is not
necessary to fit entirely new regression models each time. Computational procedures are available that
economize on the calculations required; these are explained in specialized texts such as Reference 10.1.

4. Any fitted multiple regression function can be obtained from a sequence of fitted partial
regressions. To illustrate this, consider again the life insurance example, where the fitted regression of Y on
$X_2$ is given in (10.4a) and the fitted regression of $X_1$ on $X_2$ is given in (10.4b). If we now regress the
residuals $e(Y|X_2) = Y - \hat{Y}(X_2)$ on the residuals $e(X_1|X_2) = X_1 - \hat{X}_1(X_2)$, using regression through
the origin, we obtain (calculations not shown):

$$\hat{e}(Y|X_2) = 6.2880[e(X_1|X_2)] \tag{10.5}$$

By simple substitution, using (10.4a) and (10.4b), we obtain:

$$[\hat{Y} - (50.70 + 15.54X_2)] = 6.2880[X_1 - (40.779 + 1.718X_2)]$$

or:

$$\hat{Y} = -205.72 + 6.2880X_1 + 4.737X_2 \tag{10.6}$$

where the solution for $\hat{Y}$ is the fitted value when $X_1$ and $X_2$ are included in the regression model.
Note that the fitted regression function in (10.6) is the same as when the regression model was fitted
to $X_1$ and $X_2$ directly in (10.3), except for a minor difference due to rounding effects.
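The substitution that leads from (10.5) to (10.6) is simple arithmetic and can be verified in a few lines. In this sketch the variable names (`a0`, `a2`, `c0`, `c2`) are hypothetical labels for the coefficients reported in (10.4a) and (10.4b); only the numerical values come from the text.

```python
# Coefficients reported in the text
a0, a2 = 50.70, 15.54      # (10.4a): fitted regression of Y on X2
c0, c2 = 40.779, 1.718     # (10.4b): fitted regression of X1 on X2
b1 = 6.2880                # (10.5): slope of the regression through the origin

# Expanding  Y_hat - (a0 + a2*X2) = b1 * (X1 - (c0 + c2*X2))  and collecting terms
b0 = a0 - b1 * c0          # intercept of the two-predictor fit
b2 = a2 - b1 * c2          # coefficient of X2 in the two-predictor fit

print(round(b0, 2), b1, round(b2, 3))
```

The recovered intercept and $X_2$ coefficient match (10.6); the $X_2$ coefficient differs from the 4.738 in (10.3) only in the third decimal, which is the rounding effect the comment mentions.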

5. A residual plot closely related to the added-variable plot is the partial residual plot. This plot
also is used as an aid for identifying the nature of the relationship for a predictor variable $X_k$ under
consideration for addition to the regression model. The partial residual plot takes as the starting point
the usual residuals $e_i = Y_i - \hat{Y}_i$ when the model including $X_k$ is fitted, to which the regression effect
for $X_k$ is added. Specifically, the partial residuals for examining the effect of predictor variable $X_k$,
denoted by $p_i(X_k)$, are defined as follows:

$$p_i(X_k) = e_i + b_k X_{ik} \tag{10.7}$$

Thus, for a partial residual, we add the effect of $X_k$, as reflected by the fitted model term $b_k X_{ik}$, back
onto the residual. A plot of these partial residuals against $X_k$ is referred to as a partial residual plot.
The reader is referred to References 10.2 and 10.3 for more details on partial residual plots.
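Definition (10.7) translates directly into code. The sketch below computes partial residuals for one predictor on simulated data (the data and the choice of a mildly curved relation in the second predictor are assumptions made only for illustration). It also checks a property that follows from the least squares normal equations: a straight line fitted to the partial residuals against $X_k$ has slope $b_k$.

```python
import numpy as np

# Hypothetical data; the response is mildly curved in the second predictor
rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.7 * x1 + 0.3 * x2**2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2])   # first-order model in X1 and X2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                               # ordinary residuals e_i

k = 2                                       # column index of the predictor examined (X2)
p_k = e + b[k] * X[:, k]                    # partial residuals p_i(X_k), as in (10.7)

# A line fitted to the partial residuals against X_k reproduces the slope b_k
slope, intercept = np.polyfit(X[:, k], p_k, 1)
print(slope, b[k])
```

A scatter plot of `p_k` against `X[:, k]` would be the partial residual plot; any curvature in that scatter around the fitted line is what the plot is meant to reveal.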


10.2 Identifying Outlying Y Observations—Studentized 
Deleted Residuals 


Outlying Cases

Frequently in regression analysis applications, the data set contains some cases that are
outlying or extreme; that is, the observations for these cases are well separated from the
remainder of the data. These outlying cases may involve large residuals and often have
dramatic effects on the fitted least squares regression function. It is therefore important to
study the outlying cases carefully and decide whether they should be retained or eliminated,
and if retained, whether their influence should be reduced in the fitting process and/or the
regression model should be revised.

FIGURE 10.5 Scatter Plot for Regression with One Predictor Variable Illustrating
Outlying Cases.

A case may be outlying or extreme with respect to its Y value, its X value(s), or both.
Figure 10.5 illustrates this for the case of regression with a single predictor variable. In the
scatter plot in Figure 10.5, case 1 is outlying with respect to its Y value, given X. Note that
this point falls far outside the scatter, although its X value is near the middle of the range of
observations on the predictor variable. Cases 2, 3, and 4 are outlying with respect to their
X values since they have much larger X values than those for the other cases; cases 3 and
4 are also outlying with respect to their Y values, given X.

Not all outlying cases have a strong influence on the fitted regression function. Case 1
in Figure 10.5 may not be too influential because a number of other cases have similar
X values that will keep the fitted regression function from being displaced too far by the
outlying case. Likewise, case 2 may not be too influential because its Y value is consistent
with the regression relation displayed by the nonextreme cases. Cases 3 and 4, on the other
hand, are likely to be very influential in affecting the fit of the regression function. They
are outlying with regard to their X values, and their Y values are not consistent with the
regression relation for the other cases.

A basic step in any regression analysis is to determine if the regression model under
consideration is heavily influenced by one or a few cases in the data set. For regression
with one or two predictor variables, it is relatively simple to identify outlying cases with
respect to their X or Y values by means of box plots, stem-and-leaf plots, scatter plots, and
residual plots, and to study whether they are influential in affecting the fitted regression
function. When more than two predictor variables are included in the regression model,
however, the identification of outlying cases by simple graphic means becomes difficult
because single-variable or two-variable examinations do not necessarily help find outliers
relative to a multivariable regression model. Some univariate outliers may not be extreme
in a multiple regression model, and, conversely, some multivariable outliers may not be
detectable in single-variable or two-variable analyses.

We now discuss the use of some refined measures for identifying cases with outlying
Y observations. In the following section we take up the identification of cases that are
multivariable outliers with respect to their X values.


Residuals and Semistudentized Residuals

The detection of outlying or extreme Y observations based on an examination of the residuals
has been considered in earlier chapters. We utilized there either the residual $e_i$:

$$e_i = Y_i - \hat{Y}_i \tag{10.8}$$

or the semistudentized residuals $e_i^*$:

$$e_i^* = \frac{e_i}{\sqrt{MSE}} \tag{10.9}$$

We introduce now two refinements to make the analysis of residuals more effective for
identifying outlying Y observations. These refinements require the use of the hat matrix,
which we encountered in Chapters 5 and 6.


Hat Matrix

The hat matrix was defined in (6.30a):


$$\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \tag{10.10}$$

We noted in (6.30) that the fitted values $\hat{Y}_i$ can be expressed as linear combinations of the
observations $Y_i$ through the hat matrix:

$$\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y} \tag{10.11}$$

and similarly we noted in (6.31) that the residuals $e_i$ can also be expressed as linear
combinations of the observations $Y_i$ by means of the hat matrix:

$$\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{Y} \tag{10.12}$$

Further, we noted in (6.32) that the variance-covariance matrix of the residuals involves
the hat matrix:

$$\sigma^2\{\mathbf{e}\} = \sigma^2(\mathbf{I} - \mathbf{H}) \tag{10.13}$$

Therefore, the variance of residual $e_i$, denoted by $\sigma^2\{e_i\}$, is:

$$\sigma^2\{e_i\} = \sigma^2(1 - h_{ii}) \tag{10.14}$$

where $h_{ii}$ is the ith element on the main diagonal of the hat matrix, and the covariance
between residuals $e_i$ and $e_j$ ($i \neq j$) is:

$$\sigma\{e_i, e_j\} = \sigma^2(0 - h_{ij}) = -h_{ij}\sigma^2 \qquad i \neq j \tag{10.15}$$

where $h_{ij}$ is the element in the ith row and jth column of the hat matrix.
These variances and covariances are estimated by using MSE as the estimator of the error
variance $\sigma^2$:

$$s^2\{e_i\} = MSE(1 - h_{ii}) \tag{10.16a}$$

$$s\{e_i, e_j\} = -h_{ij}(MSE) \qquad i \neq j \tag{10.16b}$$

We shall illustrate these different roles of the hat matrix by an example.


TABLE 10.2 Illustration of Hat Matrix.

(a) Data and Basic Results

       (1)     (2)     (3)     (4)       (5)      (6)      (7)
i      X_i1    X_i2    Y_i     Yhat_i    e_i      h_ii     s^2{e_i}
1      14      25      301     282.2     18.8     .3877    352.0
2      19      32      327     332.3     -5.3     .9513    28.0
3      12      22      246     260.0     -14.0    .6614    194.6
4      11      15      187     186.5     .5       .9996    .2

(b) H

 .3877     .1727     .4553    -.0157
 .1727     .9513    -.1284     .0044
 .4553    -.1284     .6614     .0117
-.0157     .0044     .0117     .9996

(c) s^2{e}

 352.0     -99.3    -261.8      9.0
 -99.3      28.0      73.8     -2.5
-261.8      73.8     194.6     -6.7
   9.0      -2.5      -6.7       .2


Example

A small data set based on n = 4 cases for examining the regression relation between
a response variable Y and two predictor variables $X_1$ and $X_2$ is shown in Table 10.2a,
columns 1–3. The fitted first-order model and the error mean square are:

$$\hat{Y} = 80.93 - 5.84X_1 + 11.32X_2 \qquad MSE = 574.9 \tag{10.17}$$

The fitted values and the residuals for the four cases are shown in columns 4 and 5 of
Table 10.2a.

The hat matrix for these data is shown in Table 10.2b. It was obtained by means of (10.10)
for the X matrix:

$$\mathbf{X} = \begin{bmatrix} 1 & 14 & 25 \\ 1 & 19 & 32 \\ 1 & 12 & 22 \\ 1 & 11 & 15 \end{bmatrix}$$

Note from (10.10) that the hat matrix is solely a function of the predictor variable(s). Also
note from Table 10.2b that the hat matrix is symmetric. The diagonal elements $h_{ii}$ of the
hat matrix are repeated in column 6 of Table 10.2a.

We illustrate that the fitted values are linear combinations of the Y values by calculating
$\hat{Y}_1$ by means of (10.11):

$$\hat{Y}_1 = h_{11}Y_1 + h_{12}Y_2 + h_{13}Y_3 + h_{14}Y_4$$
$$= .3877(301) + .1727(327) + .4553(246) - .0157(187)$$
$$= 282.2$$

This is the same result, except for possible rounding effects, as obtained from the fitted
regression function (10.17):

$$\hat{Y}_1 = 80.93 - 5.84(14) + 11.32(25) = 282.2$$

The estimated variance-covariance matrix of the residuals, $s^2\{\mathbf{e}\} = MSE(\mathbf{I} - \mathbf{H})$, is shown
in Table 10.2c. It was obtained by using MSE = 574.9. The estimated variances of the
residuals are shown in the main diagonal of the variance-covariance matrix in Table 10.2c
and are repeated in column 7 of Table 10.2a. We illustrate their direct calculation for case 1
by using (10.16a):

$$s^2\{e_1\} = 574.9(1 - .3877) = 352.0$$

We see from Table 10.2a, column 7, that the residuals do not have constant variance.
In fact, the variances differ greatly here because the data set is so small. As we shall note
in Section 10.3, residuals for cases that are outlying with respect to the X variables have
smaller variances.

Note also that the covariances in the matrix in Table 10.2c are not zero; hence, pairs of
residuals are correlated, some positively and some negatively. We noted this correlation in
Chapter 3, but also pointed out there that the correlations become very small for larger data
sets.
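The estimated variance-covariance matrix in Table 10.2c follows from the same four cases. This sketch, again assuming NumPy, computes MSE from the residuals and then forms $MSE(\mathbf{I} - \mathbf{H})$. The last lines convert the covariances to correlations, which makes the positive and negative associations between residuals explicit.

```python
import numpy as np

# Data from Table 10.2a
X = np.array([[1, 14, 25],
              [1, 19, 32],
              [1, 12, 22],
              [1, 11, 15]], dtype=float)
Y = np.array([301, 327, 246, 187], dtype=float)
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)

e = (I - H) @ Y                   # residuals (10.12)
mse = (e @ e) / (n - p)           # error mean square with n - p degrees of freedom
s2_e = mse * (I - H)              # estimated variance-covariance matrix of the residuals

# Correlations between residuals implied by s2_e
sd = np.sqrt(np.diag(s2_e))
corr_e = s2_e / np.outer(sd, sd)

print(round(mse, 1))
print(np.round(s2_e, 1))
print(np.round(corr_e, 2))
```

With n - p = 1 here, every off-diagonal correlation is exactly +1 or -1, an extreme version of the dependence among residuals that the text says fades for larger data sets.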


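Comment

The diagonal element $h_{ii}$ of the hat matrix can be obtained directly from:

$$h_{ii} = \mathbf{X}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_i \tag{10.18}$$

where:

$$\underset{p \times 1}{\mathbf{X}_i} = \begin{bmatrix} 1 \\ X_{i,1} \\ \vdots \\ X_{i,p-1} \end{bmatrix} \tag{10.18a}$$

Note that $\mathbf{X}_i$ corresponds to the $\mathbf{X}_h$ vector in (6.53) except that $\mathbf{X}_i$ pertains to the ith case, and that $\mathbf{X}_i'$
is simply the ith row of the X matrix, pertaining to the ith case.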
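Formula (10.18) gives each diagonal element without building the full n by n hat matrix, which matters when n is large. A minimal sketch with the Table 10.2 data, assuming NumPy, computes the diagonal row by row and compares it with the diagonal of the full matrix.

```python
import numpy as np

# X matrix from the Table 10.2 example (column of ones, X1, X2)
X = np.array([[1, 14, 25],
              [1, 19, 32],
              [1, 12, 22],
              [1, 11, 15]], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)

# (10.18): h_ii = X_i' (X'X)^{-1} X_i, where X_i' is the ith row of X
h = np.array([x_i @ XtX_inv @ x_i for x_i in X])

# For comparison only: the diagonal of the full hat matrix
h_full = np.diag(X @ XtX_inv @ X.T)

print(np.round(h, 4))
```

The row-by-row loop stores only the p by p inverse and one row at a time; for large data sets a vectorized form such as `np.einsum("ij,jk,ik->i", X, XtX_inv, X)` would give the same values.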


Studentized Residuals 


The first refinement in making residuals more effective for detecting outlying Y observations
involves recognition of the fact that the residuals $e_i$ may have substantially different
variances $\sigma^2\{e_i\}$. It is therefore appropriate to consider the magnitude of each $e_i$ relative to
its estimated standard deviation to give recognition to differences in the sampling errors of
the residuals. We see from (10.16a) that an estimator of the standard deviation of $e_i$ is:

$$ s\{e_i\} = \sqrt{MSE(1 - h_{ii})} \tag{10.19} $$

The ratio of $e_i$ to $s\{e_i\}$ is called the studentized residual and will be denoted by $r_i$:

$$ r_i = \frac{e_i}{s\{e_i\}} \tag{10.20} $$

While the residuals $e_i$ will have substantially different sampling variations if their standard
deviations differ markedly, the studentized residuals $r_i$ have constant variance (when the
model is appropriate). Studentized residuals often are called internally studentized residuals.
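
As a small numerical sketch of (10.19) and (10.20), the function below divides a residual by its estimated standard deviation. The function name is hypothetical; the inputs are the case 1 values of the body fat example quoted elsewhere in this chapter ($e_1 = -1.683$, $h_{11} = .201$, MSE = 6.47).

```python
import math


def studentized_residual(e_i, h_ii, mse):
    """Internally studentized residual r_i = e_i / sqrt(MSE (1 - h_ii)), per (10.19)-(10.20)."""
    return e_i / math.sqrt(mse * (1.0 - h_ii))


# Case 1 of the body fat example (values quoted in the text).
r_1 = studentized_residual(e_i=-1.683, h_ii=0.201, mse=6.47)
print(round(r_1, 3))
```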




Deleted Residuals

The second refinement to make residuals more effective for detecting outlying Y observations
is to measure the ith residual $e_i = Y_i - \hat{Y}_i$ when the fitted regression is based on all
of the cases except the ith one. The reason for this refinement is that if $Y_i$ is far outlying,
the fitted least squares regression function based on all cases including the ith one may be
influenced to come close to $Y_i$, yielding a fitted value $\hat{Y}_i$ near $Y_i$. In that event, the residual
$e_i$ will be small and will not disclose that $Y_i$ is outlying. On the other hand, if the ith case
is excluded before the regression function is fitted, the least squares fitted value $\hat{Y}_i$ is not
influenced by the outlying $Y_i$ observation, and the residual for the ith case will then tend to
be larger and therefore more likely to disclose the outlying Y observation.

The procedure then is to delete the ith case, fit the regression function to the remaining
n - 1 cases, and obtain the point estimate of the expected value when the X levels are those
of the ith case, to be denoted by $\hat{Y}_{i(i)}$. The difference between the actual observed value $Y_i$
and the estimated expected value $\hat{Y}_{i(i)}$ will be denoted by $d_i$:

$$ d_i = Y_i - \hat{Y}_{i(i)} \tag{10.21} $$

The difference $d_i$ is called the deleted residual for the ith case. We encountered this same
difference in (9.16), where it was called the PRESS prediction error for the ith case.
An algebraically equivalent expression for $d_i$ that does not require a recomputation of
the fitted regression function omitting the ith case is:

$$ d_i = \frac{e_i}{1 - h_{ii}} \tag{10.21a} $$

where $e_i$ is the ordinary residual for the ith case and $h_{ii}$ is the ith diagonal element in the
hat matrix, as given in (10.18). Note that the larger is the value $h_{ii}$, the larger will be the
deleted residual as compared to the ordinary residual.

Thus, deleted residuals will at times identify outlying Y observations when ordinary
residuals would not identify these; at other times deleted residuals lead to the same
identifications as ordinary residuals.

Note that a deleted residual also corresponds to the prediction error for a new observation
in the numerator of (2.35). There, we are predicting a new (n + 1)st observation from the fitted
regression function based on the earlier n cases. Modifying the earlier notation for the
context of deleted residuals, where n - 1 cases are used for predicting the "new" nth case,
we can restate the result in (6.63a) to obtain the estimated variance of $d_i$:

$$ s^2\{d_i\} = MSE_{(i)}\left(1 + \mathbf{X}_i'(\mathbf{X}_{(i)}'\mathbf{X}_{(i)})^{-1}\mathbf{X}_i\right) \tag{10.22} $$

where $\mathbf{X}_i$ is the X observations vector (10.18a) for the ith case, $MSE_{(i)}$ is the mean square
error when the ith case is omitted in fitting the regression function, and $\mathbf{X}_{(i)}$ is the X matrix
with the ith case deleted. An algebraically equivalent expression for $s^2\{d_i\}$ is:


$$ s^2\{d_i\} = \frac{MSE_{(i)}}{1 - h_{ii}} \tag{10.22a} $$

It follows from (6.63) that:

$$ \frac{d_i}{s\{d_i\}} \sim t(n - p - 1) \tag{10.23} $$

Remember that n - 1 cases are used here in predicting the ith observation; hence the
degrees of freedom are $(n - 1) - p = n - p - 1$.
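
The shortcut in (10.21a) can be checked numerically against a brute-force refit. The sketch below uses simple linear regression with hypothetical data, and relies on the standard one-predictor leverage formula $h_{ii} = 1/n + (X_i - \bar{X})^2 / \sum_j (X_j - \bar{X})^2$, which is assumed here rather than taken from this section. All names are illustrative.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for simple linear regression."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1


# Hypothetical data; the last case is far from the others in X.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [1.2, 1.9, 3.2, 3.9, 6.0]
i = 4  # zero-based index of the case to delete

# Ordinary residual and leverage from the fit to all n cases.
b0, b1 = fit_line(xs, ys)
e_i = ys[i] - (b0 + b1 * xs[i])
xbar = sum(xs) / len(xs)
h_ii = 1.0 / len(xs) + (xs[i] - xbar) ** 2 / sum((x - xbar) ** 2 for x in xs)
d_shortcut = e_i / (1.0 - h_ii)          # (10.21a)

# Brute force: refit without case i and predict at its X level, as in (10.21).
b0_del, b1_del = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
d_refit = ys[i] - (b0_del + b1_del * xs[i])

print(round(e_i, 3), round(h_ii, 2), round(d_shortcut, 3), round(d_refit, 3))
```

In this illustration the ordinary residual is small because the high-leverage case pulls the fitted line toward itself, while the deleted residual is much larger in absolute value.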


Studentized Deleted Residuals 




Combining the above two refinements, we utilize for diagnosis of outlying or extreme
Y observations the deleted residual $d_i$ in (10.21) and studentize it by dividing it by its
estimated standard deviation given by (10.22). The studentized deleted residual, denoted
by $t_i$, therefore is:

$$ t_i = \frac{d_i}{s\{d_i\}} \tag{10.24} $$


It follows from (10.21a) and (10.22a) that an algebraically equivalent expression for $t_i$ is:

$$ t_i = \frac{e_i}{\sqrt{MSE_{(i)}(1 - h_{ii})}} \tag{10.24a} $$

The studentized deleted residual $t_i$ in (10.24) is also called an externally studentized
residual, in contrast to the internally studentized residual $r_i$ in (10.20). We know from (10.23)
that each studentized deleted residual $t_i$ follows the t distribution with n - p - 1 degrees
of freedom. The $t_i$, however, are not independent.

Fortunately, the studentized deleted residuals $t_i$ in (10.24) can be calculated without
having to fit new regression functions each time a different case is omitted. A simple
relationship exists between MSE and $MSE_{(i)}$:

$$ (n - p)MSE = (n - p - 1)MSE_{(i)} + \frac{e_i^2}{1 - h_{ii}} \tag{10.25} $$

Using this relationship in (10.24a) yields the following equivalent expression for $t_i$:

$$ t_i = e_i\left[\frac{n - p - 1}{SSE(1 - h_{ii}) - e_i^2}\right]^{1/2} \tag{10.26} $$

Thus, the studentized deleted residuals $t_i$ can be calculated from the residuals $e_i$, the error
sum of squares SSE, and the hat matrix values $h_{ii}$, all for the fitted regression based on the
n cases.


Test for Outliers. We identify as outlying Y observations those cases whose studentized
deleted residuals are large in absolute value. In addition, we can conduct a formal test
by means of the Bonferroni test procedure of whether the case with the largest absolute
studentized deleted residual is an outlier. Since we do not know in advance which case will
have the largest absolute value $|t_i|$, we consider the family of tests to include n tests, one
for each case. If the regression model is appropriate, so that no case is outlying because of
a change in the model, then each studentized deleted residual will follow the t distribution
with n - p - 1 degrees of freedom. The appropriate Bonferroni critical value therefore is
$t(1 - \alpha/2n;\, n - p - 1)$. Note that the test is two-sided since we are not concerned with the
direction of the residuals but only with their absolute values.


Example

For the body fat example with two predictor variables ($X_1$, $X_2$), we wish to examine
whether there are outlying Y observations. Table 10.3 presents the residuals $e_i$ in column 1,
the diagonal elements $h_{ii}$ of the hat matrix in column 2, and the studentized deleted residuals
$t_i$ in column 3.

TABLE 10.3 Residuals, Diagonal Elements of the Hat Matrix, and Studentized Deleted
Residuals: Body Fat Example with Two Predictor Variables.

          (1)       (2)       (3)
   i      e_i       h_ii      t_i
   1    -1.683      .201     -.730
   2     3.643      .059     1.534
   3    -3.176      .372    -1.656
   4    -3.158      .111    -1.348
   5      .000      .248      .000
   6     -.361      .129     -.148
   7      .716      .156      .298
   8     4.015      .096     1.760
   9     2.655      .115     1.117
  10    -2.475      .110    -1.034
  11      .336      .120      .137
  12     2.226      .109      .923
  13    -3.947      .178    -1.825
  14     3.447      .148     1.524
  15      .571      .333      .267
  16      .642      .095      .258
  17     -.851      .106     -.344
  18     -.783      .197     -.335
  19    -2.857      .067    -1.176
  20     1.040      .050      .409

We illustrate the calculation of the studentized deleted residual for the first
case. The X values for this case, given in Table 7.1, are $X_{11} = 19.5$ and $X_{12} = 43.1$. Using
the fitted regression function from Table 7.2c, we obtain:

$$ \hat{Y}_1 = -19.174 + .2224(19.5) + .6594(43.1) = 13.583 $$

Since $Y_1 = 11.9$, the residual for this case is $e_1 = 11.9 - 13.583 = -1.683$. We also know
from Table 7.2c that SSE = 109.95 and from Table 10.3 that $h_{11} = .201$. Hence, by (10.26),
we find:

$$ t_1 = -1.683\left[\frac{20 - 3 - 1}{109.95(1 - .201) - (-1.683)^2}\right]^{1/2} = -.730 $$
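
The case 1 calculation can be reproduced with a short Python sketch of (10.25) and (10.26). The function names are hypothetical; the inputs (n = 20, p = 3, SSE = 109.95, $e_1 = -1.683$, $h_{11} = .201$) are the values quoted in the text. The sketch also confirms that (10.24a), with $MSE_{(i)}$ obtained from (10.25), gives the same result as (10.26).

```python
import math


def mse_without_case(e_i, h_ii, sse, n, p):
    """MSE_(i) solved from (10.25), using SSE = (n - p) MSE."""
    return (sse - e_i ** 2 / (1.0 - h_ii)) / (n - p - 1)


def studentized_deleted_residual(e_i, h_ii, sse, n, p):
    """t_i from (10.26); needs only quantities from the fit to all n cases."""
    return e_i * math.sqrt((n - p - 1) / (sse * (1.0 - h_ii) - e_i ** 2))


n, p, sse = 20, 3, 109.95
e_1, h_11 = -1.683, 0.201

t_1 = studentized_deleted_residual(e_1, h_11, sse, n, p)

# Same quantity via (10.24a), with MSE_(i) taken from (10.25).
mse_1 = mse_without_case(e_1, h_11, sse, n, p)
t_1_alt = e_1 / math.sqrt(mse_1 * (1.0 - h_11))

print(round(t_1, 3), round(t_1_alt, 3))
```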


Note from Table 10.3, column 3, that cases 3, 8, and 13 have the largest absolute
studentized deleted residuals. Incidentally, consideration of the residuals $e_i$ (shown in Table 10.3,
column 1) here would have identified cases 2, 8, and 13 as the most outlying ones, but not
case 3.

We would like to test whether case 13, which has the largest absolute studentized
deleted residual, is an outlier resulting from a change in the model. We shall use the
Bonferroni simultaneous test procedure with a family significance level of $\alpha = .10$. We
therefore require:

$$ t(1 - \alpha/2n;\, n - p - 1) = t(.9975;\, 16) = 3.252 $$

Since $|t_{13}| = 1.825 \le 3.252$, we conclude that case 13 is not an outlier. Still, we might
wish to investigate whether case 13 and perhaps a few other outlying cases are influential
in determining the fitted regression function because the Bonferroni procedure provides a
very conservative test for the presence of an outlier.
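
The decision rule of the Bonferroni outlier test can be sketched as follows. The t percentile is not computed here; the critical value 3.252 is taken as given from the text, and the studentized deleted residuals are those of Table 10.3, column 3. The function and variable names are illustrative only.

```python
def bonferroni_percentile(alpha, n):
    """Percentile 1 - alpha/(2n) at which the t(n - p - 1) critical value is read."""
    return 1.0 - alpha / (2 * n)


# Studentized deleted residuals t_i from Table 10.3, keyed by case number.
t = {1: -0.730, 2: 1.534, 3: -1.656, 4: -1.348, 5: 0.000, 6: -0.148, 7: 0.298,
     8: 1.760, 9: 1.117, 10: -1.034, 11: 0.137, 12: 0.923, 13: -1.825, 14: 1.524,
     15: 0.267, 16: 0.258, 17: -0.344, 18: -0.335, 19: -1.176, 20: 0.409}

alpha, n = 0.10, len(t)
percentile = bonferroni_percentile(alpha, n)   # .9975 for alpha = .10, n = 20
critical_value = 3.252                         # t(.9975; 16), as quoted in the text

# Only the case with the largest |t_i| needs to be compared with the critical value.
case, t_max = max(t.items(), key=lambda item: abs(item[1]))
is_outlier = abs(t_max) > critical_value

print(case, t_max, round(percentile, 4), is_outlier)
```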


10.3 Identifying Outlying X Observations: Hat Matrix Leverage Values


Use of Hat Matrix for Identifying Outlying X Observations 


FIGURE 10.6 Illustration of Leverage Values as Distance Measures: Table 10.2 Example.


The hat matrix, as we saw, plays an important role in determining the magnitude of a
studentized deleted residual and therefore in identifying outlying Y observations. The hat
matrix also is helpful in directly identifying outlying X observations. In particular, the
diagonal elements of the hat matrix are a useful indicator in a multivariable setting of
whether or not a case is outlying with respect to its X values.

The diagonal elements $h_{ii}$ of the hat matrix have some useful properties. In particular,
their values are always between 0 and 1 and their sum is p:

$$ 0 \le h_{ii} \le 1 \qquad \sum_{i=1}^{n} h_{ii} = p \tag{10.27} $$

where p is the number of regression parameters in the regression function including the
intercept term. In addition, it can be shown that $h_{ii}$ is a measure of the distance between
the X values for the ith case and the means of the X values for all n cases. Thus, a large value
$h_{ii}$ indicates that the ith case is distant from the center of all X observations. The diagonal
element $h_{ii}$ in this context is called the leverage (in terms of the X values) of the ith case.
Figure 10.6 illustrates the role of the leverage values $h_{ii}$ as distance measures for our
earlier example in Table 10.2. Figure 10.6 shows a scatter plot of $X_2$ against $X_1$ for the
four cases, and the center of the four cases located at ($\bar{X}_1$, $\bar{X}_2$). This center is called
the centroid. Here, the centroid is ($\bar{X}_1 = 14.0$, $\bar{X}_2 = 23.5$). In addition, Figure 10.6 shows
the leverage value for each case. Note that cases 1 and 3, which are closest to the centroid,
have the smallest leverage values, while cases 2 and 4, which are farthest from the center,
have the largest leverage values. Note also that the four leverage values sum to p = 3.


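[Figure 10.6 plot: $X_2$ against $X_1$ for the four cases, with the centroid ($\bar{X}_1$, $\bar{X}_2$) marked and the leverage values $h_{11} = .3877$, $h_{22} = .9513$, $h_{33} = .6614$, and $h_{44} = .9996$ shown next to the points.]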


FIGURE 10.7 Scatter Plot of Thigh Circumference against Triceps Skinfold Thickness: Body Fat Example with Two Predictor Variables.


If the ith case is outlying in terms of its X observations and therefore has a large leverage
value $h_{ii}$, it exercises substantial leverage in determining the fitted value $\hat{Y}_i$. This is so for
the following reasons:


1. The fitted value $\hat{Y}_i$ is a linear combination of the observed Y values, as shown
by (10.11), and $h_{ii}$ is the weight of observation $Y_i$ in determining this fitted value. Thus, the
larger is $h_{ii}$, the more important is $Y_i$ in determining $\hat{Y}_i$. Remember that $h_{ii}$ is a function
only of the X values, so $h_{ii}$ measures the role of the X values in determining how important
$Y_i$ is in affecting the fitted value $\hat{Y}_i$.

2. The larger is $h_{ii}$, the smaller is the variance of the residual $e_i$, as we noted earlier
from (10.14). Hence, the larger is $h_{ii}$, the closer the fitted value $\hat{Y}_i$ will tend to be to the
observed value $Y_i$. In the extreme case where $h_{ii} = 1$, the variance $\sigma^2\{e_i\}$ equals 0, so the
fitted value $\hat{Y}_i$ is then forced to equal the observed value $Y_i$.


A leverage value $h_{ii}$ is usually considered to be large if it is more than twice as large as
the mean leverage value, denoted by $\bar{h}$, which according to (10.27) is:

$$ \bar{h} = \frac{\sum_{i=1}^{n} h_{ii}}{n} = \frac{p}{n} \tag{10.28} $$

Hence, leverage values greater than 2p/n are considered by this rule to indicate outlying
cases with regard to their X values. Another suggested guideline is that $h_{ii}$ values exceeding
.5 indicate very high leverage, whereas those between .2 and .5 indicate moderate leverage.
Additional evidence of an outlying case is the existence of a gap between the leverage values
for most of the cases and the unusually large leverage value(s).

The rules just mentioned for identifying cases that are outlying with respect to their
X values are intended for data sets that are reasonably large, relative to the number of
parameters in the regression function. They are not applicable, for instance, to the simple
example in Table 10.2 where there are n = 4 cases and p = 3 parameters in the regression
function. Here, the mean leverage value is 3/4 = .75, and one cannot obtain a leverage
value twice as large as the mean value since leverage values cannot exceed 1.0.


Example

We continue with the body fat example of Table 7.1. We again use only the two predictor
variables, triceps skinfold thickness ($X_1$) and thigh circumference ($X_2$), so that the results
using the hat matrix can be compared to simple graphic plots. Figure 10.7 contains a scatter
plot of $X_2$ against $X_1$, where the data points are identified by their case number. We note from
Figure 10.7 that cases 15 and 3 appear to be outlying ones with respect to the pattern of the
X values. Case 15 is outlying for $X_1$ and at the low end of the range for $X_2$, whereas case 3
is outlying in terms of the pattern of multicollinearity, though it is not outlying for either of
the predictor variables separately. Cases 1 and 5 also appear to be somewhat extreme.

Table 10.3, column 2, contains the leverage values $h_{ii}$ for the body fat example. Note that
the two largest leverage values are $h_{33} = .372$ and $h_{15,15} = .333$. Both exceed the criterion
of twice the mean leverage value, 2p/n = 2(3)/20 = .30, and both are separated by a
substantial gap from the next largest leverage values, $h_{55} = .248$ and $h_{11} = .201$. Having
identified cases 3 and 15 as outlying in terms of their X values, we shall need to ascertain
how influential these cases are in the fitting of the regression function.
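
The 2p/n screening rule can be applied to the leverage values of Table 10.3, column 2, with a few lines of Python. This is only a sketch of the rule of thumb described above; the variable names are illustrative.

```python
# Leverage values h_ii from Table 10.3, column 2 (cases 1 through 20).
h = [0.201, 0.059, 0.372, 0.111, 0.248, 0.129, 0.156, 0.096, 0.115, 0.110,
     0.120, 0.109, 0.178, 0.148, 0.333, 0.095, 0.106, 0.197, 0.067, 0.050]

n, p = len(h), 3
mean_leverage = sum(h) / n          # equals p/n by (10.28)
cutoff = 2 * p / n                  # twice the mean leverage value

# Case numbers (1-based) whose leverage exceeds the cutoff.
flagged = [i + 1 for i, value in enumerate(h) if value > cutoff]

# The largest leverage values, to look for a gap between flagged and unflagged cases.
largest = sorted(((value, i + 1) for i, value in enumerate(h)), reverse=True)[:4]

print(round(mean_leverage, 3), cutoff, flagged)
print(largest)
```

Listing the few largest values alongside the flagged cases makes the gap criterion mentioned in the text easy to inspect by eye.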


Use of Hat Matrix to Identify Hidden Extrapolation 

We have seen that the hat matrix is useful in the model-building stage for identifying cases
that are outlying with respect to their X values and that, therefore, may be influential in
affecting the fitted model. The hat matrix is also useful after the model has been selected
and fitted for determining whether an inference for a mean response or a new observation
involves a substantial extrapolation beyond the range of the data. When there are only two
predictor variables, it is easy to see from a scatter plot of $X_2$ against $X_1$ whether an inference
for a particular ($X_1$, $X_2$) set of values is outlying beyond the range of the data, such as from
Figure 10.7. This simple graphic analysis is no longer available with larger numbers of
predictor variables, where extrapolations may be hidden.

To spot hidden extrapolations, we can utilize the direct leverage calculation in (10.18)
for the new set of X values for which inferences are to be made:

$$ h_{new,new} = \mathbf{X}_{new}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_{new} \tag{10.29} $$

where $\mathbf{X}_{new}$ is the vector containing the X values for which an inference about a mean
response or a new observation is to be made, and the X matrix is the one based on the data
set used for fitting the regression model. If $h_{new,new}$ is well within the range of leverage
values $h_{ii}$ for the cases in the data set, no extrapolation is involved. On the other hand, if
$h_{new,new}$ is much larger than the leverage values for the cases in the data set, an extrapolation
is indicated.
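
A minimal sketch of the check in (10.29) follows. The helper names are hypothetical, and the X matrix is again assumed to be that of the four-case example of Table 10.2, which is used here only because its leverage values are quoted in the text. The second new point is chosen so that each coordinate lies inside the observed range of its own variable while the pair lies far from the pattern of the data, which is the hidden extrapolation the text describes.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverage_of(x_rows, x_new):
    """h_new,new = X_new'(X'X)^{-1}X_new as in (10.29); x_new includes the leading 1."""
    p = len(x_new)
    xtx = [[sum(row[j] * row[k] for row in x_rows) for k in range(p)] for j in range(p)]
    return sum(u * v for u, v in zip(x_new, solve(xtx, x_new)))


# Assumed X matrix of the four-case example (column of 1s, X1, X2).
X = [[1.0, 14.0, 25.0], [1.0, 19.0, 32.0], [1.0, 12.0, 22.0], [1.0, 11.0, 15.0]]

h_data = [leverage_of(X, row) for row in X]
h_centroid = leverage_of(X, [1.0, 14.0, 23.5])   # at the centroid
h_hidden = leverage_of(X, [1.0, 19.0, 15.0])     # X1 and X2 each within their own range

print(round(max(h_data), 4), round(h_centroid, 4), round(h_hidden, 2))
```

At the centroid the leverage equals 1/n, its smallest possible value, whereas the second point yields a value far above every $h_{ii}$ in the data set and so signals an extrapolation.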


10.4 Identifying Influential Cases: DFFITS, Cook's Distance,
and DFBETAS Measures


After identifying cases that are outlying with respect to their Y values and/or their X 
values, the next step is to ascertain whether or not these outlying cases are influential. We 
shall consider a case to be influential if its exclusion causes major changes in the fitted 
regression function. As noted in Figure 10.5, not all outlying cases need be influential. For
example, case 1 in Figure 10.5 may not affect the fitted regression function to any substantial
extent.

We take up three measures of influence that are widely used in practice, each based on 
the omission of a single case to measure its influence. 




Influence on Single Fitted Value—DFFITS 




A useful measure of the influence that case i has on the fitted value $\hat{Y}_i$ is given by:

$$ (DFFITS)_i = \frac{\hat{Y}_i - \hat{Y}_{i(i)}}{\sqrt{MSE_{(i)} h_{ii}}} \tag{10.30} $$

The letters DF stand for the difference between the fitted value $\hat{Y}_i$ for the ith case when all
cases are used in fitting the regression function and the predicted value $\hat{Y}_{i(i)}$ for the ith case
obtained when the ith case is omitted in fitting the regression function. The denominator
of (10.30) is the estimated standard deviation of $\hat{Y}_i$, but it uses the error mean square when
the ith case is omitted in fitting the regression function for estimating the error variance $\sigma^2$.
The denominator provides a standardization so that the value $(DFFITS)_i$ for the ith case
represents the number of estimated standard deviations of $\hat{Y}_i$ that the fitted value $\hat{Y}_i$ increases
or decreases with the inclusion of the ith case in fitting the regression model.

It can be shown that the DFFITS values can be computed by using only the results from
fitting the entire data set, as follows:

$$ (DFFITS)_i = e_i\left[\frac{n - p - 1}{SSE(1 - h_{ii}) - e_i^2}\right]^{1/2}\left(\frac{h_{ii}}{1 - h_{ii}}\right)^{1/2} = t_i\left(\frac{h_{ii}}{1 - h_{ii}}\right)^{1/2} \tag{10.30a} $$

Note from the last expression that the DFFITS value for the ith case is a studentized deleted
residual, as given in (10.26), increased or decreased by a factor that is a function of the
leverage value for this case. If case i is an X outlier and has a high leverage value, this
factor will be greater than 1 and $(DFFITS)_i$ will tend to be large absolutely.

As a guideline for identifying influential cases, we suggest considering a case influential
if the absolute value of DFFITS exceeds 1 for small to medium data sets and $2\sqrt{p/n}$ for
large data sets.


Example

Table 10.4, column 1, lists the DFFITS values for the body fat example with two predictor
variables. To illustrate the calculations, consider the DFFITS value for case 3, which was
identified as outlying with respect to its X values. From Table 10.3, we know that the
studentized deleted residual for this case is $t_3 = -1.656$ and the leverage value is $h_{33} = .372$.
Hence, using (10.30a) we obtain:

$$ (DFFITS)_3 = -1.656\left(\frac{.372}{1 - .372}\right)^{1/2} = -1.273 $$

The only DFFITS value in Table 10.4 that exceeds our guideline for a medium-size
data set is for case 3, where $|(DFFITS)_3| = 1.273$. This value is somewhat larger than our
guideline of 1. However, the value is close enough to 1 that the case may not be influential
enough to require remedial action.
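
The calculation in (10.30a) and the two guideline cutoffs can be sketched as follows. The function names are hypothetical. Because the inputs $t_3 = -1.656$ and $h_{33} = .372$ are themselves rounded, the result agrees with the tabled value -1.273 only to about two decimal places.

```python
import math


def dffits(t_i, h_ii):
    """(DFFITS)_i = t_i * sqrt(h_ii / (1 - h_ii)), the last form in (10.30a)."""
    return t_i * math.sqrt(h_ii / (1.0 - h_ii))


def dffits_cutoff(n, p, large_data_set=False):
    """Guideline: 1 for small to medium data sets, 2*sqrt(p/n) for large ones."""
    return 2.0 * math.sqrt(p / n) if large_data_set else 1.0


dffits_3 = dffits(t_i=-1.656, h_ii=0.372)
exceeds = abs(dffits_3) > dffits_cutoff(n=20, p=3)

print(round(dffits_3, 2), exceeds)
```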


TABLE 10.4 DFFITS, Cook's Distances, and DFBETAS: Body Fat Example with Two
Predictor Variables.

           (1)          (2)       (3)       (4)       (5)
                                         DFBETAS
   i    (DFFITS)_i      D_i       b_0       b_1       b_2
   1      -.366        .046     -.305     -.132      .232
   2       .384        .046      .173      .115     -.143
   3     -1.273        .490     -.847    -1.183     1.067
   4      -.476        .072     -.102     -.294      .196
   5       .000        .000      .000      .000      .000
   6      -.057        .001      .040      .040     -.044
   7       .128        .006     -.078     -.016      .054
   8       .575        .098      .261      .391     -.333
   9       .402        .053     -.151     -.295      .247
  10      -.364        .044      .238      .245     -.269
  11       .051        .001     -.009      .017     -.003
  12       .323        .035     -.131      .023      .070
  13      -.851        .212      .119      .592     -.390
  14       .636        .125      .452      .113     -.298
  15       .189        .013     -.003     -.125      .069
  16       .084        .002      .009      .043     -.025
  17      -.118        .005      .080      .055     -.076
  18      -.166        .010      .132      .075     -.116
  19      -.315        .032     -.130     -.004      .064
  20       .094        .003      .010      .002     -.003


Comment


The estimated variance of $\hat{Y}_i$ used in the denominator of (10.30) is developed from the relation
$\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$ in (10.11). Using (5.46), we obtain:

$$ \sigma^2\{\hat{\mathbf{Y}}\} = \mathbf{H}\sigma^2\{\mathbf{Y}\}\mathbf{H}' = \mathbf{H}(\sigma^2\mathbf{I})\mathbf{H}' $$

Since $\mathbf{H}$ is a symmetric matrix, so that $\mathbf{H}' = \mathbf{H}$, and it is also idempotent, so that $\mathbf{H}\mathbf{H} = \mathbf{H}$, we obtain:

$$ \sigma^2\{\hat{\mathbf{Y}}\} = \sigma^2\mathbf{H} \tag{10.31} $$

Hence, the variance of $\hat{Y}_i$ is:

$$ \sigma^2\{\hat{Y}_i\} = \sigma^2 h_{ii} \tag{10.32} $$

where $h_{ii}$ is the ith diagonal element of the hat matrix. The error term variance $\sigma^2$ is estimated
in (10.30) by the error mean square $MSE_{(i)}$ obtained when the ith case is omitted in fitting the regression
model.


Influence on All Fitted Values—Cook's Distance 


In contrast to the DFFITS measure in (10.30), which considers the influence of the ith case
on the fitted value $\hat{Y}_i$ for this case, Cook's distance measure considers the influence of
the ith case on all n fitted values. Cook's distance measure, denoted by $D_i$, is an aggregate
influence measure, showing the effect of the ith case on all n fitted values:

$$ D_i = \frac{\sum_{j=1}^{n}(\hat{Y}_j - \hat{Y}_{j(i)})^2}{p\,MSE} \tag{10.33} $$

Note that the numerator involves similar differences as in the DFFITS measure, but here
each of the n fitted values $\hat{Y}_j$ is compared with the corresponding fitted value $\hat{Y}_{j(i)}$ when the
ith case is deleted in fitting the regression model. These differences are then squared and
summed, so that the aggregate influence of the ith case is measured without regard to the
signs of the effects. Finally, the denominator serves as a standardizing measure. In matrix
terms, Cook's distance measure can be expressed as follows:

$$ D_i = \frac{(\hat{\mathbf{Y}} - \hat{\mathbf{Y}}_{(i)})'(\hat{\mathbf{Y}} - \hat{\mathbf{Y}}_{(i)})}{p\,MSE} \tag{10.33a} $$

Here, $\hat{\mathbf{Y}}$ as usual is the vector of the fitted values when all n cases are used for the regression
fit and $\hat{\mathbf{Y}}_{(i)}$ is the vector of the fitted values when the ith case is deleted.

For interpreting Cook's distance measure, it has been found useful to relate $D_i$ to the
F(p, n - p) distribution and ascertain the corresponding percentile value. If the percentile
value is less than about 10 or 20 percent, the ith case has little apparent influence on the fitted
values. If, on the other hand, the percentile value is near 50 percent or more, the fitted values
obtained with and without the ith case should be considered to differ substantially, implying
that the ith case has a major influence on the fit of the regression function.

Fortunately, Cook's distance measure $D_i$ can be calculated without fitting a new
regression function each time a different case is deleted. An algebraically equivalent
expression is:

$$ D_i = \frac{e_i^2}{p\,MSE}\left[\frac{h_{ii}}{(1 - h_{ii})^2}\right] \tag{10.33b} $$

Note from (10.33b) that $D_i$ depends on two factors: (1) the size of the residual $e_i$ and (2)
the leverage value $h_{ii}$. The larger either $e_i$ or $h_{ii}$ is, the larger $D_i$ is. Thus, the ith case can
be influential: (1) by having a large residual $e_i$ and only a moderate leverage value $h_{ii}$, or
(2) by having a large leverage value $h_{ii}$ with only a moderately sized residual $e_i$, or (3) by
having both a large residual $e_i$ and a large leverage value $h_{ii}$.


Example

For the body fat example with two predictor variables, Table 10.4, column 2, presents the
$D_i$ values. To illustrate the calculations, we consider again case 3, which is outlying with
regard to its X values. We know from Table 10.3 that $e_3 = -3.176$ and $h_{33} = .372$. Further,
MSE = 6.47 according to Table 7.2c and p = 3 for the model with two predictor variables.
Hence, we obtain:

$$ D_3 = \frac{(-3.176)^2}{3(6.47)}\left[\frac{.372}{(1 - .372)^2}\right] = .490 $$
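
The shortcut formula (10.33b) is easy to evaluate directly. The sketch below, with a hypothetical function name, reproduces the case 3 value and the next largest value (case 13) from the residuals and leverage values in Table 10.3, with MSE = 6.47 and p = 3 as stated in the text.

```python
def cooks_distance(e_i, h_ii, mse, p):
    """D_i = e_i^2 / (p MSE) * h_ii / (1 - h_ii)^2, as in (10.33b)."""
    return (e_i ** 2 / (p * mse)) * (h_ii / (1.0 - h_ii) ** 2)


mse, p = 6.47, 3

d_3 = cooks_distance(e_i=-3.176, h_ii=0.372, mse=mse, p=p)
d_13 = cooks_distance(e_i=-3.947, h_ii=0.178, mse=mse, p=p)

print(round(d_3, 3), round(d_13, 3))
```

Case 13 has the larger residual but the smaller leverage, which is why its distance measure is well below that of case 3.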


We note from Table 10.4, column 2, that case 3 clearly has the largest $D_i$ value, with the
next largest distance measure $D_{13} = .212$ being substantially smaller. Figure 10.8 presents
the information provided by Cook's distance measure about the influence of each case in
two different plots. Shown in Figure 10.8a is a proportional influence plot of the residuals $e_i$
against the corresponding fitted values $\hat{Y}_i$, the size of the plotted points being proportional
to Cook's distance measure $D_i$. Figure 10.8b presents the information about the Cook's
distance measures in the form of an index influence plot, where Cook's distance measure
$D_i$ is plotted against the corresponding case index i. Both plots in Figure 10.8 clearly show
that one case stands out as most influential (case 3) and that all the other cases are much less
influential. The proportional influence plot in Figure 10.8a shows that the residual for the
most influential case is large negative, but does not identify the case. The index influence
plot in Figure 10.8b, on the other hand, identifies the most influential case as case 3 but
does not provide any information about the magnitude of the residual for this case.


FIGURE 10.8 Proportional Influence Plot (Points Proportional in Size to Cook's Distance Measure) and Index
Influence Plot: Body Fat Example with Two Predictor Variables.

[Panel (a), Proportional Influence Plot: residual against fitted value (YHAT). Panel (b), Index Influence Plot: Cook's distance against case index number.]


To assess the magnitude of the influence of case 3 (D3 = .490), we refer to the corre- 
sponding F distribution, namely, F(p, n — p) = F(3, 17). We find that .490 is the 30.6th 
percentile of this distribution. Hence, it appears that case 3 does influence the regression fit, 
but the extent of the influence may not be large enough to call for consideration of remedial 
measures. 


Influence on the Regression Coefficients—DFBETAS 


A measure of the influence of the ith case on each regression coefficient $b_k$ (k = 0, 1, ...,
p - 1) is the difference between the estimated regression coefficient $b_k$ based on all n cases
and the regression coefficient obtained when the ith case is omitted, to be denoted by $b_{k(i)}$.
When this difference is divided by an estimate of the standard deviation of $b_k$, we obtain
the measure DFBETAS:

$$ (DFBETAS)_{k(i)} = \frac{b_k - b_{k(i)}}{\sqrt{MSE_{(i)} c_{kk}}} \qquad k = 0, 1, \ldots, p - 1 \tag{10.34} $$

where $c_{kk}$ is the kth diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. Recall from (6.46) that the
variance-covariance matrix of the regression coefficients is given by $\sigma^2\{\mathbf{b}\} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. Hence the
variance of $b_k$ is:

$$ \sigma^2\{b_k\} = \sigma^2 c_{kk} \tag{10.35} $$

The error term variance $\sigma^2$ here is estimated by $MSE_{(i)}$, the error mean square obtained
when the ith case is deleted in fitting the regression model.

The DFBETAS value by its sign indicates whether inclusion of a case leads to an increase
or a decrease in the estimated regression coefficient, and its absolute magnitude shows
the size of the difference relative to the estimated standard deviation of the regression
coefficient. A large absolute value of $(DFBETAS)_{k(i)}$ is indicative of a large impact of the
ith case on the kth regression coefficient. As a guideline for identifying influential cases,
we recommend considering a case influential if the absolute value of DFBETAS exceeds 1
for small to medium data sets and $2/\sqrt{n}$ for large data sets.


Example

For the body fat example with two predictor variables, Table 10.4 lists the DFBETAS values
in columns 3, 4, and 5. Note that case 3, which is outlying with respect to its X values,
is the only case that exceeds our guideline of 1 for medium-size data sets for both $b_1$ and
$b_2$. Thus, case 3 is again tagged as potentially influential. Again, however, the DFBETAS
values do not exceed 1 by very much so that case 3 may not be so influential as to require
remedial action.
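
The DFBETAS guideline can be applied mechanically once the values are tabulated. The sketch below uses the rows of Table 10.4 for cases 3 and 13 only; the function and variable names are illustrative, and the choice between the two cutoffs is left to the analyst as in the text.

```python
import math


def dfbetas_cutoff(n, large_data_set=False):
    """Guideline: 1 for small to medium data sets, 2/sqrt(n) for large ones."""
    return 2.0 / math.sqrt(n) if large_data_set else 1.0


# DFBETAS values from Table 10.4, columns 3 to 5, for two of the 20 cases.
dfbetas = {
    3: {"b0": -0.847, "b1": -1.183, "b2": 1.067},
    13: {"b0": 0.119, "b1": 0.592, "b2": -0.390},
}

cutoff = dfbetas_cutoff(n=20)
flagged = {case: [name for name, value in values.items() if abs(value) > cutoff]
           for case, values in dfbetas.items()}

print(flagged)
```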


Comment


Cook's distance measure of the aggregate influence of a case on the n fitted values, which was defined
in (10.33), is algebraically equivalent to a measure of the aggregate influence of a case on the p
regression coefficients. In fact, Cook's distance measure was originally derived from the concept of
a confidence region for all p regression coefficients $\beta_k$ (k = 0, 1, ..., p - 1) simultaneously. It can
be shown that the boundary of this joint confidence region for the normal error multiple regression
model (6.19) is given by:

$$ \frac{(\mathbf{b} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\mathbf{b} - \boldsymbol{\beta})}{p\,MSE} = F(1 - \alpha;\, p,\, n - p) \tag{10.36} $$

Cook's distance measure $D_i$ uses the same structure for measuring the combined impact of the ith
case on the differences in the estimated regression coefficients:

$$ D_i = \frac{(\mathbf{b} - \mathbf{b}_{(i)})'\mathbf{X}'\mathbf{X}(\mathbf{b} - \mathbf{b}_{(i)})}{p\,MSE} \tag{10.37} $$

where $\mathbf{b}_{(i)}$ is the vector of the estimated regression coefficients obtained when the ith case is omitted
and $\mathbf{b}$, as usual, is the vector when all n cases are used. The expressions for Cook's distance measure
in (10.33a) and (10.37) are algebraically identical.
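
The identity of the two expressions can be seen in one step, on the understanding that both vectors of fitted values are evaluated at all n rows of the X matrix, so that $\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}$ and $\hat{\mathbf{Y}}_{(i)} = \mathbf{X}\mathbf{b}_{(i)}$:

```latex
\hat{\mathbf{Y}} - \hat{\mathbf{Y}}_{(i)} = \mathbf{X}\mathbf{b} - \mathbf{X}\mathbf{b}_{(i)} = \mathbf{X}(\mathbf{b} - \mathbf{b}_{(i)})
\quad\Longrightarrow\quad
(\hat{\mathbf{Y}} - \hat{\mathbf{Y}}_{(i)})'(\hat{\mathbf{Y}} - \hat{\mathbf{Y}}_{(i)})
= (\mathbf{b} - \mathbf{b}_{(i)})'\mathbf{X}'\mathbf{X}(\mathbf{b} - \mathbf{b}_{(i)})
```

Dividing both sides by $p\,MSE$ gives (10.33a) on the left and (10.37) on the right.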


Influence on Inferences



To round out the determination of influential cases, it is usually a good idea to examine in a 
direct fashion the inferences from the fitted regression model that would be made with and 
without the case(s) of concern. If the inferences are not essentially changed, there is little 
need to think of remedial actions for the cases diagnosed as influential. On the other hand, 
serious changes in the inferences drawn from the fitted model when a case is omitted will 
require consideration of remedial measures. 


Example

In the body fat example with two predictor variables, cases 3 and 15 were identified as
outlying X observations and cases 8 and 13 as outlying Y observations. All three influence
measures (DFFITS, Cook's distance, and DFBETAS) identified only case 3 as influential,
and, indeed, suggested that its influence may be of marginal importance so that remedial
measures might not be required.

The analyst in the body fat example was primarily interested in the fit of the regression
model because the model was intended to be used for making predictions within the range
of the observations on the predictor variables in the data set. Hence, the analyst considered
the fitted regression functions with and without case 3:

$$ \text{With case 3:} \quad \hat{Y} = -19.174 + .2224X_1 + .6594X_2 $$

$$ \text{Without case 3:} \quad \hat{Y} = -12.428 + .5641X_1 + .3635X_2 $$


Because of the high multicollinearity between $X_1$ and $X_2$, the analyst was not surprised
by the shifts in the magnitudes of $b_1$ and $b_2$ when case 3 is omitted. Remember that the
estimated standard deviations of the coefficients, given in Table 7.2c, are very large and
that a single case can change the estimated coefficients substantially when the predictor
variables are highly correlated.

To examine the effect of case 3 on inferences to be made from the fitted regression
function in the range of the X observations in a direct fashion, the analyst calculated for
each of the 20 cases the relative difference between the fitted value $\hat{Y}_i$ based on all 20 cases
and the fitted value $\hat{Y}_{i(3)}$ obtained when case 3 is omitted. The measure of interest was the
average absolute percent difference:

$$ \frac{\sum_{i=1}^{n}\left|\dfrac{\hat{Y}_{i(3)} - \hat{Y}_i}{\hat{Y}_i}\right| 100}{n} $$

This mean difference is 3.1 percent; further, 17 of the 20 differences are less than 5 percent
(calculations not shown). On the basis of this direct evidence about the effect of case 3 on
the inferences to be made, the analyst was satisfied that case 3 does not exercise undue
influence so that no remedial action is required for handling this case.
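
A sketch of the average absolute percent difference is given below. The two fitted functions are those stated in the text, and the first X pair (19.5, 43.1) is case 1 of the body fat example. The remaining X pairs are hypothetical placeholders, since Table 7.1 is not reproduced here, so the printed mean is illustrative and does not reproduce the 3.1 percent figure. Taking the all-cases fitted value as the base of the relative difference is also an assumption.

```python
def fitted_with_case_3(x1, x2):
    """Fitted regression function based on all 20 cases."""
    return -19.174 + 0.2224 * x1 + 0.6594 * x2


def fitted_without_case_3(x1, x2):
    """Fitted regression function when case 3 is omitted."""
    return -12.428 + 0.5641 * x1 + 0.3635 * x2


def percent_differences(points):
    """Absolute percent difference between the two fitted values at each X pair."""
    result = []
    for x1, x2 in points:
        base = fitted_with_case_3(x1, x2)
        result.append(abs((fitted_without_case_3(x1, x2) - base) / base) * 100.0)
    return result


# First pair: case 1 (from the text). Other pairs: hypothetical X levels.
points = [(19.5, 43.1), (25.0, 50.0), (22.0, 48.0)]

pcts = percent_differences(points)
mean_pct = sum(pcts) / len(pcts)

print([round(v, 2) for v in pcts], round(mean_pct, 2))
```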


Some Final Comments 

Analysis of outlying and influential cases is a necessary component of good regression 
analysis. However, it is neither automatic nor foolproof and requires good judgment by the 
analyst. The methods described often work well, but at times are ineffective. For example, 
if two influential outlying cases are nearly coincident, as depicted in Figure 10.5 by cases 3
and 4, an analysis that deletes one case at a time and estimates the change in fit will result in
virtually no change for these two outlying cases. The reason is that the retained outlying case
will mask the effect of the deleted outlying case. Extensions of the single-case diagnostic 
procedures described here have been developed that involve deleting two or more cases 
at a time. However, the computational requirements for these extensions are much more 
demanding than for the single-case diagnostics. Reference 10.4 describes some of these 
extensions. 

Remedial measures for outlying cases that are determined to be highly influential by the 
diagnostic procedures will be discussed in the next chapter. 


10.5 Multicollinearity Diagnostics: Variance Inflation Factor


When we discussed multicollinearity in Chapter 7, we noted some key problems that typi- 
cally arise when the predictor variables being considered for the regression model are highly 
correlated among themselves: 


1. Adding or deleting a predictor variable changes the regression coefficients. 




2. The extra sum of squares associated with a predictor variable varies, depending upon 
which other predictor variables are already included in the model. 

3. The estimated standard deviations of the regression coefficients become large when 
the predictor variables in the regression model are highly correlated with each other. 

4. The estimated regression coefficients individually may not be statistically significant 
even though a definite statistical relation exists between the response variable and the set 
of predictor variables. 


These problems can also arise without substantial multicollinearity being present, but only 
under unusual circumstances not likely to be found in practice. 

We first consider some informal diagnostics for multicollinearity and then a highly useful
formal diagnostic, the variance inflation factor. 


Informal Diagnostics


Example 


Indications of the presence of serious multicollinearity are given by the following informal 
diagnostics: 


1. Large changes in the estimated regression coefficients when a predictor variable is added 
or deleted, or when an observation is altered or deleted. 

2. Nonsignificant results in individual tests on the regression coefficients for important 
predictor variables. 

3. Estimated regression coefficients with an algebraic sign that is the opposite of that 
expected from theoretical considerations or prior experience. 

4. Large coefficients of simple correlation between pairs of predictor variables in the
correlation matrix $\mathbf{r}_{XX}$.

5. Wide confidence intervals for the regression coefficients representing important predictor 
variables. 


We consider again the body fat example of Table 7.1, this time with all three predictor
variables: triceps skinfold thickness ($X_1$), thigh circumference ($X_2$), and midarm
circumference ($X_3$). We noted in Chapter 7 that the predictor variables triceps skinfold thickness
and thigh circumference are highly correlated with each other. We also noted large changes 
in the estimated regression coefficients and their estimated standard deviations when a vari- 
able was added, nonsignificant results in individual tests on anticipated important variables, 
and an estimated negative coefficient when a positive coefficient was expected. These are all 
informal indications that suggest serious multicollinearity among the predictor variables. 


Comment


The informal methods just described have important limitations. They do not provide quantitative 
measurements of the impact of multicollinearity and they may not identify the nature of the
multicollinearity. For instance, if predictor variables $X_1$, $X_2$, and $X_3$ have low pairwise
correlations, then the examination of simple correlation coefficients may not disclose the existence
of relations among groups of predictor variables, such as a high correlation between $X_1$ and a
linear combination of $X_2$ and $X_3$.

Another limitation of the informal diagnostic methods is that sometimes the observed behavior
may occur without multicollinearity being present.




Variance Inflation Factor 
A formal method of detecting the presence of multicollinearity that is widely accepted is
use of variance inflation factors. These factors measure how much the variances of the 
estimated regression coefficients are inflated as compared to when the predictor variables 
are not linearly related. 

To understand the significance of variance inflation factors, we begin with the precision
of least squares estimated regression coefficients, which is measured by their variances. We
know from (6.46) that the variance-covariance matrix of the estimated regression coefficients
is:

$$\boldsymbol{\sigma}^2\{\mathbf{b}\} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1} \qquad (10.38)$$


For purposes of measuring the impact of multicollinearity, it is useful to work with the 
standardized regression model (7.45), which is obtained by transforming the variables by 
means of the correlation transformation (7.44). When the standardized regression model is
fitted, the estimated regression coefficients $b_k^*$ are standardized coefficients that are related
to the estimated regression coefficients for the untransformed variables according to (7.53).
The variance-covariance matrix of the estimated standardized regression coefficients is
obtained from (10.38) by using the result in (7.50), which states that the $\mathbf{X}'\mathbf{X}$ matrix for the
transformed variables is the correlation matrix of the $X$ variables $\mathbf{r}_{XX}$. Hence, we obtain:


$$\boldsymbol{\sigma}^2\{\mathbf{b}^*\} = (\sigma^*)^2 \mathbf{r}_{XX}^{-1} \qquad (10.39)$$


where $\mathbf{r}_{XX}$ is the matrix of the pairwise simple correlation coefficients among the $X$
variables, as defined in (7.47), and $(\sigma^*)^2$ is the error term variance for the transformed model.

Note from (10.39) that the variance of $b_k^*$ ($k = 1, \ldots, p - 1$) is equal to the following,
letting $(VIF)_k$ denote the $k$th diagonal element of the matrix $\mathbf{r}_{XX}^{-1}$:


$$\sigma^2\{b_k^*\} = (\sigma^*)^2 (VIF)_k \qquad (10.40)$$


The diagonal element $(VIF)_k$ is called the variance inflation factor (VIF) for $b_k^*$. It can be
shown that this variance inflation factor is equal to:


$$(VIF)_k = (1 - R_k^2)^{-1} \qquad k = 1, 2, \ldots, p - 1 \qquad (10.41)$$


where $R_k^2$ is the coefficient of multiple determination when $X_k$ is regressed on the $p - 2$
other $X$ variables in the model. Hence, we have:


$$\sigma^2\{b_k^*\} = \frac{(\sigma^*)^2}{1 - R_k^2} \qquad (10.42)$$


We presented in (7.65) the special results for $\sigma^2\{b_k^*\}$ when $p - 1 = 2$, for which $R_k^2 = r_{12}^2$,
the coefficient of simple determination between $X_1$ and $X_2$.

The variance inflation factor $(VIF)_k$ is equal to 1 when $R_k^2 = 0$, i.e., when $X_k$ is not linearly
related to the other $X$ variables. When $R_k^2 \neq 0$, then $(VIF)_k$ is greater than 1, indicating an
inflated variance for $b_k^*$ as a result of the intercorrelations among the $X$ variables. When $X_k$
has a perfect linear association with the other $X$ variables in the model so that $R_k^2 = 1$, then
$(VIF)_k$ and $\sigma^2\{b_k^*\}$ are unbounded.
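
The agreement between the diagonal of $\mathbf{r}_{XX}^{-1}$ and $(1 - R_k^2)^{-1}$ can be checked numerically in the two-predictor case, where $R_k^2 = r_{12}^2$. The Python sketch below is an illustration under stated assumptions: the helper names `invert` and `variance_inflation_factors` and the correlation value .9 are hypothetical and do not come from the text.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect linear association")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(corr):
    """(VIF)_k is the kth diagonal element of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[k][k] for k in range(len(corr))]


r12 = 0.9  # hypothetical correlation between X1 and X2
vifs = variance_inflation_factors([[1.0, r12], [r12, 1.0]])
expected = 1.0 / (1.0 - r12 ** 2)  # (1 - R_k^2)^-1 with R_k^2 = r12^2

print([round(v, 4) for v in vifs], round(expected, 4))
```

With uncorrelated predictors the correlation matrix is the identity and every VIF equals 1; with a perfect linear association the matrix is singular, which is the unbounded case described above.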




Example 


TABLE 10.5 Variance Inflation Factors—Body Fat Example with Three Variables.



Diagnostic Uses. The largest VIF value among all X variables is often used as an indicator 
of the severity of multicollinearity. A maximum VIF value in excess of 10 is frequently taken 
as an indication that multicollinearity may be unduly influencing the least squares estimates. 

The mean of the VIF values also provides information about the severity of the
multicollinearity in terms of how far the estimated standardized regression coefficients $b_k^*$ are
from the true values $\beta_k^*$. It can be shown that the expected value of the sum of these squared
errors $(b_k^* - \beta_k^*)^2$ is given by:


$$E\left\{\sum_{k=1}^{p-1} (b_k^* - \beta_k^*)^2\right\} = (\sigma^*)^2 \sum_{k=1}^{p-1} (VIF)_k \qquad (10.43)$$


Thus, large VIF values result, on the average, in larger differences between the estimated
and true standardized regression coefficients.

When no $X$ variable is linearly related to the others in the regression model, $R_k^2 = 0$;
hence, $(VIF)_k = 1$, their sum is $p - 1$, and the expected value of the sum of the squared
errors is:


$$E\left\{\sum_{k=1}^{p-1} (b_k^* - \beta_k^*)^2\right\} = (\sigma^*)^2 (p - 1) \qquad \text{when } (VIF)_k = 1 \qquad (10.43a)$$


A ratio of the results in (10.43) and (10.43a) provides useful information about the effect
of multicollinearity on the sum of the squared errors:


$$\frac{(\sigma^*)^2 \sum (VIF)_k}{(\sigma^*)^2 (p - 1)} = \frac{\sum (VIF)_k}{p - 1}$$

Note that this ratio is simply the mean of the VIF values, to be denoted by $\overline{(VIF)}$:

$$\overline{(VIF)} = \frac{\sum_{k=1}^{p-1} (VIF)_k}{p - 1} \qquad (10.44)$$

Mean VIF values considerably larger than 1 are indicative of serious multicollinearity
problems.


Table 10.5 contains the estimated standardized regression coefficients and the VIF values for 
the body fat example with three predictor variables (calculations not shown). The maximum 
of the VIF values is 708.84 and their mean value is $\overline{(VIF)} = 459.26$. Thus, the expected sum
of the squared errors in the least squares standardized regression coefficients is nearly 460 
times as large as it would be if the X variables were uncorrelated. In addition, all three VIF 
values greatly exceed 10, which again indicates that serious multicollinearity problems exist. 
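
As a check on the arithmetic, the mean in (10.44) can be worked out from the three VIF values reported for the body fat example (708.84, 564.34, and a third value of about 105, taken here as 104.61):

```latex
\overline{(VIF)} = \frac{708.84 + 564.34 + 104.61}{3}
                 = \frac{1377.79}{3}
                 \approx 459.26
```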


Variable      $b_k^*$      $(VIF)_k$

$X_1$          4.2637      708.84
$X_2$         -2.9287      564.34
$X_3$         -1.5614      104.61


Maximum $(VIF)_k$ = 708.84      $\overline{(VIF)}$ = 459.26




It is interesting to note that $(VIF)_3 = 105$ despite the fact that both $r_{13}^2$ and $r_{23}^2$ (see
Figure 7.3b) are not large. Here is an instance where $X_3$ is strongly related to $X_1$ and $X_2$
together ($R_3^2 = .990$), even though the pairwise coefficients of simple determination are not
large. Examination of the pairwise correlations does not disclose this multicollinearity.
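
The value of $R_3^2$ quoted here follows from inverting (10.41), using the third VIF value of about 105 (taken as 104.61):

```latex
R_3^2 = 1 - \frac{1}{(VIF)_3} = 1 - \frac{1}{104.61} \approx 1 - .0096 = .990
```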


Comments 


1. Some computer regression programs use the reciprocal of the variance inflation factor to detect 
instances where an X variable should not be allowed into the fitted regression model because of
excessively high interdependence between this variable and the other X variables in the model. Tolerance
limits for $1/(VIF)_k = 1 - R_k^2$ frequently used are .01, .001, or .0001, below which the variable is not
entered into the model.

2. A limitation of variance inflation factors for detecting multicollinearities is that they cannot
distinguish between several simultaneous multicollinearities. 

3. A number of other formal methods for detecting multicollinearity have been proposed. These 
are more complex than variance inflation factors and are discussed in specialized texts such as
References 10.5 and 10.6.
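
The tolerance rule in Comment 1 can be illustrated with the body fat VIF values. The Python sketch below is a simplification: an actual regression program checks the tolerance of a candidate variable against the variables already in the model at each step, whereas this hypothetical function `excluded_variables` simply screens the three reported VIF values (the third taken as 104.61) against each limit.

```python
# VIF values for the body fat example with three predictor variables.
vif = {"X1": 708.84, "X2": 564.34, "X3": 104.61}


def excluded_variables(vif_by_name, tolerance_limit):
    """Names of variables whose tolerance 1/(VIF)_k falls below the limit."""
    return [name for name, value in vif_by_name.items()
            if 1.0 / value < tolerance_limit]


for limit in (0.01, 0.001, 0.0001):
    tolerances = {name: round(1.0 / value, 5) for name, value in vif.items()}
    print(limit, tolerances, excluded_variables(vif, limit))
```

Under these assumptions, all three tolerances (roughly .0014, .0018, and .0096) fall below the .01 limit, while none falls below the stricter .001 or .0001 limits.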




10.6 Surgical Unit Example—Continued 


In Chapter 9 we developed a regression model for the surgical unit example (data in
Table 9.1). Recall that validation studies in Section 9.6 led to the selection of model (9.21),
the model containing variables $X_1$, $X_2$, $X_3$, and $X_8$. We will now utilize this regression model
to demonstrate a more in-depth study of curvature, interaction effects, multicollinearity, and 
influential cases using residuals and other diagnostics. 

To examine interaction effects further, a regression model containing first-order terms 
in $X_1$, $X_2$, $X_3$, and $X_8$ was fitted and added-variable plots for the six two-factor interaction
terms, $X_1X_2$, $X_1X_3$, $X_1X_8$, $X_2X_3$, $X_2X_8$, and $X_3X_8$, were examined. These plots (not
shown) did not suggest that any strong two-variable interactions are present and need to be
included in the model. The absence of any strong interactions was also noted by fitting a
regression model containing $X_1$, $X_2$, $X_3$, and $X_8$ in first-order terms and all two-variable
interaction terms. The P-value of the formal F test statistic (7.19) for dropping all of the 
interaction terms from the model containing both the first-order effects and the interaction 
effects is .35, indicating that interaction effects are not present. 

Figure 10.9 contains some of the additional diagnostic plots that were generated to check 
on the adequacy of the first-order model: 


$$Y_i' = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_8 X_{i8} + \varepsilon_i \qquad (10.45)$$


where $Y_i' = \ln Y_i$. The following points are worth noting:


1. The residual plot against the fitted values in Figure 10.9a shows no evidence of serious 
departures from the model. 

2. One of the three candidate models (9.23) subjected to validation studies in Section 9.6
contained $X_5$ (patient age) as a predictor. The regression coefficient for age ($b_5$) was negative
in model (9.23), but when the same model was fit to the validation data, the sign of $b_5$ became
positive. We will now use a residual plot and an added-variable plot to study graphically
the strength of the marginal relationship between $X_5$ and the response, when $X_1$, $X_2$, $X_3$,
and $X_8$ are already in the model. Figure 10.9b shows the plot of the residuals for the model
containing $X_1$, $X_2$, $X_3$, and $X_8$ against $X_5$, the predictor variable not in the model. This
plot shows no need to include patient age ($X_5$) in the model to predict logarithm of survival
time. A better view of this marginal relationship is provided by the added-variable plot in
Figure 10.9c. The slope coefficient $b_5$ can be seen again to be slightly negative as depicted
by the solid line in the added-variable plot. Overall, however, the marginal relationship
between $X_5$ and $Y'$ is weak. The P-value of the formal t test (9.18) for dropping $X_5$ from
the model containing $X_1$, $X_2$, $X_3$, $X_5$, and $X_8$ is 0.194. In addition, the plot shows that the
negative slope is driven largely by one or two outliers, one in the upper left region of
the plot and one in the lower right region. In this way the added-variable plot provides
additional support for dropping $X_5$.

FIGURE 10.9 Additional diagnostic plots for the surgical unit example, regression model (10.45):
(a) residual plot against predicted value; (b) residual plot against $X_5$; (c) added-variable plot
for $X_5$, $e(Y'|X_1, X_2, X_3, X_8)$ against $e(X_5|X_1, X_2, X_3, X_8)$; (d) normal probability plot,
residual against expected value.


3. The normal probability plot of the residuals in Figure 10.9d shows little departure from
linearity. The coefficient of correlation between the ordered residuals and their expected 


values under normality is .982, which is larger than the critical value for significance level .05 
in Table B.6. 




Multicollinearity was studied by calculating the variance inflation factors: 


Variable    $(VIF)_k$


$X_1$        1.10
$X_2$        1.02
$X_3$        1.05
$X_8$        1.09


As may be seen from these results, multicollinearity among the four predictor variables is
not a problem. 
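
For comparison with the body fat example, the mean VIF defined in (10.44) for these four values is close to 1:

```latex
\overline{(VIF)} = \frac{1.10 + 1.02 + 1.05 + 1.09}{4} = \frac{4.26}{4} = 1.065
```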

Figure 10.10 contains index plots of four key regression diagnostics, namely the deleted
studentized residuals $t_i$ in Figure 10.10a, the leverage values $h_{ii}$ in Figure 10.10b, Cook's
distances $D_i$ in Figure 10.10c, and $(DFFITS)_i$ values in Figure 10.10d. These plots suggest
further study of cases 17, 28, and 38. Table 10.6 lists numerical diagnostic values for
these cases. The measures presented in columns 1-5 are the residuals $e_i$ in (10.8), the
studentized deleted residuals $t_i$ in (10.24), the leverage values $h_{ii}$ in (10.18), the Cook's
distance measures $D_i$ in (10.33), and the $(DFFITS)_i$ values in (10.30). The following are
noteworthy points about the diagnostics in Table 10.6: 


1. Case 17 was identified as outlying with regard to its Y value according to its studentized 
deleted residual, outlying by more than three standard deviations. We test formally whether 
case 17 is outlying by means of the Bonferroni test procedure. For a family significance 
level of $\alpha = .05$ and sample size $n = 54$, we require $t(1 - \alpha/2n; n - p - 1) = t(.99954; 49)$
$= 3.528$. Since $|t_{17}| = 3.3696 < 3.528$, the formal outlier test indicates that case 17 is not
an outlier. Still, $t_{17}$ is very close to the critical value, and although this case does not appear
to be outlying to any substantial extent, we may wish to investigate the influence of case 17 
to remove any doubts. 

2. With $2p/n = 2(5)/54 = .185$ as a guide for identifying outlying X observations,
cases 23, 28, 32, 38, 42, and 52 were identified as outlying according to their leverage 
values. Incidentally, the univariate dot plots identify only cases 28 and 38 as outlying. Here 
we see the value of multivariable outlier identification. 

3. To determine the influence of cases 17, 23, 28, 32, 38, 42, and 52, we consider their
Cook's distance and DFFITS values. According to each of these measures, case 17 is the
most influential, with Cook's distance $D_{17} = .3306$ and $(DFFITS)_{17} = 1.4151$. Referring
to the F distribution with 5 and 49 degrees of freedom, we note that the Cook's value
corresponds to the 11th percentile. It thus appears that the influence of case 17 is not large
enough to warrant remedial measures, and consequently the other outlying cases also do 
not appear to be overly influential. 

A direct check of the influence of case 17 on the inferences of interest was also conducted. 
Here, the inferences of primary interest are in the fit of the regression model because the 
model is intended to be used for making predictions in the range of the X observations. 
Hence, each fitted value $\hat{Y}_i$ based on all 54 observations was compared with the fitted value
$\hat{Y}_{i(17)}$ when case 17 is deleted in fitting the regression model. The average of the absolute
percent differences:

$$\frac{100}{n} \sum_{i=1}^{n} \left| \frac{\hat{Y}_{i(17)} - \hat{Y}_i}{\hat{Y}_i} \right|$$

is only .42 percent, and the largest absolute percent difference (which is for case 17) is only
1.77 percent. Thus, case 17 does not have such a disproportionate influence on the fitted
values that remedial action would be required.

FIGURE 10.10 Diagnostic Plots for Surgical Unit Example—Regression Model (10.45): index plots
against case index of (a) studentized deleted residuals, (b) leverage values, (c) Cook's distance,
and (d) DFFITS values.

TABLE 10.6 Various Diagnostics for Outlying Cases—Surgical Unit Example, Regression Model (10.45).
Column (5), $(DFFITS)_i$: 1.4151, 0.7160, 0.3140, -0.8283, 0.8641, 0.0876, -0.3931.

4. In summary, the diagnostic analyses identified a number of potential problems, but 
none of these was considered to be serious enough to require further remedial action. 
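
The two screening rules used in points 1 and 2 above, the Bonferroni outlier test and the $2p/n$ leverage guide, can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the critical value 3.528 is taken from the text rather than computed, since the standard library has no t distribution quantile function.

```python
def bonferroni_level(alpha, n):
    """Percentile 1 - alpha/(2n) of the t distribution used for the critical value."""
    return 1.0 - alpha / (2.0 * n)


def is_outlying_y(t_deleted, critical_value):
    """Bonferroni decision rule: flag the case if |t_i| exceeds the critical value."""
    return abs(t_deleted) > critical_value


def leverage_cutoff(p, n):
    """Rule-of-thumb guide 2p/n for identifying outlying X observations."""
    return 2.0 * p / n


n, p, alpha = 54, 5, 0.05            # surgical unit example, model (10.45)
level = bonferroni_level(alpha, n)   # about .99954
case_17_outlying = is_outlying_y(3.3696, 3.528)  # values quoted in the text
cutoff = leverage_cutoff(p, n)       # about .185

print(round(level, 5), case_17_outlying, round(cutoff, 3))
```

Under these inputs case 17 is not declared an outlier, and any leverage value above about .185 would be flagged for further study.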


Cited References


10.1. Atkinson, A. C. Plots, Transformations, and Regression. Oxford: Clarendon Press, 1987.

10.2. Mansfield, E. R., and M. D. Conerly. “Diagnostic Value of Residual and Partial Residual Plots,”
The American Statistician 41 (1987), pp. 107-16.

10.3. Cook, R. D. “Exploring Partial Residual Plots,” Technometrics 35 (1993), pp. 351-62.

10.4. Rousseeuw, P. J., and A. M. Leroy. Robust Regression and Outlier Detection. New York: John
Wiley & Sons, 1987.

10.5. Belsley, D. A.; E. Kuh; and R. E. Welsch. Regression Diagnostics: Identifying Influential Data
and Sources of Collinearity. New York: John Wiley & Sons, 1980.

10.6. Belsley, D. A. Conditioning Diagnostics: Collinearity and Weak Data in Regression. New York:
John Wiley & Sons, 1991.


Problems 


10.1. A student asked: "Why is it necessary to perform diagnostic checks of the fit when $R^2$ is
large?" Comment. 


10.2. A researcher stated: "One good thing about added-variable plots is that they are extremely 
useful for identifying model adequacy even when the predictor variables are not properly 
specified in the regression model." Comment. 


10.3. A student suggested: "If extremely influential outlying cases are detected in a data set, simply 
discard these cases from the data set." Comment. 


10.4. Describe several informal methods that can be helpful in identifying multicollinearity among 
the X variables in a multiple regression model. 


10.5. Refer to Brand preference Problem 6.5b.
a. Prepare an added-variable plot for each of the predictor variables.
b. Do your plots in part (a) suggest that the regression relationships in the fitted regression
function in Problem 6.5b are inappropriate for any of the predictor variables? Explain.
c. Obtain the fitted regression function in Problem 6.5b by separately regressing both $Y$ and
$X_2$ on $X_1$, and then regressing the residuals in an appropriate fashion.


10.6. Refer to Grocery retailer Problem 6.9.
a. Fit regression model (6.1) to the data using $X_1$ and $X_2$ only.
b. Prepare an added-variable plot for each of the predictor variables $X_1$ and $X_2$.
c. Do your plots in part (b) suggest that the regression relationships in the fitted regression
function in part (a) are inappropriate for any of the predictor variables? Explain.
d. Obtain the fitted regression function in part (a) by separately regressing both $Y$ and $X_2$ on
$X_1$, and then regressing the residuals in an appropriate fashion.


10.7. Refer to Patient satisfaction Problem 6.15c.
a. Prepare an added-variable plot for each of the predictor variables.
b. Do your plots in part (a) suggest that the regression relationships in the fitted regression
function in Problem 6.15c are inappropriate for any of the predictor variables? Explain.


10.8. Refer to Commercial properties Problem 6.18c.
a. Prepare an added-variable plot for each of the predictor variables.
b. Do your plots in part (a) suggest that the regression relationships in the fitted regression
function in Problem 6.18c are inappropriate for any of the predictor variables? Explain.


10.9. Refer to Brand preference Problem 6.5.
a. Obtain the studentized deleted residuals and identify any outlying Y observations. Use the
Bonferroni outlier test procedure with $\alpha = .10$. State the decision rule and conclusion.
b. Obtain the diagonal elements of the hat matrix, and provide an explanation for the pattern
in these elements.
c. Are any of the observations outlying with regard to their X values according to the rule of
thumb stated in the chapter?
d. Management wishes to estimate the mean degree of brand liking for moisture content
$X_1 = 10$ and sweetness $X_2 = 3$. Construct a scatter plot of $X_2$ against $X_1$ and determine
visually whether this prediction involves an extrapolation beyond the range of the data.
Also, use (10.29) to determine whether an extrapolation is involved. Do your conclusions
from the two methods agree?
e. The largest absolute studentized deleted residual is for case 14. Obtain the DFFITS,
DFBETAS, and Cook's distance values for this case to assess the influence of this case.
What do you conclude?
f. Calculate the average absolute percent difference in the fitted values with and without
case 14. What does this measure indicate about the influence of case 14?
g. Calculate Cook's distance $D_i$ for each case and prepare an index plot. Are any cases
influential according to this measure?


*10.10. Refer to Grocery retailer Problems 6.9 and 6.10.
a. Obtain the studentized deleted residuals and identify any outlying Y observations. Use the
Bonferroni outlier test procedure with $\alpha = .05$. State the decision rule and conclusion.
b. Obtain the diagonal elements of the hat matrix. Identify any outlying X observations using
the rule of thumb presented in the chapter.
c. Management wishes to predict the total labor hours required to handle the next shipment
containing $X_1 = 300{,}000$ cases whose indirect costs of the total hours is $X_2 = 7.2$ and
$X_3 = 0$ (no holiday in week). Construct a scatter plot of $X_2$ against $X_1$ and determine
visually whether this prediction involves an extrapolation beyond the range of the data.
Also, use (10.29) to determine whether an extrapolation is involved. Do your conclusions
from the two methods agree?
d. Cases 16, 22, 43, and 48 appear to be outlying X observations, and cases 10, 32, 38, and 40
appear to be outlying Y observations. Obtain the DFFITS, DFBETAS, and Cook's distance
values for each of these cases to assess their influence. What do you conclude?
e. Calculate the average absolute percent difference in the fitted values with and without each
of these cases. What does this measure indicate about the influence of each of the cases?
f. Calculate Cook's distance $D_i$ for each case and prepare an index plot. Are any cases
influential according to this measure?


*10.11. Refer to Patient satisfaction Problem 6.15.
a. Obtain the studentized deleted residuals and identify any outlying Y observations. Use the
Bonferroni outlier test procedure with $\alpha = .10$. State the decision rule and conclusion.
b. Obtain the diagonal elements of the hat matrix. Identify any outlying X observations.
c. Hospital management wishes to estimate mean patient satisfaction for patients who are
$X_1 = 30$ years old, whose index of illness severity is $X_2 = 58$, and whose index of anxiety
level is $X_3 = 2.0$. Use (10.29) to determine whether this estimate will involve a hidden
extrapolation.
d. The three largest absolute studentized deleted residuals are for cases 11, 17, and 27. Obtain
the DFFITS, DFBETAS, and Cook's distance values for these cases to assess their influence.
What do you conclude?
e. Calculate the average absolute percent difference in the fitted values with and without each
of these cases. What does this measure indicate about the influence of each of these cases?
f. Calculate Cook's distance $D_i$ for each case and prepare an index plot. Are any cases
influential according to this measure?


10.12. Refer to Commercial properties Problem 6.18.
a. Obtain the studentized deleted residuals and identify any outlying Y observations. Use the
Bonferroni outlier test procedure with $\alpha = .01$. State the decision rule and conclusion.
b. Obtain the diagonal elements of the hat matrix. Identify any outlying X observations.
c. The researcher wishes to estimate the rental rates of a property whose age is 10 years,
whose operating expenses and taxes are 12.00, whose occupancy rate is 0.05, and whose
square footage is 350,000. Use (10.29) to determine whether this estimate will involve a
hidden extrapolation.
d. Cases 61, 8, 3, and 53 appear to be outlying X observations, and cases 6 and 62 appear
to be outlying Y observations. Obtain the DFFITS, DFBETAS, and Cook's distance values
for each case to assess its influence. What do you conclude?
e. Calculate the average absolute percent difference in the fitted values with and without each
of the cases. What does this measure indicate about the influence of each case?
f. Calculate Cook's distance $D_i$ for each case and prepare an index plot. Are any cases
influential according to this measure?


10.13. Cosmetics sales. An assistant in the district sales office of a national cosmetics firm obtained
data, shown below, on advertising expenditures and sales last year in the district's 44 territories.
$X_1$ denotes expenditures for point-of-sale displays in beauty salons and department stores (in
thousand dollars), and $X_2$ and $X_3$ represent the corresponding expenditures for local media
advertising and prorated share of national media advertising, respectively. $Y$ denotes sales (in
thousand cases). The assistant was instructed to estimate the increase in expected sales when
$X_1$ is increased by 1 thousand dollars and $X_2$ and $X_3$ are held constant, and was told to use
an ordinary multiple regression model with linear terms for the predictor variables and with
independent normal error terms.

i:           1       2       3     ...      42      43      44
$X_{i1}$:     5.6     4.1     3.7    ...     3.6     3.9     5.5
$X_{i2}$:     5.6     4.8     3.5    ...     3.7     3.6     5.0
$X_{i3}$:     3.8     4.8     3.6    ...     4.4     2.9     5.5
$Y_i$:      12.85   11.55   12.78    ...   10.47   11.03   12.31

a. State the regression model to be employed and fit it to the data.
b. Test whether there is a regression relation between sales and the three predictor variables;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion.
c. Test for each of the regression coefficients $\beta_k$ ($k = 1, 2, 3$) individually whether or not
$\beta_k = 0$; use $\alpha = .05$ each time. Do the conclusions of these tests correspond to that obtained
in part (b)?


d. Obtain the correlation matrix of the X variables.
e. What do the results in parts (b), (c), and (d) suggest about the suitability of the data for the
research objective?


10.14. Refer to Cosmetics sales Problem 10.13.
a. Obtain the three variance inflation factors. What do these suggest about the effects of
multicollinearity here?
b. The assistant eventually decided to drop variables $X_2$ and $X_3$ from the model "to clear up
the picture." Fit the assistant's revised model. Is the assistant now in a better position to
achieve the research objective?
c. Why would an experiment here be more effective in providing suitable data to meet the
research objective? How would you design such an experiment? What regression model
would you employ?


10.15. Refer to Brand preference Problem 6.5a.
a. What do the scatter plot matrix and the correlation matrix show about pairwise linear
associations among the predictor variables?
b. Find the two variance inflation factors. Why are they both equal to 1?


*10.16. Refer to Grocery retailer Problem 6.9c.
a. What do the scatter plot matrix and the correlation matrix show about pairwise linear
associations among the predictor variables?
b. Find the three variance inflation factors. Do they indicate that a serious multicollinearity
problem exists here?


*10.17. Refer to Patient satisfaction Problem 6.15b.
a. What do the scatter plot matrix and the correlation matrix show about pairwise linear
associations among the predictor variables?
b. Obtain the three variance inflation factors. What do these results suggest about the effects
of multicollinearity here? Are these results more revealing than those in part (a)?


10.18. Refer to Commercial properties Problem 6.18b.
a. What do the scatter plot matrix and the correlation matrix show about pairwise linear
associations among the predictor variables?
b. Obtain the four variance inflation factors. Do they indicate that a serious multicollinearity
problem exists here?


10.19. Refer to Job proficiency Problems 9.10 and 9.11. The subset model containing only first-order
terms in $X_1$ and $X_3$ is to be evaluated in detail.
a. Obtain the residuals and plot them separately against $\hat{Y}$, each of the four predictor variables,
and the cross-product term $X_1X_3$. On the basis of these plots, should any modifications in
the regression model be investigated?
b. Prepare separate added-variable plots against $e(X_1|X_3)$ and $e(X_3|X_1)$. Do these plots
suggest that any modifications in the model form are warranted?
c. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Test the
reasonableness of the normality assumptions, using Table B.6 and $\alpha = .01$. What do you
conclude?
d. Obtain the studentized deleted residuals and identify any outlying Y observations.
Use the Bonferroni outlier test procedure with $\alpha = .05$. State the decision rule and
conclusion.


e. Obtain the diagonal elements of the hat matrix. Using the rule of thumb in the text, identify
any outlying X observations. Are your findings consistent with those in Problem 9.10a?
Should they be? Comment.
f. Cases 7 and 18 appear to be moderately outlying with respect to their X values, and
case 16 is reasonably far outlying with respect to its Y value. Obtain DFFITS, DFBETAS,
and Cook's distance values for these cases to assess their influence. What do you conclude?
g. Obtain the variance inflation factors. What do they indicate?


10.20. Refer to Lung pressure Problems 9.13 and 9.14. The subset regression model containing
first-order terms for $X_1$ and $X_2$ and the cross-product term $X_1X_2$ is to be evaluated in
detail.
a. Obtain the residuals and plot them separately against $\hat{Y}$ and each of the three predictor
variables. On the basis of these plots, should any further modifications of the regression
model be attempted?
b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the normality
assumption appear to be reasonable here?
c. Obtain the variance inflation factors. Are there any indications that serious multicollinearity
problems are present? Explain.
d. Obtain the studentized deleted residuals and identify any outlying Y observations. Use the
Bonferroni outlier test procedure with $\alpha = .05$. State the decision rule and conclusion.
e. Obtain the diagonal elements of the hat matrix. Using the rule of thumb in the text, identify
any outlying X observations. Are your findings consistent with those in Problem 9.13a?
Should they be? Discuss.
f. Cases 3, 8, and 15 are moderately far outlying with respect to their X values, and case 7 is
relatively far outlying with respect to its Y value. Obtain DFFITS, DFBETAS, and Cook's
distance values for these cases to assess their influence. What do you conclude?


*10.21. Refer to Kidney function Problem 9.15 and the regression model fitted in part (c).
a. Obtain the variance inflation factors. Are there indications that serious multicollinearity
problems exist here? Explain.
b. Obtain the residuals and plot them separately against $\hat{Y}$ and each of the predictor variables.
Also prepare a normal probability plot of the residuals.
c. Prepare separate added-variable plots against $e(X_1|X_2, X_3)$, $e(X_2|X_1, X_3)$, and
$e(X_3|X_1, X_2)$.
d. Do the plots in parts (b) and (c) suggest that the regression model should be modified?


*10.22. Refer to Kidney function Problems 9.15 and 10.21. Theoretical arguments suggest use of the
following regression function:

$$E\{\ln Y\} = \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln(140 - X_2) + \beta_3 \ln X_3$$

a. Fit the regression function based on theoretical considerations.
b. Obtain the residuals and plot them separately against $\hat{Y}$ and each predictor variable in the
fitted model. Also prepare a normal probability plot of the residuals. Have the difficulties
noted in Problem 10.21 now largely been eliminated?
c. Obtain the variance inflation factors. Are there indications that serious multicollinearity
problems exist here? Explain.
d. Obtain the studentized deleted residuals and identify any outlying Y observations. Use the
Bonferroni outlier test procedure with $\alpha = .10$. State the decision rule and conclusion.


Chapter 10 Building the Regression Model II: Diagnostics 419

e. Obtain the diagonal elements of the hat matrix. Using the rule of thumb in the text, identify
any outlying X observations.

f. Cases 28 and 29 are relatively far outlying with respect to their Y values. Obtain DFFITS,
DFBETAS, and Cook's distance values for these cases to assess their influence. What do
you conclude?

Exercises

10.23. Show that (10.37) is algebraically equivalent to (10.33a).

10.24. If n = p and the X matrix is invertible, use (5.34) and (5.37) to show that the hat matrix H is
given by the p × p identity matrix. In this case, what are $h_{ii}$ and $\hat{Y}_i$?

10.25. Show that (10.26) follows from (10.24a) and (10.25).

10.26. Prove (9.11), using (10.27) and Exercise 5.31.

Projects

10.27. Refer to the SENIC data set in Appendix C.1 and Project 9.25. The regression model containing
age, routine chest X-ray ratio, and average daily census in first-order terms is to be evaluated
in detail based on the model-building data set.

a. Obtain the residuals and plot them separately against $\hat{Y}$, each of the predictor variables in
the model, and each of the related cross-product terms. On the basis of these plots, should
any modifications of the model be made?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of
correlation between the ordered residuals and their expected values under normality. Test the
reasonableness of the normality assumption, using Table B.6 and $\alpha = .05$. What do you
conclude?

c. Obtain the scatter plot matrix, the correlation matrix of the X variables, and the variance
inflation factors. Are there any indications that serious multicollinearity problems are
present? Explain.

d. Obtain the studentized deleted residuals and prepare a dot plot of these residuals. Are any
outliers present? Use the Bonferroni outlier test procedure with $\alpha = .01$. State the decision
rule and conclusion.

e. Obtain the diagonal elements of the hat matrix. Using the rule of thumb in the text, identify
any outlying X observations.

f. Cases 62, 75, 106, and 112 are moderately outlying with respect to their X values, and
case 87 is reasonably far outlying with respect to its Y value. Obtain DFFITS, DFBETAS,
and Cook's distance values for these cases to assess their influence. What do you
conclude?

10.28. Refer to the CDI data set in Appendix C.2 and Project 9.26. The regression model containing
variables 6, 8, 9, 13, 14, and 15 in first-order terms is to be evaluated in detail based on the
model-building data set.


a. Obtain the residuals and plot them separately against $\hat{Y}$, each predictor variable in the model,
and the related cross-product term. On the basis of these plots, should any modifications
in the model be made?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of
correlation between the ordered residuals and their expected values under normality. Test the
reasonableness of the normality assumption, using Table B.6 and $\alpha = .01$. What do you
conclude?




c. Obtain the scatter plot matrix, the correlation matrix of the X variables, and the variance
inflation factors. Are there any indications that serious multicollinearity problems are
present? Explain.

d. Obtain the studentized deleted residuals and prepare a dot plot of these residuals. Are any
outliers present? Use the Bonferroni outlier test procedure with $\alpha = .05$. State the decision
rule and conclusion.

e. Obtain the diagonal elements of the hat matrix. Using the rule of thumb in the text, identify
any outlying X observations.

f. Cases 2, 8, 48, 128, 206, and 404 are outlying with respect to their X values, and cases 2
and 6 are reasonably far outlying with respect to their Y values. Obtain DFFITS, DFBETAS,
and Cook's distance values for these cases to assess their influence. What do you conclude?


Case Studies

10.29. Refer to the Website developer data set in Appendix C.6 and Case Study 9.29. For the best
subset model developed in Case Study 9.29, perform appropriate diagnostic checks to evaluate
outliers and assess their influence. Do any serious multicollinearity problems exist here?

10.30. Refer to the Prostate cancer data set in Appendix C.5 and Case Study 9.30. For the best
subset model developed in Case Study 9.30, perform appropriate diagnostic checks to evaluate
outliers and assess their influence. Do any serious multicollinearity problems exist here?

10.31. Refer to the Real estate data set in Appendix C.7 and Case Study 9.31. For the best
subset model developed in Case Study 9.31, perform appropriate diagnostic checks to evaluate
outliers and assess their influence. Do any serious multicollinearity problems exist here?


Chapter 11

Building the Regression Model III: Remedial Measures


When the diagnostics indicate that a regression model is not appropriate or that one or sev- 
eral cases are very influential, remedial measures may need to be taken. In earlier chapters, 
we discussed some remedial measures, such as transformations to linearize the regression 
relation, to make the error distributions more nearly normal, or to make the variances of 
the error terms more nearly equal. In this chapter, we take up some additional remedial 
measures to deal with unequal error variances, a high degree of multicollinearity, and
influential observations. We next consider two methods for nonparametric regression in detail,
lowess and regression trees. Since these remedial measures and alternative approaches
often involve relatively complex estimation procedures, we consider next a general approach,
called bootstrapping, for evaluating the precision of these complex estimators. We
conclude the chapter by presenting a case that illustrates some of the issues that arise in model
building.


11.1 Unequal Error Variances Remedial Measures— Weighted 
Least Squares 


We explained in Chapters 3 and 6 how transformations of Y may be helpful in reducing or
eliminating unequal variances of the error terms. A difficulty with transformations of Y is that
they may create an inappropriate regression relationship. When an appropriate regression
relationship has been found but the variances of the error terms are unequal, an alternative
to transformations is weighted least squares, a procedure based on a generalization of
multiple regression model (6.7). We shall now denote the variance of the error term $\varepsilon_i$ by
$\sigma_i^2$ to recognize that different error terms may have different variances. The generalized
multiple regression model can then be expressed as follows:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i \tag{11.1}$$

where:
$\beta_0, \beta_1, \ldots, \beta_{p-1}$ are parameters
$X_{i1}, \ldots, X_{i,p-1}$ are known constants
$\varepsilon_i$ are independent $N(0, \sigma_i^2)$
$i = 1, \ldots, n$


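The variance-covariance matrix of the error terms for the generalized multiple regression
model (11.1) is more complex than before:

$$\underset{n \times n}{\boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\}} =
\begin{bmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{bmatrix} \tag{11.2}$$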


The estimation of the regression coefficients in generalized model (11.1) could be done by
using the estimators in (6.25) for regression model (6.7) with equal error variances. These
estimators are still unbiased and consistent for generalized regression model (11.1), but they
no longer have minimum variance. To obtain unbiased estimators with minimum variance,
we must take into account that the different Y observations for the n cases no longer have
the same reliability. Observations with small variances provide more reliable information
about the regression function than those with large variances. We shall first consider the
estimation of the regression coefficients when the error variances $\sigma_i^2$ are known. This case
is usually unrealistic, but it provides guidance as to how to proceed when the error variances
are not known.


Error Variances Known 
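When the error variances $\sigma_i^2$ are known, we can use the method of maximum likelihood to
obtain estimators of the regression coefficients in generalized regression model (11.1). The
likelihood function in (6.26) for the case of equal error variances $\sigma^2$ is modified by replacing
the $\sigma^2$ terms with the respective variances $\sigma_i^2$ and expressing the likelihood function in the
first form of (1.26):

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma_i^2)^{1/2}}
\exp\left[-\frac{1}{2\sigma_i^2}\left(Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1}\right)^2\right] \tag{11.3}$$

where $\boldsymbol{\beta}$ as usual denotes the vector of the regression coefficients. We define the reciprocal
of the variance $\sigma_i^2$ as the weight $w_i$:

$$w_i = \frac{1}{\sigma_i^2} \tag{11.4}$$

We can then express the likelihood function (11.3) as follows, after making some
simplifications:

$$L(\boldsymbol{\beta}) = \left[\prod_{i=1}^{n} \left(\frac{w_i}{2\pi}\right)^{1/2}\right]
\exp\left[-\frac{1}{2}\sum_{i=1}^{n} w_i\left(Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1}\right)^2\right] \tag{11.5}$$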


Chapter 11 Building the Regression Model III: Remedial Measures 423 


We find the maximum likelihood estimators of the regression coefficients by maximizing
$L(\boldsymbol{\beta})$ in (11.5) with respect to $\beta_0, \beta_1, \ldots, \beta_{p-1}$. Since the error variances $\sigma_i^2$ and hence
the weights $w_i$ are assumed to be known, maximizing $L(\boldsymbol{\beta})$ with respect to the regression
coefficients is equivalent to minimizing the exponential term:

$$Q_w = \sum_{i=1}^{n} w_i\left(Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1}\right)^2 \tag{11.6}$$

This term to be minimized for obtaining the maximum likelihood estimators is also the
weighted least squares criterion, denoted by $Q_w$. Thus, the methods of maximum likelihood
and weighted least squares lead to the same estimators for the generalized multiple
regression model (11.1), as is also the case for the ordinary multiple regression model (6.7).
Note how the weighted least squares criterion (11.6) generalizes the ordinary least squares
criterion in (6.22) by replacing equal weights of 1 by $w_i$. Since the weight $w_i$ is inversely
related to the variance $\sigma_i^2$, it reflects the amount of information contained in the
observation $Y_i$. Thus, an observation $Y_i$ that has a large variance receives less weight than another
observation that has a smaller variance. Intuitively, this is reasonable. The more precise is
$Y_i$ (i.e., the smaller is $\sigma_i^2$), the more information $Y_i$ provides about $E\{Y_i\}$ and therefore the
more weight it should receive in fitting the regression function.
It is easiest to express the maximum likelihood and weighted least squares estimators of
the regression coefficients for model (11.1) in matrix terms. Let the matrix $\mathbf{W}$ be a diagonal
matrix containing the weights $w_i$:

$$\underset{n \times n}{\mathbf{W}} =
\begin{bmatrix}
w_1 & 0 & \cdots & 0 \\
0 & w_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & w_n
\end{bmatrix} \tag{11.7}$$

The normal equations can then be expressed as follows:

$$(\mathbf{X}'\mathbf{W}\mathbf{X})\mathbf{b}_w = \mathbf{X}'\mathbf{W}\mathbf{Y} \tag{11.8}$$

and the weighted least squares and maximum likelihood estimators of the regression
coefficients are:

$$\underset{p \times 1}{\mathbf{b}_w} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} \tag{11.9}$$

where $\mathbf{b}_w$ is the vector of the estimated regression coefficients obtained by weighted least
squares. The variance-covariance matrix of the weighted least squares estimated regression
coefficients is:

$$\underset{p \times p}{\boldsymbol{\sigma}^2\{\mathbf{b}_w\}} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \tag{11.10}$$

Note that this variance-covariance matrix is known since the variances $\sigma_i^2$ are assumed to
be known.

The weighted least squares and maximum likelihood estimators of the regression
coefficients in (11.9) are unbiased, consistent, and have minimum variance among unbiased
linear estimators. Thus, when the weights are known, $\mathbf{b}_w$ generally exhibits less variability
than the ordinary least squares estimator $\mathbf{b}$.




Many computer regression packages will provide the weighted least squares estimated
regression coefficients. The user simply needs to provide the weights $w_i$.
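
When no such package is at hand, the normal equations (11.8) can be solved directly. The following is a minimal sketch in plain Python, not the routine of any particular package; the function names `solve` and `wls_fit` and the small data set are hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]   # augmented matrix [A | b]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def wls_fit(X, y, w):
    """Return b_w solving the weighted normal equations (X'WX) b_w = X'WY."""
    p = len(X[0])
    XtWX = [[sum(wi * row[j] * row[k] for wi, row in zip(w, X)) for k in range(p)]
            for j in range(p)]
    XtWy = [sum(wi * row[j] * yi for wi, row, yi in zip(w, X, y)) for j in range(p)]
    return solve(XtWX, XtWy)


# Hypothetical data: an intercept column plus one predictor; variances assumed known.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.1, 4.9, 7.4, 8.6]
sigma2 = [0.5, 1.0, 2.0, 4.0]            # assumed known error variances
w = [1.0 / s2 for s2 in sigma2]          # weights as defined in (11.4)

b_w = wls_fit(X, y, w)
b_ols = wls_fit(X, y, [1.0] * len(y))    # equal weights reproduce ordinary least squares
```

Passing equal weights recovers the ordinary least squares fit, which is a convenient check on any weighted least squares routine.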


Error Variances Known up to Proportionality Constant 
We now relax the requirement that the variances $\sigma_i^2$ are known by considering the case
where only the relative magnitudes of the variances are known. For instance, if we know
that $\sigma_2^2$ is twice as large as $\sigma_1^2$, we might use the weights $w_1 = 1$, $w_2 = 1/2$. In that case,
the relative weights $w_i$ are a constant multiple of the unknown true weights $1/\sigma_i^2$:

$$w_i = k\left(\frac{1}{\sigma_i^2}\right) \tag{11.11}$$

where $k$ is the proportionality constant. It can be shown that the weighted least squares and
maximum likelihood estimators are unaffected by the unknown proportionality constant $k$
and are still given by (11.9). The reason is that the proportionality constant $k$ appears on
both sides of the normal equations (11.8) and cancels out. The variance-covariance matrix
of the weighted least squares regression coefficients is now as follows:

$$\underset{p \times p}{\boldsymbol{\sigma}^2\{\mathbf{b}_w\}} = k(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \tag{11.12}$$

This matrix is unknown because the proportionality constant $k$ is not known. It can be
estimated, however. The estimated variance-covariance matrix of the regression coefficients
$\mathbf{b}_w$ is:

$$\underset{p \times p}{\mathbf{s}^2\{\mathbf{b}_w\}} = MSE_w(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \tag{11.13}$$

where $MSE_w$ is based on the weighted squared residuals:

$$MSE_w = \frac{\sum w_i(Y_i - \hat{Y}_i)^2}{n - p} = \frac{\sum w_i e_i^2}{n - p} \tag{11.13a}$$

Thus, $MSE_w$ here is an estimator of the proportionality constant $k$.
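
The cancellation of the proportionality constant can be seen numerically. The sketch below, written for a straight-line fit only, multiplies a set of relative weights by an arbitrary constant and compares the results; the helper names `wls_line` and `mse_w` and the data are hypothetical.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of a straight line; returns (b_w0, b_w1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw      # weighted mean of X
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw      # weighted mean of Y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def mse_w(x, y, w, p=2):
    """Weighted error mean square (11.13a) for the fitted straight line."""
    b0, b1 = wls_line(x, y, w)
    sse_w = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return sse_w / (len(y) - p)


# Hypothetical data; only the relative sizes of the error variances are assumed known.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.8, 5.3, 6.6, 9.9, 10.1, 14.2]
w = [1.0, 1.0, 0.5, 0.5, 0.25, 0.25]      # relative weights
w_scaled = [10.0 * wi for wi in w]        # the same weights times an arbitrary constant

coef = wls_line(x, y, w)
coef_scaled = wls_line(x, y, w_scaled)    # same coefficients: the constant cancels
ratio = mse_w(x, y, w_scaled) / mse_w(x, y, w)   # MSE_w is multiplied by the constant
```

Because $MSE_w$ is multiplied by the same constant that divides $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$, the estimated variance-covariance matrix in (11.13) is also unaffected by the scaling of the weights.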


Error Variances Unknown 
If the variances $\sigma_i^2$ were known, or even known up to a proportionality constant, the use of
weighted least squares with weights $w_i$ would be straightforward. Unfortunately, one rarely
has knowledge of the variances $\sigma_i^2$. We are then forced to use estimates of the variances.
These can be obtained in a variety of ways. We discuss two methods of obtaining estimates
of the variances $\sigma_i^2$.


Estimation of Variance Function or Standard Deviation Function. The first method
of obtaining estimates of the error term variances $\sigma_i^2$ is based on empirical findings that the
magnitudes of $\sigma_i^2$ and $\sigma_i$ often vary in a regular fashion with one or several predictor variables
$X_k$ or with the mean response $E\{Y_i\}$. Figure 3.4c, for example, shows a typical “megaphone”
prototype residual plot where $\sigma_i^2$ increases as the predictor variable X becomes larger. Such
a relationship between $\sigma_i^2$ and one or several predictor variables can be estimated because
the squared residual $e_i^2$ obtained from an ordinary least squares regression fit is an estimate
of $\sigma_i^2$, provided that the regression function is appropriate. We know from (A.15a) that
the variance of the error term $\varepsilon_i$, denoted by $\sigma_i^2$, can be expressed as follows:

$$\sigma_i^2 = E\{\varepsilon_i^2\} - (E\{\varepsilon_i\})^2 \tag{11.14}$$

Since $E\{\varepsilon_i\} = 0$ according to the regression model, we obtain:

$$\sigma_i^2 = E\{\varepsilon_i^2\} \tag{11.15}$$

Hence, the squared residual $e_i^2$ is an estimator of $\sigma_i^2$. Furthermore, the absolute residual $|e_i|$
is an estimator of the standard deviation $\sigma_i$, since $\sigma_i = \sqrt{\sigma_i^2}$.

We can therefore estimate the variance function describing the relation of $\sigma_i^2$ to relevant
predictor variables by first fitting the regression model using unweighted least squares
and then regressing the squared residuals $e_i^2$ against the appropriate predictor variables.
Alternatively, we can estimate the standard deviation function describing the relation of
$\sigma_i$ to relevant predictor variables by regressing the absolute residuals $|e_i|$ obtained from
fitting the regression model using unweighted least squares against the appropriate predictor
variables. If there are any outliers in the data, it is generally advisable to estimate the standard
deviation function rather than the variance function, because regressing absolute residuals
is less affected by outliers than regressing squared residuals. Reference 11.1 provides a
detailed discussion of the issues encountered in estimating variance and standard deviation
functions.

We illustrate the use of some possible variance and standard deviation functions:


1. A residual plot against $X_1$ exhibits a megaphone shape. Regress the absolute residuals
against $X_1$.

2. A residual plot against $\hat{Y}$ exhibits a megaphone shape. Regress the absolute residuals
against $\hat{Y}$.

3. A plot of the squared residuals against $X_3$ exhibits an upward tendency. Regress the
squared residuals against $X_3$.

4. A plot of the residuals against $X_2$ suggests that the variance increases rapidly with
increases in $X_2$ up to a point and then increases more slowly. Regress the absolute
residuals against $X_2$ and $X_2^2$.


After the variance function or the standard deviation function is estimated, the fitted
values from this function are used to obtain the estimated weights:

$$w_i = \frac{1}{(\hat{s}_i)^2} \quad \text{where } \hat{s}_i \text{ is fitted value from standard deviation function} \tag{11.16a}$$

$$w_i = \frac{1}{\hat{v}_i} \quad \text{where } \hat{v}_i \text{ is fitted value from variance function} \tag{11.16b}$$

The estimated weights are then placed in the weight matrix $\mathbf{W}$ in (11.7) and the estimated
regression coefficients are obtained by (11.9), as follows:

$$\mathbf{b}_w = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} \tag{11.17}$$

The weighted error mean square $MSE_w$ may be viewed here as an estimator of the
proportionality constant $k$ in (11.11). If the modeling of the variance or standard deviation function
is done well, the proportionality constant will be near 1 and $MSE_w$ should then be near 1.




We summarize the estimation process:


1. Fit the regression model by unweighted least squares and analyze the residuals.

2. Estimate the variance function or the standard deviation function by regressing either
the squared residuals or the absolute residuals on the appropriate predictor(s).

3. Use the fitted values from the estimated variance or standard deviation function to obtain
the weights $w_i$.

4. Estimate the regression coefficients using these weights.


If the estimated coefficients differ substantially from the estimated regression coefficients
obtained by ordinary least squares, it is usually advisable to iterate the weighted least squares
process by using the residuals from the weighted least squares fit to reestimate the variance
or standard deviation function and then obtain revised weights. Often one or two iterations
are sufficient to stabilize the estimated regression coefficients. This iteration process is often
called iteratively reweighted least squares.
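
The four steps and the optional iteration can be sketched for a single predictor as follows. This is an illustration under stated assumptions rather than a prescribed implementation: the standard deviation function is taken to be linear in X, the stopping rule (`max_iter`, `tol`) is an arbitrary choice, and the function names and the artificial data are hypothetical.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of a straight line; returns (b_w0, b_w1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def estimate_weights(x, resid):
    """Steps 2 and 3: regress |e_i| on X and return the weights w_i = 1 / s_i^2."""
    abs_e = [abs(e) for e in resid]
    c0, c1 = wls_line(x, abs_e, [1.0] * len(x))     # unweighted fit of |e_i| on X
    s_hat = [c0 + c1 * xi for xi in x]              # fitted standard deviations
    if min(s_hat) <= 0:
        raise ValueError("fitted standard deviation is not positive; try another function")
    return [1.0 / s ** 2 for s in s_hat]


def iterated_wls(x, y, max_iter=5, tol=1e-4):
    """Steps 1 through 4, repeated until the coefficients stabilize."""
    b = wls_line(x, y, [1.0] * len(x))              # step 1: unweighted fit
    w = [1.0] * len(x)
    for _ in range(max_iter):
        resid = [yi - b[0] - b[1] * xi for xi, yi in zip(x, y)]
        w = estimate_weights(x, resid)              # steps 2 and 3
        b_new = wls_line(x, y, w)                   # step 4
        change = max(abs(new - old) for new, old in zip(b_new, b))
        b = b_new
        if change < tol:
            break
    return b, w


# Artificial data: the error magnitude grows linearly with X, with alternating signs.
x = [float(age) for age in range(20, 61)]
y = [50.0 + 0.6 * xi + (-1) ** i * 0.2 * (xi - 10.0) for i, xi in enumerate(x)]

b_w, weights = iterated_wls(x, y)
```

The check for nonpositive fitted standard deviations matters in practice: a linear standard deviation function can cross zero within the range of the data, in which case the weights in (11.16a) are not usable.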


Use of Replicates or Near Replicates. A second method of obtaining estimates of the
error term variances $\sigma_i^2$ can be utilized in designed experiments where replicate
observations are made at each combination of levels of the predictor variables. If the number of
replications is large, the weights $w_i$ may be obtained directly from the sample variances of
the Y observations at each combination of levels of the X variables. Otherwise, the sample
variances or sample standard deviations should first be regressed against appropriate
predictor variables to estimate the variance or standard deviation function, from which the
weights can then be obtained. Note that each case in a replicate group receives the same
weight with this method.

In observational studies, replicate observations often are not present. Near replicates may
then be used. For example, if the residual plot against $X_1$ shows a megaphone appearance,
cases with similar $X_1$ values can be grouped together and the variance of the residuals in
each group calculated. The reciprocals of these variances are then used as the weights $w_i$
if the number of replications is large. Otherwise, a variance or standard deviation function
may be estimated to obtain the weights. Again, all cases in a near-replicate group receive
the same weight. If the estimated regression coefficients differ substantially from those
obtained with ordinary least squares, the procedure may be iterated, as when an estimated
variance or standard deviation function is used.


Inference Procedures when Weights Are Estimated. When the error variances $\sigma_i^2$ are
unknown so that the weights $w_i$ need to be estimated, which almost always is the case,
the variance-covariance matrix of the estimated regression coefficients is usually estimated
by means of (11.13), using the estimated weights, provided the sample size is not very
small. Confidence intervals for regression coefficients are then obtained by means of (6.50),
with the estimated standard deviation $s\{b_{wk}\}$ obtained from the matrix (11.13). Confidence
intervals for mean responses are obtained by means of (6.59), using $\mathbf{s}^2\{\mathbf{b}_w\}$ from (11.13)
in (6.58). These inference procedures are now only approximate, however, because the
estimation of the variances $\sigma_i^2$ introduces another source of variability. The approximation
is often quite good when the sample size is not too small. One means of determining
whether the approximation is good is to use bootstrapping, a statistical procedure that will
be explained in Section 11.5.


Use of Ordinary Least Squares with Unequal Error Variances. If one uses $\mathbf{b}$ (not
$\mathbf{b}_w$) with unequal error variances, the ordinary least squares estimators of the regression
coefficients are still unbiased and consistent, but they are no longer minimum variance
estimators. Also, $\boldsymbol{\sigma}^2\{\mathbf{b}\}$ is no longer given by $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. The correct variance-covariance
matrix is:

$$\boldsymbol{\sigma}^2\{\mathbf{b}\} = (\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\}\mathbf{X})(\mathbf{X}'\mathbf{X})^{-1}$$

If error variances are unequal and unknown, an appropriate estimator of $\boldsymbol{\sigma}^2\{\mathbf{b}\}$ can still be
obtained using ordinary least squares. The White estimator (Ref. 11.2) is:

$$\mathbf{s}^2\{\mathbf{b}\} = (\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{S}_0\mathbf{X})(\mathbf{X}'\mathbf{X})^{-1}$$

where:

$$\underset{n \times n}{\mathbf{S}_0} =
\begin{bmatrix}
e_1^2 & 0 & \cdots & 0 \\
0 & e_2^2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & e_n^2
\end{bmatrix}$$

and where $e_1, \ldots, e_n$ are the ordinary least squares estimators of the residuals. White's
estimator is sometimes referred to as a robust covariance matrix, because it can be used
to make appropriate inferences about the regression parameters based on ordinary least
squares, without having to specify the form of the nonconstant error variance.


Example

A health researcher, interested in studying the relationship between diastolic blood pressure
and age among healthy adult women 20 to 60 years old, collected data on 54 subjects. A
portion of the data is presented in Table 11.1, columns 1 and 2. The scatter plot of the data
in Figure 11.1a strongly suggests a linear relationship between diastolic blood pressure and
age but also indicates that the error term variance increases with age. The researcher fitted a
linear regression function by unweighted least squares to conduct some preliminary analyses
of the residuals. The fitted regression function and the estimated standard deviations of $b_0$
and $b_1$ are:

$$\hat{Y} = \underset{(3.994)}{56.157} + \underset{(.09695)}{.58003}X \tag{11.18}$$


TABLE 11.1 Weighted Least Squares: Blood Pressure Example.

            (1)     (2)         (3)       (4)       (5)       (6)
                    Diastolic
                    Blood
Subject     Age     Pressure
   i        X_i     Y_i         e_i       |e_i|     ŝ_i       w_i
   1        27       73         1.18      1.18     3.801    .06921
   2        21       66        -2.34      2.34     2.612    .14656
   3        22       63        -5.92      5.92     2.810    .12662
  ...       ...      ...        ...       ...       ...       ...
  52        52      100        13.68     13.68     8.756    .01304
  53        58       80        -9.80      9.80     9.944    .01011
  54        57      109        19.78     19.78     9.746    .01053


FIGURE 11.1 Diagnostic Plots Detecting Unequal Error Variances: Blood Pressure Example.
[Figure not reproduced. Panels: (a) Scatter Plot of blood pressure against age; (b) Residual Plot against X (age); (c) Absolute Residual Plot against X (age).]

The residuals are shown in Table 11.1, column 3, and the absolute residuals are presented in
column 4. Figure 11.1a presents this estimated regression function. Figure 11.1b presents
a plot of the residuals against X, which confirms the nonconstant error variance. A plot
of the absolute residuals against X in Figure 11.1c suggests that a linear relation between
the error standard deviation and X may be reasonable. The analyst therefore regressed the
absolute residuals against X and obtained:

$$\hat{s} = -1.54946 + .198172X \tag{11.19}$$

Here, $\hat{s}$ denotes the estimated expected standard deviation. The estimated standard deviation
function in (11.19) is shown in Figure 11.1c.

To obtain the weights $w_i$, the analyst obtained the fitted values from the standard deviation
function in (11.19). For example, for case 1, for which $X_1 = 27$, the fitted value is:

$$\hat{s}_1 = -1.54946 + .198172(27) = 3.801$$

The fitted values are shown in Table 11.1, column 5. The weights are then obtained by
using (11.16a). For case 1, we obtain:

$$w_1 = \frac{1}{(\hat{s}_1)^2} = \frac{1}{(3.801)^2} = .0692$$

The weights $w_i$ are shown in Table 11.1, column 6.
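
The entries in columns 5 and 6 of Table 11.1 can be checked with a few lines of Python. The sketch below uses only the six subjects printed in the table and the rounded coefficients reported in (11.19), so a computed value may differ from the table in the last digit; the names `fitted_sd`, `ages`, and `results` are hypothetical.

```python
def fitted_sd(age):
    """Estimated standard deviation function (11.19), using the printed coefficients."""
    return -1.54946 + 0.198172 * age


# Subject number -> age, for the rows shown in Table 11.1.
ages = {1: 27, 2: 21, 3: 22, 52: 52, 53: 58, 54: 57}

results = {}
for subject, age in ages.items():
    s_hat = fitted_sd(age)
    weight = 1.0 / s_hat ** 2            # weight from (11.16a)
    results[subject] = (s_hat, weight)
    print(f"{subject:>2}  {age}  {s_hat:.3f}  {weight:.5f}")
```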
Using these weights in a regression program that has weighted least squares capability,
the analyst obtained the following estimated regression function:

$$\hat{Y} = 55.566 + .59634X \tag{11.20}$$




Note that the estimated regression coefficients are not much different from those in (11.18)
obtained with unweighted least squares. Since the regression coefficients changed only a
little, the analyst concluded that there was no need to reestimate the standard deviation
function and the weights based on the residuals for the weighted regression in (11.20).

The analyst next obtained the estimated variance-covariance matrix of the estimated
regression coefficients by means of (11.13) to find the approximate estimated standard
deviation $s\{b_{w1}\} = .07924$. It is interesting to note that this standard deviation is somewhat
smaller than the standard deviation of the estimate obtained by ordinary least squares
in (11.18), .09695. The reduction of about 18 percent is the result of the recognition of
unequal error variances when using weighted least squares.

To obtain an approximate 95 percent confidence interval for $\beta_1$, the analyst employed
(6.50) and required $t(.975; 52) = 2.007$. The confidence limits then are $.59634 \pm 2.007(.07924)$
and the approximate 95 percent confidence interval is:

$$.437 \le \beta_1 \le .755$$


We shall consider the appropriateness of this inference approximation in Section 11.5. 
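
For comparison with weighted least squares, the White estimator described earlier in this section under ordinary least squares with unequal error variances can be computed from an ordinary least squares fit alone. The sketch below does this for a straight-line model with 2 × 2 matrices written out by hand. Since only a portion of the blood pressure data is printed in Table 11.1, the data here are artificial, and the function names are hypothetical.

```python
import math


def inv2(a):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ols_with_white(x, y):
    """OLS straight-line fit with the usual and the White covariance matrices of b."""
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    xtx_inv = inv2([[n, sx], [sx, sxx]])                        # (X'X)^-1
    xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]        # X'Y
    b = [xtx_inv[i][0] * xty[0] + xtx_inv[i][1] * xty[1] for i in range(2)]
    e2 = [(yi - b[0] - b[1] * xi) ** 2 for xi, yi in zip(x, y)] # squared residuals
    s1 = sum(q * xi for q, xi in zip(e2, x))
    middle = [[sum(e2), s1],
              [s1, sum(q * xi * xi for q, xi in zip(e2, x))]]   # X'S0X
    white = matmul2(matmul2(xtx_inv, middle), xtx_inv)
    mse = sum(e2) / (n - 2)
    usual = [[mse * v for v in row] for row in xtx_inv]         # MSE (X'X)^-1
    return b, usual, white


# Artificial data: the error magnitude grows linearly with X, with alternating signs.
x = [float(age) for age in range(20, 61)]
y = [50.0 + 0.6 * xi + (-1) ** i * 0.2 * (xi - 10.0) for i, xi in enumerate(x)]

b, usual, white = ols_with_white(x, y)
se_usual = math.sqrt(usual[1][1])     # slope standard error assuming equal variances
se_white = math.sqrt(white[1][1])     # slope standard error from the White estimator
```

The two standard errors for the slope generally differ when the error variance changes with X; the size and direction of the difference depend on the data at hand.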


Comments 


1. The condition of the error variance not being constant over all cases is called heteroscedasticity, 
in contrast to the condition of equal error variances, called homoscedasticity. 

2. Heteroscedasticity is inherent when the response in regression analysis follows a distribution in 
which the variance is functionally related to the mean. (Significant nonnormality in Y is encountered 
as well in most such cases.) Consider, in this connection, a regression analysis where X is the speed 
of a machine which puts a plastic coating on cable and Y is the number of blemishes in the coating 
per thousand feet of cable. If Y is Poisson distributed with a mean which increases as X increases, 
the distributions of Y cannot have constant variance at all levels of X since the variance of a Poisson 
variable equals the mean, which is increasing with X.

3. Estimation of the weights by means of an estimated variance or standard deviation function or 
by means of groups of replicates or near replicates can be very helpful when there are major differences 
in the variances of the error terms. When the differences are only small or modest, however, weighted 
least squares with these approximate methods will not be particularly helpful. 

4. The weighted least squares output of some multiple regression software packages includes $R^2$,
the coefficient of multiple determination. Users of these packages need to treat this measure with
caution, because $R^2$ does not have a clear-cut meaning for weighted least squares.

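5. The weighted least squares estimators of the regression coefficients in (11.9) for the case of
known error variances $\sigma_i^2$ can be derived readily. The derivation also shows that weighted least squares
may be viewed as ordinary least squares of transformed variables. The generalized multiple regression
model in (11.1) may be expressed as follows in matrix form:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{11.21}$$

where:

$$E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$$
$$\boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\} = \mathbf{W}^{-1}$$

Note that the variance-covariance matrix of the error terms in (11.2) is the inverse of the weight matrix
defined in (11.7).

We now define a diagonal matrix containing the square roots of the weights $w_i$ and denote it
by $\mathbf{W}^{1/2}$:

$$\underset{n \times n}{\mathbf{W}^{1/2}} =
\begin{bmatrix}
\sqrt{w_1} & 0 & \cdots & 0 \\
0 & \sqrt{w_2} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sqrt{w_n}
\end{bmatrix} \tag{11.22}$$

Note that $\mathbf{W}^{1/2}$ is symmetric and that $\mathbf{W}^{1/2}\mathbf{W}^{1/2} = \mathbf{W}$. The latter relation also holds for the
corresponding inverse matrices: $\mathbf{W}^{-1/2}\mathbf{W}^{-1/2} = \mathbf{W}^{-1}$.
We premultiply the terms on both sides of regression model (11.21) by $\mathbf{W}^{1/2}$ and obtain:

$$\mathbf{W}^{1/2}\mathbf{Y} = \mathbf{W}^{1/2}\mathbf{X}\boldsymbol{\beta} + \mathbf{W}^{1/2}\boldsymbol{\varepsilon} \tag{11.23}$$

which can be expressed as:

$$\mathbf{Y}_w = \mathbf{X}_w\boldsymbol{\beta} + \boldsymbol{\varepsilon}_w \tag{11.23a}$$

where:

$$\mathbf{Y}_w = \mathbf{W}^{1/2}\mathbf{Y} \qquad \mathbf{X}_w = \mathbf{W}^{1/2}\mathbf{X} \qquad \boldsymbol{\varepsilon}_w = \mathbf{W}^{1/2}\boldsymbol{\varepsilon} \tag{11.23b}$$

By (5.45) and (5.46), we obtain:

$$E\{\boldsymbol{\varepsilon}_w\} = \mathbf{W}^{1/2}E\{\boldsymbol{\varepsilon}\} = \mathbf{W}^{1/2}\mathbf{0} = \mathbf{0} \tag{11.24a}$$

$$\boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}_w\} = \mathbf{W}^{1/2}\boldsymbol{\sigma}^2\{\boldsymbol{\varepsilon}\}\mathbf{W}^{1/2} = \mathbf{W}^{1/2}\mathbf{W}^{-1}\mathbf{W}^{1/2}
= \mathbf{W}^{1/2}\mathbf{W}^{-1/2}\mathbf{W}^{-1/2}\mathbf{W}^{1/2} = \mathbf{I} \tag{11.24b}$$

Thus, regression model (11.23a) involves independent error terms with mean zero and constant
variance $\sigma_i^2 \equiv 1$. We can therefore apply standard regression procedures to this transformed regression
model.

For example, the ordinary least squares estimators of the regression coefficients in (6.25) here
become:

$$\mathbf{b}_w = (\mathbf{X}_w'\mathbf{X}_w)^{-1}\mathbf{X}_w'\mathbf{Y}_w$$

Using the definitions in (11.23b), we obtain the result for weighted least squares given in (11.9):

$$\begin{aligned}
\mathbf{b}_w &= [(\mathbf{W}^{1/2}\mathbf{X})'\mathbf{W}^{1/2}\mathbf{X}]^{-1}(\mathbf{W}^{1/2}\mathbf{X})'\mathbf{W}^{1/2}\mathbf{Y} \\
&= (\mathbf{X}'\mathbf{W}^{1/2}\mathbf{W}^{1/2}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}^{1/2}\mathbf{W}^{1/2}\mathbf{Y} \\
&= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y}
\end{aligned}$$


6. Weighted least squares is a special case of generalized least squares where the error terms not
only may have different variances but pairs of error terms may also be correlated.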


7. For simple linear regression, the weighted least squares normal equations in (11.8) become:

$$\begin{aligned}
\sum w_i Y_i &= b_{w0}\sum w_i + b_{w1}\sum w_i X_i \\
\sum w_i X_i Y_i &= b_{w0}\sum w_i X_i + b_{w1}\sum w_i X_i^2
\end{aligned} \tag{11.25}$$

and the weighted least squares estimators $b_{w0}$ and $b_{w1}$ in (11.9) are:

$$b_{w1} = \frac{\sum w_i X_i Y_i - \dfrac{\sum w_i X_i \sum w_i Y_i}{\sum w_i}}
{\sum w_i X_i^2 - \dfrac{\left(\sum w_i X_i\right)^2}{\sum w_i}} \tag{11.26a}$$

$$b_{w0} = \frac{\sum w_i Y_i - b_{w1}\sum w_i X_i}{\sum w_i} \tag{11.26b}$$

Note that if all weights are equal so $w_i$ is identically equal to a constant, the normal equations (11.25)
for weighted least squares reduce to the ones for unweighted least squares in (1.9) and the weighted
least squares estimators (11.26) reduce to the ones for unweighted least squares in (1.10).
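
Comments 5 and 7 can be checked against each other numerically. The sketch below computes the simple linear regression estimators from (11.26a) and (11.26b), and separately applies ordinary least squares to the transformed variables of (11.23b); the two should agree up to rounding error. The function names and the data are hypothetical.

```python
import math


def wls_closed_form(x, y, w):
    """Weighted least squares estimators (11.26a) and (11.26b); returns (b_w0, b_w1)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    b1 = (swxy - swx * swy / sw) / (swxx - swx ** 2 / sw)
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def ols_on_transformed(x, y, w):
    """Ordinary least squares applied to W^(1/2) Y and W^(1/2) X, as in (11.23b)."""
    root = [math.sqrt(wi) for wi in w]
    u = root                                        # transformed column of ones
    v = [ri * xi for ri, xi in zip(root, x)]        # transformed predictor column
    z = [ri * yi for ri, yi in zip(root, y)]        # transformed response
    a11 = sum(ui * ui for ui in u)
    a12 = sum(ui * vi for ui, vi in zip(u, v))
    a22 = sum(vi * vi for vi in v)
    g1 = sum(ui * zi for ui, zi in zip(u, z))
    g2 = sum(vi * zi for vi, zi in zip(v, z))
    det = a11 * a22 - a12 ** 2                      # determinant of X_w' X_w
    return (a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det


# Hypothetical data and weights.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.4, 8.6]
w = [2.0, 1.0, 0.5, 0.25]

direct = wls_closed_form(x, y, w)
transformed = ols_on_transformed(x, y, w)
unweighted = wls_closed_form(x, y, [1.0] * len(x))  # equal weights give the OLS estimates
```

Note that the transformed model has no column of ones: the intercept column becomes $\sqrt{w_i}$, so the ordinary least squares fit on the transformed variables must be run without an added intercept.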


11.2 Multicollinearity Remedial Measures—Ridge Regression 


We consider first some remedial measures for serious multicollinearity that can be
implemented with ordinary least squares, and then take up ridge regression, a method of
overcoming serious multicollinearity problems by modifying the method of least squares.


Some Remedial Measures 

1. As we saw in Chapter 7, the presence of serious multicollinearity often does not affect 
the usefulness of the fitted model for estimating mean responses or making predictions, 
provided that the values of the predictor variables for which inferences are to be made 
follow the same multicollinearity pattern as the data on which the regression model is based. 
Hence, one remedial measure is to restrict the use of the fitted regression model to inferences 
for values of the predictor variables that follow the same pattern of multicollinearity. 

2. In polynomial regression models, as we noted in Chapter 7, use of centered data 
for the predictor variable(s) serves to reduce the multicollinearity among the first-order, 
second-order, and higher-order terms for any given predictor variable. 

3. One or several predictor variables may be dropped from the model in order to lessen 
the multicollinearity and thereby reduce the standard errors of the estimated regression 
coefficients of the predictor variables remaining in the model. This remedial measure has two 
important limitations. First, no direct information is obtained about the dropped predictor 
variables. Second, the magnitudes of the regression coefficients for the predictor variables 
remaining in the model are affected by the correlated predictor variables not included in the 
model.

4. Sometimes it is possible to add some cases that break the pattern of multicollinearity. 
Often, however, this option is not available. In business and economics, for instance, many 
predictor variables cannot be controlled, so that new cases will tend to show the same 
intercorrelation patterns as the earlier ones.

5. In some economic studies, it is possible to estimate the regression coefficients for 
different predictor variables from different sets of data and thereby avoid the problems of 
multicollinearity. Demand studies, for instance, may use both cross-section and time series 
data to this end. Suppose the predictor variables in a demand study are price and income, 




and the relation to be estimated is:

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$  (11.27)

where Y is demand, $X_1$ is income, and $X_2$ is price. The income coefficient $\beta_1$ may then be
estimated from cross-section data. The demand variable Y is thereupon adjusted:

$Y_i' = Y_i - b_1 X_{i1}$  (11.28)

Finally, the price coefficient $\beta_2$ is estimated by regressing the adjusted demand variable $Y'$
on $X_2$.
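
The two-step procedure can be sketched in Python as follows. Everything in the sketch is hypothetical: the data are generated without error from assumed coefficients ($\beta_1 = 0.8$, $\beta_2 = -2$) so that the recovery of both coefficients is easy to see, and `ols_simple` is an illustrative helper rather than a function from any package.

```python
def ols_simple(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical cross-section data: income varies, price is fixed at 10
income_cs = [10.0, 20.0, 30.0, 40.0]
demand_cs = [25.0 + 0.8 * inc - 2.0 * 10.0 for inc in income_cs]

# Step 1: estimate the income coefficient from the cross-section data
_, b1 = ols_simple(income_cs, demand_cs)

# Hypothetical time series data: income and price rise together
income_ts = [20.0, 22.0, 24.0, 26.0, 28.0]
price_ts = [5.0, 5.4, 6.1, 6.4, 7.0]
demand_ts = [25.0 + 0.8 * inc - 2.0 * p for inc, p in zip(income_ts, price_ts)]

# Step 2: adjust demand as in (11.28), then regress adjusted demand on price
adjusted = [yi - b1 * inc for yi, inc in zip(demand_ts, income_ts)]
_, b2 = ols_simple(price_ts, adjusted)

print(b1, b2)
```

With real data both steps carry sampling error, so the second-stage estimate inherits any error in $b_1$.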

6. Another remedial measure for multicollinearity that can be used with ordinary least 
squares is to form one or several composite indexes based on the highly correlated predictor 
variables, an index being a linear combination of the correlated predictor variables. The 
methodology of principal components provides composite indexes that are uncorrelated.
Often, a few of these composite indexes capture much of the information contained in 
the predictor variables. These few uncorrelated composite indexes are then used in the 
regression analysis as predictor variables instead of the original highly correlated predictor 
variables. A limitation of principal components regression, also called latent root regression, 
is that it may be difficult to attach concrete meanings to the indexes. 


More information about these remedial approaches as well as about Bayesian regression, 
where prior information about the regression coefficients is incorporated into the estimation 
procedure, may be obtained from specialized works such as Reference 11.3. 


Ridge Regression 


FIGURE 11.2 Biased Estimator with Small Variance May Be Preferable to Unbiased Estimator with Large Variance.


Biased Estimation. Ridge regression is one of several methods that have been proposed to
remedy multicollinearity problems by modifying the method of least squares to allow biased
estimators of the regression coefficients. When an estimator has only a small bias and is
substantially more precise than an unbiased estimator, it may well be the preferred estimator
since it will have a larger probability of being close to the true parameter value. Figure 11.2
illustrates this situation. Estimator b is unbiased but imprecise, whereas estimator $b^R$ is
much more precise but has a small bias. The probability that $b^R$ falls near the true value $\beta$
is much greater than that for the unbiased estimator b.


[Figure 11.2: the sampling distribution of the unbiased estimator b is centered at $E\{b\} = \beta$ but is widely spread; the sampling distribution of the biased estimator $b^R$ is tightly concentrated around $E\{b^R\}$, which lies a small distance (the bias of $b^R$) from the parameter $\beta$.]


A measure of the combined effect of bias and sampling variation is the mean squared
error, a concept that we encountered in Chapter 9 in connection with the $C_p$ criterion.
Here, the mean squared error is the expected value of the squared deviation of the biased
estimator $b^R$ from the true parameter $\beta$. As before, this expected value is the sum of the
variance of the estimator and the squared bias:

$E\{b^R - \beta\}^2 = \sigma^2\{b^R\} + (E\{b^R\} - \beta)^2$  (11.29)

Note that if the estimator is unbiased, the mean squared error is identical to the variance of
the estimator.
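
A small numerical illustration of (11.29) and of the situation in Figure 11.2 follows. It assumes, purely for illustration, that both sampling distributions are normal, with hypothetical means and standard deviations; none of these numbers comes from the text.

```python
from statistics import NormalDist

beta = 10.0                                   # hypothetical true parameter value
unbiased = NormalDist(mu=10.0, sigma=2.0)     # unbiased but imprecise estimator
biased = NormalDist(mu=10.3, sigma=0.5)       # small bias, much smaller variance


def mse(dist, target):
    """Mean squared error as variance plus squared bias, as in (11.29)."""
    return dist.variance + (dist.mean - target) ** 2


def prob_within(dist, target, d):
    """Probability that the estimator falls within d of the target value."""
    return dist.cdf(target + d) - dist.cdf(target - d)


mse_unbiased = mse(unbiased, beta)
mse_biased = mse(biased, beta)
p_unbiased = prob_within(unbiased, beta, 1.0)
p_biased = prob_within(biased, beta, 1.0)

print(mse_unbiased, mse_biased)
print(p_unbiased, p_biased)
```

Under these assumed distributions the biased estimator has the smaller mean squared error (variance .25 plus squared bias .09, against a variance of 4 for the unbiased estimator) and the larger probability of falling within one unit of $\beta$.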


Ridge Estimators. For ordinary least squares, the normal equations are given by (6.24):

$(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{Y}$  (11.30)

When all variables are transformed by the correlation transformation (7.44), the transformed
regression model is given by (7.45):

$Y_i^* = \beta_1^* X_{i1}^* + \beta_2^* X_{i2}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \varepsilon_i^*$  (11.31)

and the least squares normal equations are given by (7.52a):

$\mathbf{r}_{XX}\mathbf{b} = \mathbf{r}_{YX}$  (11.32)

where $\mathbf{r}_{XX}$ is the correlation matrix of the X variables defined in (7.47) and $\mathbf{r}_{YX}$ is the vector
of coefficients of simple correlation between Y and each X variable defined in (7.48).

The ridge standardized regression estimators are obtained by introducing into the least
squares normal equations (11.32) a biasing constant $c \ge 0$, in the following form:

$(\mathbf{r}_{XX} + c\mathbf{I})\mathbf{b}^R = \mathbf{r}_{YX}$  (11.33)

where $\mathbf{b}^R$ is the $(p-1) \times 1$ vector of the standardized ridge regression coefficients $b_k^R$:

$\mathbf{b}^R = [\,b_1^R \;\; b_2^R \;\; \cdots \;\; b_{p-1}^R\,]'$  (11.33a)

and I is the (p − 1) × (p − 1) identity matrix. Solution of the normal equations (11.33)
yields the ridge standardized regression coefficients:

$\mathbf{b}^R = (\mathbf{r}_{XX} + c\mathbf{I})^{-1}\mathbf{r}_{YX}$  (11.34)


The constant c reflects the amount of bias in the estimators. When c = 0, (11.34) reduces to
the ordinary least squares regression coefficients in standardized form, as given in (7.52b).
When c > 0, the ridge regression coefficients are biased but tend to be more stable (i.e.,
less variable) than ordinary least squares estimators.
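
For two predictor variables, (11.34) can be written out explicitly because the 2 × 2 matrix $\mathbf{r}_{XX} + c\mathbf{I}$ is easy to invert. The sketch below uses hypothetical correlations ($r_{12} = .95$, $r_{Y1} = .85$, $r_{Y2} = .80$) and a hypothetical function name; it is meant only to show how a small biasing constant can stabilize coefficients when the predictors are highly correlated.

```python
def ridge_two_predictors(r12, ry1, ry2, c):
    """Standardized ridge coefficients (11.34) for two predictors.

    Solves (r_XX + cI) b = r_YX with r_XX = [[1, r12], [r12, 1]].
    """
    a = 1.0 + c
    det = a * a - r12 * r12
    b1 = (a * ry1 - r12 * ry2) / det
    b2 = (a * ry2 - r12 * ry1) / det
    return b1, b2


# Hypothetical correlations, for illustration only
r12, ry1, ry2 = 0.95, 0.85, 0.80

b_ols = ridge_two_predictors(r12, ry1, ry2, 0.0)      # c = 0: standardized OLS
b_ridge = ridge_two_predictors(r12, ry1, ry2, 0.02)   # small biasing constant

for c in (0.0, 0.002, 0.01, 0.02, 0.1, 1.0):
    print(c, ridge_two_predictors(r12, ry1, ry2, c))
```

With these assumed correlations the second coefficient is negative at c = 0, even though $r_{Y2}$ is positive, and becomes positive at c = .02.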


Choice of Biasing Constant c. It can be shown that the bias component of the total
mean squared error of the ridge regression estimator $\mathbf{b}^R$ increases as c gets larger (with all
$b_k^R$ tending toward zero) while the variance component becomes smaller. It can further be
shown that there always exists some value c for which the ridge regression estimator $\mathbf{b}^R$ has
a smaller total mean squared error than the ordinary least squares estimator b. The difficulty
is that the optimum value of c varies from one application to another and is unknown.

A commonly used method of determining the biasing constant c is based on the ridge
trace and the variance inflation factors $(VIF)_k$ in (10.41). The ridge trace is a simultaneous
plot of the values of the p − 1 estimated ridge standardized regression coefficients for
different values of c, usually between 0 and 1. Extensive experience has indicated that the
estimated regression coefficients $b_k^R$ may fluctuate widely as c is changed slightly from 0,
and some may even change signs. Gradually, however, these wide fluctuations cease and the
magnitudes of the regression coefficients tend to move slowly toward zero as c is increased
further. At the same time, the values of $(VIF)_k$ tend to fall rapidly as c is changed from 0, and
gradually the $(VIF)_k$ values also tend to change only moderately as c is increased further.
One therefore examines the ridge trace and the VIF values and chooses the smallest value
of c where it is deemed that the regression coefficients first become stable in the ridge trace
and the VIF values have become sufficiently small. The choice is thus a judgmental one.


Example


In the body fat example with three predictor variables in Table 7.1, we noted previously
several informal indications of severe multicollinearity in the data. Indeed, in the fitted
model with three predictor variables (Table 7.2d), the estimated regression coefficient $b_2$
is negative even though it was expected that amount of body fat is positively related to
thigh circumference. Ridge regression calculations were made for the body fat example
data in Table 7.1 (calculations not shown). The ridge standardized regression coefficients
for selected values of c are presented in Table 11.2, and the variance inflation factors are
given in Table 11.3. The coefficients of multiple determination $R^2$ are also shown in the
latter table. Figure 11.3 presents the ridge trace of the estimated standardized regression
coefficients based on calculations for many more values of c than those shown in Table 11.2.
To facilitate the analysis, the horizontal c scale in Figure 11.3 is logarithmic.


TABLE 11.2 Ridge Estimated Standardized Regression Coefficients for Different Biasing
Constants c—Body Fat Example with Three Predictor Variables.

    c      b_1^R     b_2^R     b_3^R
  .000     4.264    -2.929    -1.561
  .002     1.441    -.4113    -.4813
  .004     1.006    -.0248    -.3149
  .006     .8300     .1314    -.2472
  .008     .7343     .2158    -.2103
  .010     .6742     .2684    -.1870
  .020     .5463     .3774    -.1369
  .030     .5004     .4134    -.1181
  .040     .4760     .4302    -.1076
  .050     .4605     .4392    -.1005
  .100     .4234     .4490    -.0812
  .500     .3377     .3791    -.0295
 1.000     .2798     .3101    -.0059


TABLE 11.3 VIF Values for Regression Coefficients and R² for Different Biasing
Constants c—Body Fat Example with Three Predictor Variables.

    c     (VIF)_1   (VIF)_2   (VIF)_3     R²
  .000    708.84    564.34    104.61    .8014
  .002     50.56     40.45      8.28    .7901
  .004     16.98     13.73      3.36    .7864
  .006      8.50      6.98      2.19    .7847
  .008      5.15      4.30      1.62    .7838
  .010      3.49      2.98      1.38    .7822
  .020      1.10      1.08      1.01    .7818
  .030       .63       .70       .92    .7812
  .040       .45       .56       .88    .7808
  .050       .37       .49       .85    .7804
  .100       .25       .37       .76    .7784
  .500       .15       .21       .40    .7427
 1.000       .11       .14       .23    .6818


FIGURE 11.3 Ridge Trace of Estimated Standardized Regression Coefficients—Body Fat Example with Three Predictor Variables.

[Figure 11.3: ridge trace of the estimated standardized regression coefficients plotted against the biasing constant c on a logarithmic scale running from .001 to 1.00.]


Note the instability in Figure 11.3 of the regression coefficients for very small values
of c. The estimated regression coefficient $b_2^R$, in fact, changes signs. Also note the rapid
decrease in the VIF values in Table 11.3. It was decided to employ c = .02 here because for
this value of the biasing constant the ridge regression coefficients have VIF values near 1
and the estimated regression coefficients appear to have become reasonably stable. The
resulting fitted model for c = .02 is:

$\hat{Y}^* = .5463X_1^* + .3774X_2^* - .1369X_3^*$

Transforming back to the original variables by (7.53), we obtain:

$\hat{Y} = -7.3978 + .5553X_1 + .3681X_2 - .1917X_3$

where $\bar{Y} = 20.195$, $\bar{X}_1 = 25.305$, $\bar{X}_2 = 51.170$, $\bar{X}_3 = 27.620$, $s_Y = 5.106$, $s_1 = 5.023$,
$s_2 = 5.235$, and $s_3 = 3.647$.

The improper sign on the estimate for $\beta_2$ has now been eliminated, and the estimated
regression coefficients are more in line with prior expectations. The sum of the squared
residuals for the transformed variables, which increases with c, has only increased from
.1986 at c = 0 to .2182 at c = .02 while $R^2$ decreased from .8014 to .7818. These changes
are relatively modest. The estimated mean body fat when $X_{h1} = 25.0$, $X_{h2} = 50.0$, and
$X_{h3} = 29.0$ is 19.33 for the ridge regression at c = .02 compared to 19.19 utilizing the
ordinary least squares solution. Thus, the ridge solution at c = .02 appears to be quite
satisfactory here and a reasonable alternative to the ordinary least squares solution.
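
The back-transformation of the ridge coefficients at c = .02 and the estimated mean body fat quoted above can be retraced with a few lines of Python. The sketch assumes that (7.53) takes the usual form $b_k = (s_Y/s_k)b_k^*$ and $b_0 = \bar{Y} - \sum b_k\bar{X}_k$; the variable names are illustrative. Because the published means, standard deviations, and standardized coefficients are rounded, the intercept agrees with the printed value only to about two decimal places.

```python
# Summary statistics and standardized ridge coefficients quoted in the example
y_bar, s_y = 20.195, 5.106
x_bar = [25.305, 51.170, 27.620]
s_x = [5.023, 5.235, 3.647]
b_std = [0.5463, 0.3774, -0.1369]    # ridge coefficients at c = .02

# Back-transform: b_k = (s_Y / s_k) * b_k_std, then b_0 = Ybar - sum(b_k * Xbar_k)
b = [(s_y / s_k) * bk for s_k, bk in zip(s_x, b_std)]
b0 = y_bar - sum(bk * xk for bk, xk in zip(b, x_bar))

# Estimated mean body fat at X_h1 = 25.0, X_h2 = 50.0, X_h3 = 29.0
x_h = [25.0, 50.0, 29.0]
y_hat = b0 + sum(bk * xk for bk, xk in zip(b, x_h))

print(b0, b, y_hat)
```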


Comments 
1. The normal equations (11.33) for the ridge estimators are as follows:

$(1 + c)b_1^R + r_{12}b_2^R + \cdots + r_{1,p-1}b_{p-1}^R = r_{Y1}$
$r_{21}b_1^R + (1 + c)b_2^R + \cdots + r_{2,p-1}b_{p-1}^R = r_{Y2}$  (11.35)
$\vdots$
$r_{p-1,1}b_1^R + r_{p-1,2}b_2^R + \cdots + (1 + c)b_{p-1}^R = r_{Y,p-1}$

where $r_{ij}$ is the coefficient of simple correlation between the ith and jth X variables and $r_{Yj}$ is the
coefficient of simple correlation between the response variable Y and the jth X variable.




2. VIF values for ridge regression coefficients $b_k^R$ are defined analogously to those for ordinary
least squares regression coefficients. Namely, the VIF value for $b_k^R$ measures how large the variance
of $b_k^R$ is relative to what the variance would be if the predictor variables were uncorrelated. It can
be shown that the VIF values for the ridge regression coefficients $b_k^R$ are the diagonal elements of the
following (p − 1) × (p − 1) matrix:

$(\mathbf{r}_{XX} + c\mathbf{I})^{-1}\mathbf{r}_{XX}(\mathbf{r}_{XX} + c\mathbf{I})^{-1}$  (11.36)
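
For two predictor variables the matrix in (11.36) can be evaluated by hand or with a few lines of code. The following sketch uses a hypothetical correlation $r_{12} = .95$ and a hypothetical function name; at c = 0 it reproduces the ordinary least squares value $1/(1 - r_{12}^2)$.

```python
def ridge_vif_two_predictors(r12, c):
    """Diagonal of (r_XX + cI)^(-1) r_XX (r_XX + cI)^(-1) for two predictors."""
    a = 1.0 + c
    det = a * a - r12 * r12
    inv = [[a / det, -r12 / det], [-r12 / det, a / det]]   # (r_XX + cI)^(-1)
    rxx = [[1.0, r12], [r12, 1.0]]

    def matmul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    m = matmul(matmul(inv, rxx), inv)
    return m[0][0], m[1][1]


r12 = 0.95    # hypothetical correlation between the two predictors
vif_ols = ridge_vif_two_predictors(r12, 0.0)
vif_ridge = ridge_vif_two_predictors(r12, 0.02)

for c in (0.0, 0.002, 0.01, 0.02, 0.1, 1.0):
    print(c, ridge_vif_two_predictors(r12, c))
```

The printed values should fall quickly as c moves away from 0 and then change more slowly, the same pattern that Table 11.3 shows for the body fat example.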

3. The coefficient of multiple determination $R^2$, which for ordinary least squares is given in (6.40):

$R^2 = 1 - \dfrac{SSE}{SSTO}$  (11.37)

can be defined analogously for ridge regression. A simplification occurs, however, because the total
sum of squares for the correlation-transformed dependent variable $Y^*$ in (7.44a) is:

$SSTO_R = \sum (Y_i^* - \bar{Y}^*)^2 = 1$  (11.38)

The fitted values with ridge regression are:

$\hat{Y}_i^* = b_1^R X_{i1}^* + \cdots + b_{p-1}^R X_{i,p-1}^*$  (11.39)

where the $X_{ik}^*$ are the X variables transformed according to the correlation transformation (7.44b).
The error sum of squares, as usual, is:

$SSE_R = \sum (Y_i^* - \hat{Y}_i^*)^2$  (11.40)

where $\hat{Y}_i^*$ is given in (11.39). $R^2$ for ridge regression then becomes:

$R_R^2 = 1 - SSE_R$  (11.41)


4. Ridge regression estimates can be obtained by the method of penalized least squares. The
penalized least squares criterion combines the usual sum of squared errors with a penalty for large
regression coefficients:

$Q = \sum_{i=1}^{n}\left[Y_i^* - (\beta_1^* X_{i1}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^*)\right]^2 + c\left[\sum_{j=1}^{p-1}(\beta_j^*)^2\right]$

The penalty is a biasing constant, c, times the sum of squares of the regression coefficients. Large
absolute regression parameters lead to a large penalty; thus, it can be seen that for c > 0 the “best”
coefficients generally will be smaller in absolute magnitude than the ordinary least squares estimates.
For this reason, ridge estimators are sometimes referred to as shrinkage estimators.
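
The link between the penalized criterion and the ridge normal equations (11.33) can be made explicit with a short derivation. It is a sketch in matrix notation, writing $\mathbf{Y}^*$ and $\mathbf{X}^*$ for the correlation-transformed response vector and predictor matrix (notation assumed here), and it relies on the identities $\mathbf{X}^{*\prime}\mathbf{X}^* = \mathbf{r}_{XX}$ and $\mathbf{X}^{*\prime}\mathbf{Y}^* = \mathbf{r}_{YX}$ that hold under the correlation transformation.

```latex
\begin{aligned}
Q(\boldsymbol{\beta}^*) &= (\mathbf{Y}^* - \mathbf{X}^*\boldsymbol{\beta}^*)'(\mathbf{Y}^* - \mathbf{X}^*\boldsymbol{\beta}^*)
  + c\,\boldsymbol{\beta}^{*\prime}\boldsymbol{\beta}^* \\
\frac{\partial Q}{\partial \boldsymbol{\beta}^*} &= -2\,\mathbf{X}^{*\prime}(\mathbf{Y}^* - \mathbf{X}^*\boldsymbol{\beta}^*)
  + 2c\,\boldsymbol{\beta}^* \\
\mathbf{0} &= -2\,\mathbf{X}^{*\prime}\mathbf{Y}^* + 2\,(\mathbf{X}^{*\prime}\mathbf{X}^* + c\,\mathbf{I})\,\mathbf{b}^R \\
(\mathbf{r}_{XX} + c\,\mathbf{I})\,\mathbf{b}^R &= \mathbf{r}_{YX}
\end{aligned}
```

Because $\mathbf{r}_{XX} + c\mathbf{I}$ is positive definite for c > 0, the criterion is strictly convex and the stationary point is its unique minimum.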

5. Ridge regression estimates tend to be stable in the sense that they are usually little affected by 
small changes in the data on which the fitted regression is based. In contrast, ordinary least squares 
estimates may be highly unstable under these conditions when the predictor variables are highly 
multicollinear. Predictions of new observations made from ridge estimated regression functions tend
to be more precise than predictions made from ordinary least squares regression functions when 
the predictor variables are correlated and the new observations follow the same multicollinearity 
pattern (see, for instance, Reference 11.4). The prediction precision advantage with ridge regression 
is especially great when the intercorrelations among the predictor variables are high. 

6. Ridge estimated regression functions at times will provide good estimates of mean responses 
or predictions of new observations for levels of the predictor variables outside the region of the obser- 
vations on which the regression function is based. In contrast, estimated regression functions based 
on ordinary least squares may perform quite poorly in such circumstances. Of course, any estimation
or prediction well outside the region of the observations should always be made with great caution. 




7. A major limitation of ridge regression is that ordinary inference procedures are not applicable 
and exact distributional properties are not known. Bootstrapping, a computer-intensive procedure to 
be discussed in Section 11.5, can be employed to evaluate the precision of ridge regression coefficients.
Another limitation of ridge regression is that the choice of the biasing constant c is a judgmental one. 
Although a variety of formal methods have been developed for making this choice, these have their 
own limitations. 

8. The ridge regression procedures have been generalized to allow for differing biasing constants 
for the different estimated regression coefficients; see, for instance, Reference 11.3. 

9. Ridge regression can be used to help in reducing the number of potential predictor variables in 
exploratory observational studies by analyzing the ridge trace. Variables whose ridge trace is unstable, 
with the coefficient tending toward the value of zero, are dropped with this approach. Also, variables 
whose ridge trace is stable but at a very small value are dropped. Finally, variables with unstable ridge 
traces that do not tend toward zero are considered as candidates for dropping.


11.3 Remedial Measures for Influential Cases—Robust Regression 


We noted in Chapter 10 that the hat matrix and studentized deleted residuals are valuable 
tools for identifying cases that are outlying with respect to the X and Y variables. In 
addition, we considered there how to measure the influence of these outlying cases on 
the fitted values and estimated regression coefficients by means of the DFFITS, Cook’s 
distance, and DFBETAS measures. The reason for our concern with outlying cases is that 
the method of least squares is particularly susceptible to these cases, resulting sometimes 
in a seriously distorted fitted model for the remaining cases. A crucial question that arises 
now is how to handle highly influential cases. 

A first step is to examine whether an outlying case is the result of a recording error, 
breakdown of a measurement instrument, or the like. For instance, in a study of the waiting 
time in a telephone reservation system, one waiting time was recorded as 1,000 rings. This 
observation was so extreme and unrealistic that it was clearly erroneous. If erroneous data
can be corrected, this should be done. Often, however, erroneous data cannot be corrected
later on and should be discarded. Many times, unfortunately, it is not possible after the
data have been obtained to tell for certain whether the observations for an outlying case are
erroneous. Such cases should usually not be discarded.

If an outlying influential case is not clearly erroneous, the next step should be to examine 
the adequacy of the model. Scientists frequently have primary interest in the outlying cases 
because they deviate from the currently accepted model. Examination of these outlying 
cases may provide important clues as to how the model needs to be modified. In a study 
of the yield of a process, a first-order model was fitted for the two important factors under 
consideration because previous studies had not found any interaction effects between these 
factors on the yield. One case in the current study was outlying and highly influential, 
with extremely high yield; it corresponded to unusually high levels of the two factors. The 
tentative conclusion drawn was that an interaction effect is present; this was subsequently 
confirmed in a follow-up study. The improved model, resulting from the outlying case, led 
to greatly improved process productivity. 

Outlying cases may also lead to the finding of other types of model inadequacies, such as 
the omission of an important variable or the choice of an incorrect functional form (e.g., a 
quadratic function instead of an exponential function). The analysis of outlying influential 




cases can frequently lead to valuable insights for strengthening the model such that the
outlying case is no longer an outlier but is accounted for by the model. 

Discarding of outlying influential cases that are not clearly erroneous and that cannot be 
accounted for by model improvements should be done only rarely, such as when the model 
is not intended to cover the special circumstances related to the outlying cases. For example,
a few cases in an industrial study were outlying and highly influential. These cases occurred
early in the study, when the plant was in transition from one process to the new one under
study. Discarding of these early cases was deemed to be reasonable since the model was
intended for use after the new process had stabilized.

An alternative to discarding outlying cases that is less severe is to dampen the influence 
of these cases. That is the purpose of robust regression. 


Robust Regression 

Robust regression procedures dampen the influence of outlying cases, as compared to 
ordinary least squares estimation, in an effort to provide a better fit for the majority of 
cases. They are useful when a known, smooth regression function is to be fitted to data that 
are “noisy,” with a number of outlying cases, so that the assumption of a normal distribution
for the error terms is not appropriate. Robust regression procedures are also useful when
automated regression analysis is required. For example, a complex measurement instrument
used for internal medical examinations must be calibrated for each use. There is no time for a
thorough identification of outlying cases and an analysis of their influence, nor for a careful 
consideration of remedial measures. Instead, an automated regression calibration must be 
used. Robust regression procedures will automatically guard against undue influence of 
outlying cases in this situation. 

Numerous robust regression procedures have been developed. They are described in 
specialized texts, such as References 11.5 and 11.6. We mention briefly a few of these pro- 
cedures and then describe in more detail one commonly used procedure based on iteratively 
reweighted least squares. 


LAR or LAD Regression. Least absolute residuals (LAR) or least absolute deviations
(LAD) regression, also called minimum $L_1$-norm regression, is one of the most widely used
robust regression procedures. It is insensitive to both outlying data values and inadequacies
of the model employed. The method of least absolute residuals estimates the regression
coefficients by minimizing the sum of the absolute deviations of the Y observations from
their means. The criterion to be minimized, denoted by $L_1$, is:

$L_1 = \sum_{i=1}^{n}\left|Y_i - (\beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1})\right|$  (11.42)


Since absolute deviations rather than squared ones are involved here, the LAR method 
places less emphasis on outlying observations than does the method of least squares. 
The estimated LAR regression coefficients can be obtained by linear programming tech- 
niques. Details about computational aspects may be found in specialized texts, such as 
Reference 11.7. The LAR fitted regression model differs from the least squares fitted model 
in that the residuals ordinarily will not sum to zero. Also, the solution for the estimated 
regression coefficients with the method of least absolute residuals may not be unique. 
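
For a single predictor, criterion (11.42) can be minimized without linear programming. The sketch below is not the linear programming approach mentioned above; it relies instead on the known property that at least one minimizing line passes through two of the data points, and simply tries every pair. The data and the function name `lad_line` are hypothetical, and the search is practical only for small data sets.

```python
from itertools import combinations


def lad_line(x, y):
    """Least absolute residuals line for one predictor by exhaustive search.

    Returns (criterion, b0, b1), where criterion is the value of L1 in (11.42).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                      # a vertical line is not a regression line
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        crit = sum(abs(yk - (b0 + b1 * xk)) for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, b0, b1)
    return best


# Hypothetical data: four cases on the line Y = 2X and one outlying Y value
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 30.0]

crit, b0, b1 = lad_line(x, y)
print(crit, b0, b1)
```

Here the fitted line follows the four cases that lie on Y = 2X and leaves the entire discrepancy to the outlying case, whereas the least squares slope for the same data is 6.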




IRLS Robust Regression. Iteratively reweighted least squares (IRLS) robust regression
uses the weighted least squares procedures discussed in Section 11.1 to dampen the influence
of outlying observations. Instead of weights based on the error variances, IRLS robust 
regression uses weights based on how far outlying a case is, as measured by the residual for 
that case. The weights are revised with each iteration until a robust fit has been obtained. 
We shall discuss this procedure in more detail shortly. 


LMS Regression. Least median of squares (LMS) regression replaces the sum of squared
deviations in ordinary least squares by the median of the squared deviations, which is a robust
estimator of location. The criterion for this procedure is to minimize the median squared
deviation:

$\text{median}\{[Y_i - (\beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1})]^2\}$  (11.43)

with respect to the regression coefficients. Thus, this procedure leads to estimated regression
coefficients $b_0, b_1, \ldots, b_{p-1}$ that minimize the median of the squared residuals.


Other Robust Regression Procedures. There are many other robust regression pro- 
cedures. Some involve trimming one or several of the extreme squared deviations before 
applying the least squares criterion; others are based on ranks. Many of the robust regression 
procedures require extensive computing. 


IRLS Robust Regression 

Iteratively reweighted least squares was encountered in Section 11.1 as a remedial measure 
for unequal error variances in connection with the obtaining of weights from an estimated 
variance or standard deviation function. For robust regression, weighted least squares is 
used to reduce the influence of outlying cases by employing weights that vary inversely 
with the size of the residual. Outlying cases that have large residuals are thereby given 
smaller weights. The weights are revised as each iteration yields new residuals until the 
estimation process stabilizes. A summary of the steps follows:


1. Choose a weight function for weighting the cases. 

2. Obtain starting weights for all cases. 

3. Use the starting weights in weighted least squares and obtain the residuals from the fitted 
regression function. 

4. Use the residuals in step 3 to obtain revised weights. 

5. Continue the iterations until convergence is obtained. 
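
The five steps can be sketched in Python for the case of a single predictor. The sketch makes several assumptions that are choices of this illustration rather than prescriptions: it uses the Huber weight function (11.44) with residuals scaled by the MAD estimator (11.46), both described later in this section; it starts from unit weights, that is, from an ordinary least squares fit; it fits each weighted regression with the closed-form estimators (11.26); and it declares convergence when both coefficients change by less than a small tolerance. The data and all function names are hypothetical.

```python
from statistics import median


def wls_simple(x, y, w):
    """Weighted least squares line via the closed-form estimators (11.26)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    b1 = (swxy - swx * swy / sw) / (swxx - swx ** 2 / sw)
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def huber_weight(u, k=1.345):
    """Step 1: Huber weight function (11.44) with tuning constant k."""
    return 1.0 if abs(u) <= k else k / abs(u)


def mad_scale(e):
    """MAD estimate of the error standard deviation, as in (11.46)."""
    med = median(e)
    return median([abs(ei - med) for ei in e]) / 0.6745


def irls_huber(x, y, tol=1e-8, max_iter=100):
    w = [1.0] * len(x)                        # step 2: unit starting weights
    b0, b1 = wls_simple(x, y, w)              # step 3: initial (OLS) fit
    for _ in range(max_iter):
        e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        scale = mad_scale(e)
        if scale == 0.0:                      # degenerate case: cannot scale residuals
            break
        w = [huber_weight(ei / scale) for ei in e]    # step 4: revised weights
        new_b0, new_b1 = wls_simple(x, y, w)
        change = max(abs(new_b0 - b0), abs(new_b1 - b1))
        b0, b1 = new_b0, new_b1
        if change < tol:                      # step 5: coefficients have stabilized
            break
    return b0, b1, w


# Hypothetical data: roughly Y = 1 + 2X, with an outlying Y value in the last case
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 40.0]

b0_ols, b1_ols = wls_simple(x, y, [1.0] * len(x))
b0_rob, b1_rob, weights = irls_huber(x, y)
print(b0_ols, b1_ols)
print(b0_rob, b1_rob)
print(weights)
```

In this example the ordinary least squares slope is pulled well above 2 by the last case, while the robust slope should end up close to 2, with the last case receiving by far the smallest weight.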


We now discuss each of the steps in IRLS robust regression. 


Weight Function. Many weight functions have been proposed for dampening the influence
of outlying cases. Two widely used weight functions are the Huber and bisquare weight
functions:

Huber: $w = \begin{cases} 1 & |u| \le 1.345 \\ \dfrac{1.345}{|u|} & |u| > 1.345 \end{cases}$  (11.44)

Bisquare: $w = \begin{cases} \left[1 - \left(\dfrac{u}{4.685}\right)^2\right]^2 & |u| \le 4.685 \\ 0 & |u| > 4.685 \end{cases}$  (11.45)


FIGURE 11.4 Two Weight Functions Used in IRLS Robust Regression.

[Figure 11.4: (a) Huber weight function, with w = 1 for |u| ≤ 1.345 and w = 1.345/|u| for |u| > 1.345; (b) bisquare weight function, with w = [1 − (u/4.685)²]² for |u| ≤ 4.685 and w = 0 for |u| > 4.685. Each panel plots the weight w against the scaled residual u.]

As before, w denotes the weight, and u denotes the scaled residual to be defined shortly.
The constant 1.345 in the Huber weight function and the constant 4.685 in the bisquare
weight function are called tuning constants. They were chosen to make the IRLS robust
procedure 95 percent efficient for data generated by the normal error regression model (6.7).
Figure 11.4 shows graphs of the two weight functions. Note how the weight w according
to each weight function declines as the absolute scaled residual gets larger, and that each
weight function is symmetric around u = 0. Also note that the Huber weight function does
not reduce the weight of a case from 1.0 until the absolute scaled residual exceeds 1.345,
and that all cases receive some positive weight, no matter how large the absolute scaled
residual. In contrast, the bisquare weight function reduces the weights of all cases from
1.0 (unless the residual is zero). In addition, the bisquare weight function gives weight 0
to all cases whose absolute scaled residual exceeds 4.685, thereby entirely excluding these
extreme cases.
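
The two weight functions (11.44) and (11.45) translate directly into code. The sketch below uses hypothetical function names and exposes the tuning constants as default arguments so that the properties just described are easy to verify.

```python
def huber_weight(u, k=1.345):
    """Huber weight (11.44): 1 inside the band, k/|u| outside it."""
    return 1.0 if abs(u) <= k else k / abs(u)


def bisquare_weight(u, k=4.685):
    """Bisquare weight (11.45): smooth decline to 0 at |u| = k, 0 beyond."""
    return (1.0 - (u / k) ** 2) ** 2 if abs(u) <= k else 0.0


# Weights for a few scaled residuals (values chosen only for illustration)
for u in (0.0, 1.0, 1.345, 2.69, 4.685, 6.0):
    print(u, huber_weight(u), bisquare_weight(u))
```

The printout should show the Huber weight staying at 1.0 up to |u| = 1.345 and remaining positive beyond it, while the bisquare weight drops below 1.0 as soon as u differs from zero and is 0 for |u| above 4.685.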


Starting Values. Calculations with some of the weight functions are very sensitive to the 
starting values; with others, this is less of a problem. When the Huber weight function is 
employed, the initial residuals may be those obtained from an ordinary least squares fit. 
The bisquare function calculations, on the other hand, are more sensitive to the starting 
values. To obtain good starting values for the bisquare weight function, the Huber weight 
function is often used to obtain an initial robust regression fit, and the residuals for this fit 
are then employed as starting values for several iterations with the bisquare weight function. 
Alternatively, least absolute residuals regression in (11.42) may be used to obtain starting 
residuals when the bisquare weight function is used. 


Scaled Residuals. The weight functions (11.44) and (11.45) are each designed to be used
with scaled residuals. The semistudentized residuals in (3.5) are scaled residuals and could
be employed. However, in the presence of outlying observations, $\sqrt{MSE}$ is not a resistant
estimator of the error term standard deviation $\sigma$; the magnitude of $\sqrt{MSE}$ can be greatly
influenced by one or several outlying observations. Also, $\sqrt{MSE}$ is not a robust estimator
of $\sigma$ when the distribution of the error terms is far from normal. Instead, the resistant and
robust median absolute deviation (MAD) estimator is often employed:

$MAD = \dfrac{1}{.6745}\,\text{median}\{|e_i - \text{median}\{e_i\}|\}$  (11.46)

The constant .6745 provides an unbiased estimate of $\sigma$ for independent observations from
a normal distribution. Here, it serves to provide an estimate that is approximately unbiased.
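
The resistance of the MAD estimator is easy to see numerically. In the sketch below the residuals are hypothetical, the function name `mad_estimate` is illustrative, and the sample standard deviation of the residuals is used only as a rough stand-in for $\sqrt{MSE}$ (it ignores the degrees-of-freedom adjustment for the fitted regression coefficients). The last lines divide each residual by MAD to obtain the scaled residuals that the weight functions require.

```python
from statistics import median, stdev


def mad_estimate(residuals):
    """Median absolute deviation estimator of sigma, as in (11.46)."""
    med = median(residuals)
    return median([abs(e - med) for e in residuals]) / 0.6745


# Hypothetical residuals: five ordinary cases and one outlying case
residuals = [-1.2, -0.4, 0.1, 0.3, 0.8, 15.0]

mad = mad_estimate(residuals)
rough_sd = stdev(residuals)       # stand-in for sqrt(MSE), inflated by the outlier

# Scaled residuals: each residual divided by MAD
scaled = [e / mad for e in residuals]

print(mad, rough_sd)
print(scaled)
```

The single outlying residual inflates the sample standard deviation to several times the MAD estimate, which is the behavior the paragraph above warns about.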


The scaled residual $u_i$ based on (11.46) then is:

$u_i = \dfrac{e_i}{MAD}$  (11.47)

Number of Iterations. The iterative process of obtaining a new fit, new residuals and
thereby new weights, and then refitting with the new weights continues until the process
converges. Convergence can be measured by observing whether the weights change relatively
little, whether the residuals change relatively little, whether the estimated regression
coefficients change relatively little, or whether the fitted values change relatively little.


Example 1: Mathematics Proficiency with One Predictor


The Educational Testing Service Study America's Smallest School: The Family (Ref. 11.8) 
investigated the relation of educational achievement of students to their home environment.
Although earlier studies examined the relation of educational achievement to family
socioeconomic status (e.g., parents’ education, family income, parents’ occupation), this 
study employed more direct measures of the home environment. Specifically, the relation 
of educational achievement of eighth-grade students in mathematics to the following five 
explanatory variables was investigated: 


PARENTS ($X_1$)—percentage of eighth-grade students with both parents living at home

HOMELIB ($X_2$)—percentage of eighth-grade students with three or more types of
reading materials at home (books, encyclopedias, magazines, newspapers)

READING ($X_3$)—percentage of eighth-grade students who read more than 10 pages
a day

TVWATCH ($X_4$)—percentage of eighth-grade students who watch TV for six hours or
more per day

ABSENCES ($X_5$)—percentage of eighth-grade students absent three days or more last
month


Data on average mathematics proficiency (MATHPROF) and the home environment 
variables were obtained from the 1990 National Assessment of Educational Progress for 
37 states, the District of Columbia, Guam, and the Virgin Islands. A portion of the data is 
shown in Table 11.4. 

Our first example of robust regression using iteratively reweighted least squares involves 
only one predictor, HOMELIB (X2). In this way, simple plots can be used to present the 
data and the fitted regression function. 

Figure 11.5a presents a scatter plot of the data, together with a plot of a first-order 
(simple linear) regression model fit by ordinary least squares and a lowess smooth. The 
lowess smooth suggests that the relationship between home reading resources and average 
mathematics proficiency is curvilinear—possibly second order—for the majority of states, 
but three points are clear outliers. The District of Columbia and the Virgin Islands are outliers 
with respect to mathematics proficiency (Y), and Guam appears to be an outlier with respect 
to both mathematics proficiency and available reading resources (X). Figure 11.5b presents 
a plot against X of the residuals obtained from the fitted first-order model in Figure 11.5a. 
This plot shows clearly the three outlying Y cases. Note also from the residual plot that 
there is a group of six states with low reading resources levels, between 68 and 73, whose 
average mathematics proficiency scores are all above the fitted regression line. This is 
another indication that a second-order polynomial model may be appropriate. 




FIGURE 11.5 Comparison of Lowess, Ordinary Least Squares Fits, and Robust Quadratic
Fits—Mathematics Proficiency Example.
[Plots not reproduced. Surviving panel titles: (a) Lowess and Linear Regression Fits;
(c) OLS Quadratic Fit; (e) Robust Quadratic Fit. Horizontal axis: X. The points for Guam,
D.C., and the Virgin Islands are labeled.]




TABLE 11.4 Data Set—Mathematics Proficiency Example.

MATHPROF   PARENTS   HOMELIB   READING   TVWATCH       ABSENCES
    Y         X1        X2        X3        X4            X5
   252        75        78        34        18            18
   259        75        73        41        [illegible]   26
   256        77        77        28        20            23
   256        78        68        42        11            28
   231        47        76        24        33            37

Source: ETS Policy Information Center, America's Smallest School: The Family (Princeton, New Jersey: Educational Testing Service, 1992).


Second-order model (8.2):

$$Y_i = \beta_0 + \beta_2 x_{i2} + \beta_{22} x_{i2}^2 + \varepsilon_i \tag{11.48}$$

was next fit, again using ordinary least squares. Recall that this model requires calculation
of the centered predictor $x_{i2} = X_{i2} - \bar{X}_2$ and its square, $x_{i2}^2$. A plot of the fit of the
second-order model, superimposed on a scatter plot of the data, is shown in Figure 11.5c. Though
improved, the fit is again unsatisfactory: the six points that fell above the first-order fit are
still above the fitted second-order model. The regression line is clearly being influenced by
the three outliers identified above. The Cook's distance measures for the second-order fit
are displayed in an index plot in Figure 11.5d. The plot confirms the influence of Guam and
the Virgin Islands.

In an effort to dampen the effect of the three outliers, we shall fit second-order model (8.2)
robustly, using iteratively reweighted least squares and the Huber weight function (11.44).
We illustrate the calculations for case 1, Alabama. The regression model to be fitted is
second-order model (11.48). An ordinary least squares fit of this model yields:

$$\hat{Y} = 258.436 + 1.8327x_2 + 0.06491x_2^2 \tag{11.49}$$

The residual for Alabama is $e_1 = -2.4109$. The residuals are shown in column 1 of
Table 11.5. The median of the 40 residuals is $\text{median}\{e_i\} = 0.7063$. Hence,
$e_1 - \text{median}\{e_i\} = -2.4109 - 0.7063 = -3.1172$, and the absolute deviation is
$|e_1 - \text{median}\{e_i\}| = 3.1172$. The median of the 40 absolute deviations is:

$$\text{median}\{|e_i - \text{median}\{e_i\}|\} = 3.1488$$




TABLE 11.5 Iteratively Huber-Reweighted Least Squares Calculations—Mathematics Proficiency Example.

         (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
             Iteration 0                   Iteration 1       Iteration 2      Iteration 7
  i      e_i        u_i        w_i        e_i        w_i        e_i        w_i        e_i
  1    -2.4109   -0.51643   1.00000    -3.7542   1.00000    -4.0354   1.00000    -4.1269
  2    10.5724    2.26466   0.59391     8.4297   0.71515     7.4848   0.86011     6.7698
  3     3.0454    0.65234   1.00000     1.5411   1.00000     1.1559   1.00000     0.9731
  4    10.3104    2.20853   0.60900     7.3822   0.81663     5.4138   1.00000     3.6583
  8   -20.6282   -4.41866   0.30439   -22.2029   0.27042   -22.7964   0.25263   -23.0873
 11   -14.8358   -3.17791   0.42323   -18.3824   0.32795   -21.4287   0.24019   -24.3167
 36   -33.6282   -7.20333   0.18672   -35.2929   0.17081   -35.7964   0.16161   -36.0873
 37     2.4659    0.52821   1.00000     1.7722   1.00000     1.7627   1.00000     1.8699
 38    -1.7129   -0.36691   1.00000    -2.7325   1.00000    -2.8490   1.00000    -2.8079
 39     3.2658    0.69954   1.00000     3.2305   1.00000     3.2624   1.00000     3.3014
 40     1.2658    0.27113   1.00000     1.2305   1.00000     1.2624   1.00000     1.3014


so that the MAD estimator (11.46) is:

$$MAD = \frac{3.1488}{.6745} = 4.6683$$

Hence, the scaled residual (11.47) for Alabama is:

$$u_1 = \frac{-2.4109}{4.6683} = -.5164$$


The scaled residuals are shown in Table 11.5, column 2. Since $|u_1| = .5164 < 1.345$, the
initial Huber weight for Alabama is $w_1 = 1.0$. The initial weights are shown in Table 11.5,
column 3. To interpret these weights, remember that ordinary least squares may be viewed
as a special case of weighted least squares with the weights for all cases being equal to 1.
We note in column 3 that the initial weights for cases 8, 11, and 36 (District of Columbia,
Guam, and Virgin Islands) are substantially reduced, and that the weights for some other
states are reduced somewhat.
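
The scaling and weighting steps for a single case can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the function names `mad_scale` and `huber_weight` are hypothetical, and the Huber weight is written in its usual form (weight 1 for |u| at most 1.345, and 1.345/|u| otherwise), which is assumed to match (11.44).

```python
# Values quoted in the text for the mathematics proficiency example.
median_abs_dev = 3.1488   # median of |e_i - median{e_i}| over the 40 cases
e_alabama = -2.4109       # ordinary least squares residual for case 1


def mad_scale(median_abs_deviation):
    """MAD estimator as in (11.46): median absolute deviation divided by .6745."""
    return median_abs_deviation / 0.6745


def huber_weight(u, c=1.345):
    """Usual Huber weight: 1 when |u| <= c, otherwise c / |u| (assumed form of (11.44))."""
    return 1.0 if abs(u) <= c else c / abs(u)


mad = mad_scale(median_abs_dev)      # about 4.6683
u_alabama = e_alabama / mad          # about -0.5164
w_alabama = huber_weight(u_alabama)  # 1.0, since |u| < 1.345

# A case with a large scaled residual is downweighted, e.g. u = 2.26466
# from Table 11.5, column 2 gives a weight of about 0.594.
w_case2 = huber_weight(2.26466)
print(round(mad, 4), round(u_alabama, 4), w_alabama, round(w_case2, 5))
```

The weights for the two cases agree with the entries in columns 2 and 3 of Table 11.5.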

The first iteration of weighted least squares uses the initial weights in column 3, leading
to the fitted regression model:

$$\hat{Y} = 259.390 + 1.6701x_2 + 0.06463x_2^2 \tag{11.50}$$


This fitted regression function differs considerably from the ordinary least squares fit
in (11.49). The coefficient of $x_2$ has decreased from $b_2 = 1.8327$ to $b_2 = 1.6701$, while the
curvature term $b_{22} = 0.06463$ changed little from its previous value of $b_{22} = 0.06491$. This
has permitted the estimated regression function to increase for smaller values of $X_2$ and
therefore to conform more closely to the six values that previously fell above the fitted line.
Iteration 2 uses the residuals in column 4 of Table 11.5, scales them, and obtains revised
Huber weights, which are then used in iteration 2 of weighted least squares. The weights
obtained for the eighth iteration differed relatively little from those for the seventh iteration;
hence the iteration process was stopped with the seventh iteration. The final weights are
shown in Table 11.5, column 7. Note that only minor changes in the weights occurred
between iterations 2 and 7. Use of the weights in column 7 leads to the final fitted model:

$$\hat{Y} = 259.421 + 1.5649x_2 + 0.08016x_2^2 \tag{11.51}$$

The residuals for the final fit are shown in Table 11.5, column 8. Just as the weights changed
only moderately between iterations 2 and 7, so the residuals changed only to a small extent
after iteration 2. Note that the coefficient of the curvature term did change a bit more
substantially—from $b_{22} = .06463$ to $b_{22} = .08016$.

Figure 11.5e shows the scatter plot and the IRLS fitted second-order regression function,
and Figure 11.5f contains an index plot of the weights used in the final iteration. The robust
fit now tracks the responses for the 37 states extremely well, and the fit to the six cases that
were previously above the regression line is now satisfactory. The plot of the final weights
in Figure 11.5f shows clearly the downweighting of the three outliers.

We conclude from the robust fit in Figure 11.5e that there is a clear upward-curving
relationship between availability of reading resources in the home and average mathematics
proficiency at the state level. This does not necessarily imply a causal relation, of course.
The availability of reading resources may be positively correlated with other variables that
are causally related to mathematics proficiency.


Example 2: Mathematics Proficiency with Five Predictors

We shall explore from a descriptive perspective the relationship between average mathematics
proficiency and the five home environment variables. A SYGRAPH scatter plot matrix of
the data is presented in Figure 11.6a and the correlation matrix is presented in Figure 11.6b.
The scatter plot matrix also shows the lowess nonparametric regression fits, where q = .9
(the proportion defining a neighborhood) is used in the local fitting.

We see from the first row of the scatter plot matrix that average mathematics proficiency 
is related to each of the five explanatory variables and that there are three clear outliers. 
They are District of Columbia, Guam, and Virgin Islands, as noted earlier in this section. 
The lowess fits show positive relations for PARENTS, HOMELIB, and READING and a 
negative relation for ABSENCES. The lowess fit for TVWATCH is distorted because of the
outliers. If these are ignored, the relation is negative. The correlation matrix shows fairly 
strong linear association with average mathematics proficiency for all explanatory variables 
except ABSENCES, where the degree of linear association is moderate. 

The relationships with mathematics proficiency found in Figure 11.6a must be interpreted 
with caution. We see from the remainder of the scatter plot matrix and from the correlation 
matrix in Figure 11.6b that the explanatory variables are correlated with each other, some
fairly strongly. Also, some of the explanatory variables are correlated with other important 
variables not considered in this study. For example, the percentage of students with both 
parents at home is related to family income. 

For simplicity, we consider only first-order terms in this example. An initial fit of the
first-order model to the data using ordinary least squares yields the following estimated
regression function:

$$\hat{Y} = 155.03 + .3911X_1 + .8639X_2 + .3616X_3 - .8467X_4 + .1923X_5 \tag{11.52}$$




FIGURE 11.6 Scatter Plot Matrix with Lowess Smooths, and Correlation Matrix—Mathematics
Proficiency Example.

(a) SYGRAPH Scatter Plot Matrix
[Plot not reproduced.]

(b) Correlation Matrix

            MATHPROF   PARENTS   HOMELIB   READING   TVWATCH
PARENTS       0.741
HOMELIB       0.745     0.395
READING       0.717     0.693     0.377
TVWATCH      -0.873    -0.831    -0.594    -0.792
ABSENCES     -0.480    -0.565    -0.443    -0.357     0.512


The signs of the regression coefficients, except for $b_5$, are in the expected directions. The
coefficient of multiple determination for this fitted model is $R^2 = .86$, suggesting that the
explanatory variables are strongly related to average mathematics proficiency.

Table 11.6 presents some diagnostics for the fitted model in (11.52): leverage $h_{ii}$, studentized
deleted residual $t_i$, and Cook's distance $D_i$. We see that the District of Columbia, Guam,
Texas, and Virgin Islands have leverage values equal to or exceeding $2p/n = 12/40 = .30$.


TABLE 11.6 Diagnostics for First-Order Model with All Five Explanatory Variables—Mathematics
Proficiency Example.

  i   State             h_ii     t_i            D_i
  1   Alabama           .16      -.05           .00
  2   Arizona           .19       .40           .01
  3   Arkansas          .16      1.41           .06
  4   California        .29      [illegible]    .00
  8   D.C.              .69      1.41           .72
 11   Guam              .34     -2.83           .57
 35   Texas             .30      2.25           .33
 36   Virgin Islands    .32     -5.21          1.21
 37   Virginia          .06       .90           .01
 38   West Virginia     .13      -.91           .02
 39   Wisconsin         .08       .39           .00
 40   Wyoming           .08      -.91           .01

We also see that the Virgin Islands is outlying with respect to its Y value; the absolute
value of its studentized deleted residual $t_{36} = -5.21$ exceeds the Bonferroni critical value
at $\alpha = .05$ of $t(1 - \alpha/2n; n - p - 1) = t(.99938; 33) = 3.53$. Of these outlying cases,
the Virgin Islands is clearly influential according to Cook's distance measure, and District
of Columbia and Guam are somewhat influential; the 50th percentile of the F(6, 34)
distribution is .91, and the 25th percentile is .57.
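
The screening just described can be expressed compactly. The sketch below uses the leverage and Cook's distance values listed in Table 11.6 for a few cases, the 2p/n leverage guideline, and the two F(6, 34) percentiles quoted above. The list name `cases` and the exact comparison rules (for example, treating a Cook's distance at or above the 25th percentile as "somewhat influential") are assumptions made for illustration.

```python
# (case number, state, leverage h_ii, Cook's distance D_i) as listed in Table 11.6
cases = [
    (4, "California", 0.29, 0.00),
    (8, "D.C.", 0.69, 0.72),
    (11, "Guam", 0.34, 0.57),
    (35, "Texas", 0.30, 0.33),
    (36, "Virgin Islands", 0.32, 1.21),
    (37, "Virginia", 0.06, 0.01),
]

n, p = 40, 6                   # 40 cases, 6 regression parameters
leverage_cutoff = 2 * p / n    # 12/40 = .30

# Percentiles of the F(6, 34) distribution quoted in the text.
f_25th, f_50th = 0.57, 0.91

high_leverage = [state for _, state, h, _ in cases if h >= leverage_cutoff]
clearly_influential = [state for _, state, _, d in cases if d >= f_50th]
somewhat_influential = [state for _, state, _, d in cases if f_25th <= d < f_50th]

print(high_leverage)          # D.C., Guam, Texas, Virgin Islands
print(clearly_influential)    # Virgin Islands
print(somewhat_influential)   # D.C., Guam
```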

Residual plots against each of the explanatory variables and against $\hat{Y}$ (not shown here)
presented no strong indication of nonconstancy of the error variance for the states aside from
the outliers. Since the explanatory variables are correlated among themselves, the question
arises whether a simpler model can be obtained with almost as much descriptive ability as
the model containing all five explanatory variables. Figure 11.7 presents the MINITAB best
subsets regression output, showing the two models with highest $R^2$ for each number of X
variables. We see that the two best models for three variables (p = 4 parameters) contain
relatively little bias according to the $C_p$ criterion and have $R^2$ values almost as high as the
model with all five variables.

We explore now one of these two models, the one containing HOMELIB, READING,
and TVWATCH. In view of the outlying and influential cases, we employ IRLS robust
regression with the Huber weight function (11.44). We find that after eight iterations, the
weights change very little, so the iteration process is ended with the eighth iteration. The
final robust fitted regression function is:

$$\hat{Y} = 207.83 + .7942X_2 + .1637X_3 - 1.1695X_4 \tag{11.53}$$

The signs of the regression coefficients agree with expectations. For comparison, the
regression function fitted by ordinary least squares is:

$$\hat{Y} = 199.61 + .7804X_2 + .4012X_3 - 1.1565X_4 \tag{11.54}$$




FIGURE 11.7 MINITAB Best Subsets Regression—Mathematics Proficiency Example.

Best Subsets Regression of MATHPROF

                Adj.                       Predictors (X marks, in the order PARENTS, HOMELIB,
Vars   R-sq     R-sq     C-p      S        READING, TVWATCH, ABSENCES; column alignment not recoverable)
  1    76.3     75.7     22.0    6.5079    X
  1    55.5     54.3     72.8    8.9157    X
  2    84.2     83.4      4.6    5.3810    X X
  2    79.2     78.1     16.8    6.1743    XX
  3    85.1     83.9      4.4    5.2939    XXX
  3    85.1     83.8      4.5    5.3062    XX X
  4    85.9     84.3      4.5    5.2327    XXXX
  4    85.4     83.7      5.8    5.3285    XX XX
  5    86.1     84.1      6.0    5.2680    XXXXX


Notice that the robust regression led to a deemphasis of $X_3$ (READING), with the other
regression coefficients remaining almost the same.
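
The iteratively reweighted least squares cycle used in both examples (fit, scale the residuals by MAD, recompute Huber weights, refit) can be sketched in plain Python as follows. This is an illustrative sketch only: the helper names `solve`, `wls`, and `irls_huber` are hypothetical, the stopping rule (largest change in any weight below a tolerance) is an assumption, and the data are synthetic rather than the mathematics proficiency data.

```python
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def wls(rows, y, w):
    """Weighted least squares: solve the normal equations (X'WX) b = X'WY."""
    p = len(rows[0])
    xtwx = [[sum(wi * r[j] * r[k] for r, wi in zip(rows, w)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * r[j] * yi for r, yi, wi in zip(rows, y, w)) for j in range(p)]
    return solve(xtwx, xtwy)


def huber_weight(u, c=1.345):
    return 1.0 if abs(u) <= c else c / abs(u)


def irls_huber(rows, y, max_iter=50, tol=1e-6):
    w = [1.0] * len(y)                     # iteration 0 is ordinary least squares
    for _ in range(max_iter):
        b = wls(rows, y, w)
        e = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(rows, y)]
        med = statistics.median(e)
        mad = statistics.median([abs(ei - med) for ei in e]) / 0.6745
        new_w = [huber_weight(ei / mad) for ei in e]
        if max(abs(nw - ow) for nw, ow in zip(new_w, w)) < tol:
            break
        w = new_w
    return b, w


# Synthetic data: a quadratic trend in a centered predictor, with one gross outlier.
x = list(range(-6, 7))
noise = [0.5, -0.3, 0.2, -0.6, 0.4, -0.1, 0.3, -0.4, 0.1, 0.6, -0.2, -0.5, 0.0]
y = [250 + 2 * xi + 0.1 * xi ** 2 + ni for xi, ni in zip(x, noise)]
y[3] -= 30                                 # outlying Y value at x = -3
rows = [(1.0, xi, xi ** 2) for xi in x]

b_ols = wls(rows, y, [1.0] * len(y))
b_robust, weights = irls_huber(rows, y)
print(b_ols, b_robust, weights[3])
```

In this synthetic example the outlying case ends up with the smallest weight, and the robust slope estimate is closer to the value used to generate the data than the ordinary least squares estimate is.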

To obtain an indication of how well the robust regression model (11.53) describes the
relation between average mathematics proficiency of eighth-grade students and the three
home environment variables, we have ranked the 40 states according to their average
mathematics proficiency score and according to their corresponding fitted value. The Spearman
rank correlation coefficient (2.97) is .945. This indicates a fairly good ability of the three
explanatory variables to distinguish between states whose average mathematics proficiency
is very high or very low.
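
The rank comparison described above can be sketched as follows. The sketch computes the Spearman coefficient as the ordinary correlation between the two sets of ranks, with tied values given the average of their ranks; this is assumed to correspond to (2.97). The function names are hypothetical, the responses are the five MATHPROF values shown in Table 11.4, and the fitted values are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1          # average of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(a, b):
    """Ordinary product-moment correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def spearman(y, fitted):
    return pearson(average_ranks(y), average_ranks(fitted))


y_obs = [252, 259, 256, 256, 231]               # MATHPROF values from Table 11.4
y_fit = [250.1, 257.3, 255.0, 258.2, 236.4]     # hypothetical fitted values
r_s = spearman(y_obs, y_fit)
print(average_ranks(y_obs), round(r_s, 3))
```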

The analysis of the mathematics proficiency data set in Table 11.4 presented here is by 
no means exhaustive. We have not analyzed higher-order effects, nor have we explored 
other subsets that might be reasonable to use. We have not recognized that the precision of 
the state data varies because the data are based on samples of different sizes, nor have we 
considered other explanatory variables that are related to mathematics proficiency, such as 
parents’ education and family income. Furthermore, we have analyzed state averages, which 
may obscure important insights into relations between the variables at the family level. 


Comments 


1. Robust regression requires knowledge of the regression function. When the appropriate
regression function is not clear, nonparametric regression may be useful. Nonparametric regression is
discussed in Section 11.4.

2. Robust regression can be employed to identify outliers in situations where there are multiple
outliers whose presence is masked with diagnostic measures that delete one case at a time. Cases
whose final weights are relatively small are outlying.

3. As illustrated by the mathematics proficiency example, robust regression is often useful for
confirming the reasonableness of ordinary least squares results. When robust regression yields similar
results to ordinary least squares (for example, the residuals are similar), one obtains some reassurance
that ordinary least squares is not unduly influenced by outlying cases.




4. A limitation of robust regression is that the evaluation of the precision of the estimated regression
coefficients is more complex than for ordinary least squares. Some large-sample results have been 
obtained (see, for example, Reference 11.5), but they may not perform well in the presence of outliers. 
Bootstrapping (to be discussed in Section 11.5) may also be used for evaluating the precision of robust 
regression results. 

5. When the Huber, bisquare, and other weight functions are based on the scaled residuals
in (11.47), they primarily reduce the influence of cases that are outlying with respect to their Y
values. To make the robust regression fit more sensitive to cases that are outlying with respect to their
X values, studentized residuals in (10.20) or studentized deleted residuals in (10.24) may be used
instead of the scaled residuals in (11.47). Again, $\sqrt{MSE}$ may be replaced by MAD in (11.46) for better
resistance and robustness when calculating the studentized or studentized deleted residuals.

In addition, the weights $w_i$ obtained from the weight function may be modified to reduce directly
the influence of cases with large X leverage. One suggestion is to multiply the weight function weight
$w_i$ by $\sqrt{1 - h_{ii}}$, where $h_{ii}$ is the leverage value of the ith case defined in (10.18).

Methods that reduce the influence of cases that are outlying with respect to their X values are
called bounded influence regression methods.
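
The leverage adjustment suggested in Comment 5 amounts to a one-line modification of each weight. A minimal sketch follows, with a hypothetical function name; the leverage value .69 is the one listed for D.C. in Table 11.6, and the starting weight of 1.0 is chosen only for illustration.

```python
import math


def bounded_influence_weight(w, h):
    """Multiply a robust weight w by sqrt(1 - h), where h is the case's leverage."""
    return w * math.sqrt(1.0 - h)


# A case with weight 1.0 from the weight function but leverage .69
# receives a modified weight of about 0.557.
w_modified = bounded_influence_weight(1.0, 0.69)
print(round(w_modified, 4))
```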




11.4 Nonparametric Regression: Lowess Method
and Regression Trees


We considered nonparametric regression in Chapter 3 when there is one predictor variable 
in the regression model. We noted there that nonparametric regression fits are useful for 
exploring the nature of the response function, to confirm the nature of a particular response 
function that has been fitted to the data, and to obtain estimates of mean responses without 
specifying the nature of the response function. 

Nonparametric regression can be extended to multiple regression when there are two or 
more predictor variables. Additional complexities are encountered, however, when making 
this extension. With more than two predictor variables, it is not possible to show the fitted 
response surface graphically, so one cannot see its appearance. Unlike parametric regression, 
no analytic expression for the response surface is provided by nonparametric regression. 
Also, as the number of predictor variables increases, there may be fewer and fewer cases in 
a neighborhood, leading to erratic smoothing. This latter problem is less serious when the 
predictor variables are highly correlated and interest in the response surface is confined to 
the region of the X observations. 

Numerous procedures have been developed for fitting a response surface when there 
are two or more predictor variables without specifying the nature of the response function. 
Reference 11.9 discusses a number of these procedures. These include locally weighted 
regressions (Ref. 11.10), regression trees (Ref. 11.11), projection pursuit (Ref. 11.12), and 
smoothing splines (Ref. 11.13). We discuss the lowess method and regression trees in this
section. We first extend the lowess method to multiple regression. In doing so, we will be 
able to describe it in far greater detail because we have established the necessary foundation 
of weighted least squares in Section 11.1. 


Lowess Method


We described the lowess method briefly in Chapter 3 for regression with one predictor
variable. The lowess method for multiple regression, developed by Cleveland and Devlin
(Ref. 11.10), assumes that the predictor variables have already been selected, that the
response function is smooth, and that appropriate transformations have been made or other
remedial steps taken so that the error terms are approximately normally distributed with
constant variance. For any combination of X levels, the lowess method fits either a
first-order model or a second-order model based on cases in the neighborhood, with more
distant cases in the neighborhood receiving smaller weights. We shall explain the lowess
method for the case of two predictor variables when we wish to obtain the fitted value
at $(X_{h1}, X_{h2})$.


Distance Measure. We need a distance measure showing how far each case is from
$(X_{h1}, X_{h2})$. Usually, a Euclidean distance measure is employed. For the ith case, this
measure is denoted by $d_i$ and is defined:

$$d_i = \left[(X_{i1} - X_{h1})^2 + (X_{i2} - X_{h2})^2\right]^{1/2} \tag{11.55}$$


When the predictor variables are measured on different scales, each should be scaled by 
dividing it by its standard deviation. The median absolute deviation estimator in (11.46) 
can be used in place of the standard deviation if outliers are present. 


Weight Function. The neighborhood about the point $(X_{h1}, X_{h2})$ is defined in terms of
the proportion q of cases that are nearest to the point. Let $d_q$ denote the Euclidean distance
of the furthest case in the neighborhood. The weight function used in the lowess method is
the tricube weight function, which is defined as follows:

$$w_i = \begin{cases} \left[1 - (d_i/d_q)^3\right]^3 & d_i < d_q \\ 0 & d_i \ge d_q \end{cases} \tag{11.56}$$


Thus, cases outside the neighborhood receive weight zero and cases within the neighborhood
receive weights between 0 and 1, the weight decreasing with greater distance. In this way,
the mean response at $(X_{h1}, X_{h2})$ is estimated locally.

The choice of the proportion q defining the neighborhood requires a balancing of two
opposing tendencies. The larger is q, the smoother will be the fit but at the same time the
greater may be the bias in the fitted value. A choice of q between .4 and .6 may often be
appropriate.


Local Fitting. Given the weights for the n cases based on (11.55) and (11.56), weighted
least squares is then used to fit either the first-order model (6.1) or the second-order
model (6.16). The second-order model is helpful when the response surface has substantial
curvature; moderate curvilinearities can be detected by using the first-order model. After
the regression model is fitted by weighted least squares, the fitted value $\hat{Y}_h$ at $(X_{h1}, X_{h2})$ then
serves as the nonparametric estimate of the mean response at these X levels. By recalculating
the weights for different $(X_{h1}, X_{h2})$ levels, fitting the response function repeatedly, and
each time obtaining the fitted value $\hat{Y}_h$, we obtain information about the response surface
without making any assumptions about the nature of the response function.
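
The three steps (distance measure, tricube weights, local weighted least squares fit) can be combined into a single function that returns the fitted value at one point. The sketch below is illustrative only: the names `solve` and `lowess_fit_at` are hypothetical, the neighborhood size is taken as the integer part of q times n, the predictors are scaled by their sample standard deviations, and the data are synthetic (an exact plane, so the local first-order fit should reproduce it).

```python
import math
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def lowess_fit_at(points, y, target, q=0.5):
    """First-order lowess fitted value at `target` for two predictor variables."""
    n = len(y)
    # Scale each predictor by its sample standard deviation before measuring distance.
    s = [statistics.stdev([pt[j] for pt in points]) for j in range(2)]
    d = [math.sqrt(sum(((pt[j] - target[j]) / s[j]) ** 2 for j in range(2)))
         for pt in points]
    size = max(int(q * n), 1)              # number of cases in the neighborhood
    d_q = sorted(d)[size - 1]              # distance of the furthest case in it
    w = [(1 - (di / d_q) ** 3) ** 3 if di < d_q else 0.0 for di in d]
    rows = [(1.0, pt[0], pt[1]) for pt in points]
    xtwx = [[sum(wi * r[j] * r[k] for r, wi in zip(rows, w)) for k in range(3)]
            for j in range(3)]
    xtwy = [sum(wi * r[j] * yi for r, yi, wi in zip(rows, y, w)) for j in range(3)]
    b0, b1, b2 = solve(xtwx, xtwy)
    return b0 + b1 * target[0] + b2 * target[1]


# Synthetic check: responses lie exactly on the plane 1 + 2*X1 + 3*X2.
points = [(x1, x2) for x1 in range(5) for x2 in range(5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in points]
fit = lowess_fit_at(points, y, target=(1.2, 2.3), q=0.5)
print(fit)
```

Calling the function over a grid of target points would trace out the fitted response surface, at the cost of one weighted least squares fit per point.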


Example

We shall fit a nonparametric regression function for the life insurance example in Chapter 10.
A portion of the data for a second group of 18 managers is given in Table 11.7, columns 1-3.
The relation between amount of life insurance carried (Y) and income ($X_1$) and risk
aversion ($X_2$) is to be investigated, the data pertaining to managers in the 30—39 age group.


TABLE 11.7 Lowess Calculations for Nonparametric Regression Fit at X_h1 = 30,
X_h2 = 3—Life Insurance Example.

         (1)       (2)     (3)     (4)      (5)
  i      X_i1      X_i2    Y_i     d_i      w_i
  1     66.290      7      240    3.013      0
  2     40.964      5       73    1.143    .300
  3     72.996     10      311    4.212      0
 ...      ...      ...     ...     ...      ...
 16     79.380      1      316    3.461      0
 17     52.766      8      154    2.663      0
 18     55.916      6      164    2.188      0

The local fitting will be done using the first-order model in (6.1) because the number of
available cases is not too large. For the same reason, the proportion of cases defining the
local neighborhoods is set at q = .5; in other words, each local neighborhood is to consist
of half of the cases.

The exploration of the response surface begins at $X_{h1} = 30$, $X_{h2} = 3$. To obtain a locally
fitted value at $X_{h1} = 30$, $X_{h2} = 3$, we need to obtain the Euclidean distances of each case
from this point. We shall use the sample standard deviations of the two predictor variables to
standardize the variables in obtaining the Euclidean distance since the two variables are
measured on different scales. The sample standard deviations are $s_1 = 14.739$ and $s_2 = 2.3044$.
For case 1, the Euclidean distance from $X_{h1} = 30$, $X_{h2} = 3$ is obtained as follows:

$$d_1 = \left[\left(\frac{66.290 - 30}{14.739}\right)^2 + \left(\frac{7 - 3}{2.3044}\right)^2\right]^{1/2} = 3.013$$

The Euclidean distances are shown in Table 11.7, column 4. The Euclidean distance of the
furthest case in the neighborhood of $X_{h1} = 30$, $X_{h2} = 3$ for q = .5 is for the ninth case when
these are ordered according to their Euclidean distance. It is $d_q = 1.653$. Since $d_1 = 3.013 >
1.653$, the weight assigned for case 1 is $w_1 = 0$. For case 2, the Euclidean distance is
$d_2 = 1.143$. Since this is less than 1.653, the weight for case 2 is:


$$w_2 = \left[1 - (1.143/1.653)^3\right]^3 = .300$$

The weights are shown in Table 11.7, column 5.
The fitted first-order regression function using these weights is:

$$\hat{Y} = -134.076 + 3.571X_1 + 10.532X_2$$

The fitted value for $X_{h1} = 30$, $X_{h2} = 3$ therefore is:

$$\hat{Y}_h = -134.076 + 3.571(30) + 10.532(3) = 4.65$$
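
The hand calculations for cases 1 and 2 can be reproduced directly from the numbers given above. This is a small sketch; the function names are hypothetical, and all numerical inputs are those quoted in the text and in Table 11.7.

```python
import math

s1, s2 = 14.739, 2.3044     # sample standard deviations of X1 and X2
xh1, xh2 = 30.0, 3.0        # point at which the fitted value is wanted
d_q = 1.653                 # distance of the ninth-nearest case (q = .5, n = 18)


def scaled_distance(x1, x2):
    """Euclidean distance (11.55) after dividing each predictor by its standard deviation."""
    return math.sqrt(((x1 - xh1) / s1) ** 2 + ((x2 - xh2) / s2) ** 2)


def tricube(d, d_q):
    """Tricube weight (11.56)."""
    return (1 - (d / d_q) ** 3) ** 3 if d < d_q else 0.0


d1 = scaled_distance(66.290, 7)    # case 1: about 3.013
d2 = scaled_distance(40.964, 5)    # case 2: about 1.143
w1 = tricube(d1, d_q)              # 0, since case 1 lies outside the neighborhood
w2 = tricube(d2, d_q)              # about .300

# Fitted value from the locally fitted first-order function reported in the text.
y_hat_h = -134.076 + 3.571 * xh1 + 10.532 * xh2   # about 4.65
print(round(d1, 3), round(d2, 3), w1, round(w2, 3), round(y_hat_h, 2))
```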


In the same fashion, locally fitted values at other values of $X_{h1}$ and $X_{h2}$ are calculated.
Figure 11.8a contains a contour plot of the fitted response surface. The surface clearly
ascends as $X_1$ increases, but the effect of $X_2$ is more difficult to see from the contour plot.
The effect of $X_2$ can be seen more easily by the conditional effects plots of Y against
$X_1$ at low, middle, and high levels of $X_2$ in Figure 11.8b. The conditional effects plots in
Figure 11.8b are also called two-variable conditioning plots. Note that the expected amount
of life insurance carried increases with income ($X_1$) at all levels of risk aversion ($X_2$). The




FIGURE 11.8 Contour and Conditioning Plots for Lowess Nonparametric Regression—Life Insurance
Example.
[Plots not reproduced. Panels: (a) Contour Plot; (b) Two-Variable Conditioning Plots.
Horizontal axis in each panel: X1, from 30 to 75.]


response functions for $X_2 = 3$ and $X_2 = 6$ appear to be approximately linear. The dip in the
left part of the response function for $X_2 = 9$ may be the result of an interaction or of noisy
data and inadequate smoothing. Note also from Figure 11.8b that the expected amount of
life insurance carried at the higher income levels increases as the risk aversion becomes
very high.


Comments 


1. The fitted nonparametric response surface can be used, just as for simple regression, for
examining the appropriateness of a fitted parametric regression model. If the fitted nonparametric response
surface falls within the confidence band in (6.60) for the parametric regression function, the
nonparametric fit supports the appropriateness of the parametric regression function.

2. Reference 11.10 discusses a procedure to assist in choosing the proportion q for defining a
local neighborhood. It also describes how the precision of any fitted value $\hat{Y}_h$ obtained with lowess
nonparametric multiple regression can be approximated.




3. The assumptions of normality and constant variance of the error terms required by the lowess
nonparametric procedure can be checked in the usual fashion. The residuals are obtained by fitting
the lowess nonparametric regression function for each case and calculating $e_i = Y_i - \hat{Y}_i$ as usual.
These residuals will not have the least squares property of summing to zero, but can be examined for
normality and constancy of variance. The residuals can also serve to identify outliers that might not
be disclosed by standard diagnostic procedures.

4. A discussion of some of the advantages of the lowess smoothing procedure is presented in
Reference 11.14.


Regression Trees 


TABLE 11.8 Data Set and 5-Region Regression Tree Fit—Steroid Level Example.


Regression trees are a very powerful, yet conceptually simple, method of nonparametric 
regression. For the case of a single predictor, the range of the predictor is partitioned into 
segments and within each segment the estimated regression fit is given by the mean of 
the responses in the segment. For two or more predictors, the X space is partitioned into 
rectangular regions, and again, the estimated regression surface is given by the mean of 
the responses in each rectangle. Regression trees have become a popular alternative to 
multiple regression for exploratory studies, especially for extremely large data sets. Along 
with neural networks (see Chapter 13), regression trees are one of the standard methods 
used in the emerging field of data mining. Regression trees are easy to calculate, require 
virtually no assumptions, and are simple to interpret.


One Predictor Tree: Steroid Level Example. Figure 1.3 on page 5 presents data on age
and level of a steroid in plasma for 27 healthy females between 8 and 25 years of age.
The data are shown in the first two columns of Table 11.8. A regression tree based on five
regions is obtained by partitioning the range of X (age) into five segments or regions, and
using the sample average of the Y responses in each region for the fitted regression surface.
We will use $R_{51}$ through $R_{55}$ to denote the regions of a 5-region tree, and $\bar{Y}_{R_{51}}$ through
$\bar{Y}_{R_{55}}$ to denote the corresponding sample averages. These values are shown for the steroid level
example in columns 4—6 of Table 11.8. The fitted regression tree is shown in Figure 11.9a.
Note that the regression tree is a step function that steps up rapidly for girls between the
ages of 8 and 14, after which point steroid level is roughly constant.

A plot of residuals versus fitted values is shown in Figure 11.9b. Note that the variance 
of the residuals in each region seems roughly constant, an indication that further splitting 
may be unnecessary. We discuss the determination of appropriate tree size below. 


          (1)        (2)    (3)     (4)        (5)              (6)
        Steroid                    Region                      Fitted
Case     Level       Age           Number     Region           Value
  i       Y_i        X_i             k        R_5k             Ybar_{R_5k}
  1      27.1        23              1        8 <= X < 9        3.550
  2      22.4        19              2        9 <= X < 10       8.133
  3      21.9        25              3        10 <= X < 13     13.675
 ...      ...       ...              4        13 <= X < 14     16.950
 25      12.8        13              5        14 <= X <= 25    22.200
 26      20.8        14
 27      20.6        18



FIGURE 11.9 Fitted Regression Tree, Residual Plot, and Regression Tree Diagram—Steroid Level
Example.
[Plots not reproduced. Panels: (a) fitted regression tree plotted against Age; (b) residuals
plotted against predicted values; (c) regression tree diagram. Surviving labels from the
diagram in (c): Node 1: Is Age < 13?; Node 2: Is Age < 10?; Node 3: Is Age < 14?;
Leaf 2: 9 <= Age < 10; Leaf 5: 14 <= Age <= 25.]


Determining the predicted value for a given $X_h$ is accomplished with the help of a
tree diagram, such as the one shown in Figure 11.9c. Suppose we wish to determine the
predicted value at $X_h = 12.5$. Starting at node 1—the root node—we ask, "Is Age < 13?"
Since 12.5 < 13, we follow the left branch to node 2 where we ask, "Is Age < 10?" Since
Age is not less than 10, we branch right to the terminal node labeled Leaf 3, where we find
from Table 11.8 that $\bar{Y}_{R_{53}} = 13.675$. Tree diagrams such as that shown in Figure 11.9c are
particularly helpful when more than a single predictor is present.
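
Following the tree diagram is a simple loop over yes/no questions. The sketch below stores the tree as nested dictionaries, using the split points and fitted values of Table 11.8. The dictionary layout and the function name `predict` are hypothetical, and the node that asks "Is Age < 9?" is inferred from the region boundaries in Table 11.8 rather than read from the diagram.

```python
# Each internal node asks "Is Age < split?"; "left" is the yes branch.
# Leaves are the fitted values (region means) from Table 11.8, column 6.
tree = {
    "split": 13,
    "left": {
        "split": 10,
        "left": {"split": 9, "left": 3.550, "right": 8.133},  # inferred node
        "right": 13.675,                                       # Leaf 3
    },
    "right": {"split": 14, "left": 16.950, "right": 22.200},
}


def predict(node, age):
    """Walk from the root to a leaf and return the leaf's fitted value."""
    while isinstance(node, dict):
        node = node["left"] if age < node["split"] else node["right"]
    return node


print(predict(tree, 12.5))   # 13.675, as in the text
```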


Growing a Regression Tree. To find a "best" regression tree, it is necessary to specify the
number of regions, r, and the boundaries, or split points, between the regions. The process
of determining a best value for r and the associated split points is referred to as growing
the tree.

First consider the case of a single predictor, and assume that the range of X is to be
divided into r = 2 regions, $R_{21}$ and $R_{22}$. We need to find the split point $X_s$ that optimally
divides the data into two sets. The best point is chosen to minimize the error sum of squares




FIGURE 11.10 Growing the Regression Tree—Steroid Level Example.
[Plots not reproduced.]


for the resulting regression tree:

$$SSE = SSE(R_{21}) + SSE(R_{22})$$

where $SSE(R_{rj})$ is the sum of squared residuals in region $R_{rj}$:

$$SSE(R_{rj}) = \sum (Y_i - \bar{Y}_{R_{rj}})^2$$

For the steroid level data, the best split point is shown in Figure 11.10a to be $X_s = 13.0$.
For this tree, we have:

$$R_{21} = \{X \mid X < 13\}$$
$$R_{22} = \{X \mid X \ge 13\}$$

for which we obtain:

$$SSE = SSE(R_{21}) + SSE(R_{22}) = 238.55 + 167.79 = 406.35$$

From (2.72), the coefficient of determination for the regression tree is:

$$R^2 = 1 - \frac{SSE}{SSTO} = 1 - \frac{406.35}{1284.8} = .684$$

Also, $MSE = SSE/(n - r) = 406.35/(27 - 2) = 16.254$.
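
The search for the best single split point can be sketched as an exhaustive scan over the observed X values. The function names are hypothetical, candidate split points are restricted to observed X values (an assumption), and the small data set is made up for illustration; it is not the steroid level data. The last lines simply recompute the summary measures from the SSE and SSTO values quoted above.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (split point, SSE) minimizing SSE(R21) + SSE(R22), with R21 = {X < split}."""
    best = None
    # Skip the smallest observed X so that both regions are non-empty.
    for s in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < s]
        right = [yi for xi, yi in zip(x, y) if xi >= s]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (s, total)
    return best


# Hypothetical data with a jump in the response between X = 12 and X = 13.
x = [8, 9, 10, 11, 12, 13, 14, 16, 18, 20]
y = [3, 5, 9, 12, 14, 21, 22, 23, 21, 22]
split, total_sse = best_split(x, y)
print(split, round(total_sse, 2))

# Summary measures for the steroid level tree, from the values in the text.
r_squared = 1 - 406.35 / 1284.8      # about .684
mse = 406.35 / (27 - 2)              # 16.254
```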




FIGURE 11.11 Regression Tree Growth—Two-Predictor Example.


At this point, there are two regions, and growing the tree further will require the
identification of a third region. We have two choices: (1) we can work sequentially and split one of
the two existing regions, or (2) start from scratch and identify simultaneously two entirely
new split points that globally minimize the resulting SSE criterion. The second approach
will always lead to a criterion value that is at least as good as the first; however, as the tree
grows, so do the computational demands associated with this approach (particularly if there is
more than one predictor). For this reason, regression trees are generally grown sequentially
according to the following rule: If the tree currently is based on r regions, we determine the
best split point for each of the regions, and then split the region that leads to the greatest
decrease in SSE.

For the steroid-level example, the next step involves splitting $R_{21}$ at $X_s = 10$, resulting
in three regions:

$$R_{31} = \{X \mid X < 10\}$$
$$R_{32} = \{X \mid 10 \le X < 13\}$$
$$R_{33} = \{X \mid X \ge 13\}$$

A plot of this tree is shown in Figure 11.10b. Continuing this process, we next split $R_{33}$ at
$X_s = 14$, and a final split occurs at $X_s = 19$. The 4-region and 5-region regression trees are
shown in Figures 11.10c and 11.10d.

For two or more predictors, the procedure is the same, except that in addition to deter-
mining the best region and split point, we must also determine the best predictor upon which
to base the split. The rule is as follows: assuming the tree is based currently on $r$ rectangular
regions, we determine the best split point for each of the $r$ regions for each of the $p - 1$
predictors, and then implement a new split based on the region and predictor that leads to
the largest decrease in SSE. Note that we are choosing the best predictor-and-split-point
combination from $r(p - 1)$ possibilities.
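The sequential rule can be sketched as a greedy loop over regions and predictors. The sketch below is illustrative only: `best_split_in_region` and `grow_tree` are hypothetical names, regions are stored as lists of case indices, and candidate split points are again assumed to be midpoints between neighboring distinct values.

```python
def sse(values):
    """Sum of squared deviations of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_in_region(rows, y, region):
    """Best split of one region over all predictors.

    Returns (decrease_in_sse, predictor_index, split_point, left, right),
    or None when no predictor takes two distinct values in the region.
    """
    base = sse([y[i] for i in region])
    best = None
    for j in range(len(rows[0])):
        distinct = sorted({rows[i][j] for i in region})
        for lo, hi in zip(distinct, distinct[1:]):
            s = (lo + hi) / 2
            left = [i for i in region if rows[i][j] < s]
            right = [i for i in region if rows[i][j] >= s]
            decrease = base - sse([y[i] for i in left]) - sse([y[i] for i in right])
            if best is None or decrease > best[0]:
                best = (decrease, j, s, left, right)
    return best


def grow_tree(rows, y, n_regions):
    """Split sequentially until n_regions regions exist (or no split is possible)."""
    regions = [list(range(len(y)))]
    splits = []
    while len(regions) < n_regions:
        candidates = []
        for k, region in enumerate(regions):
            found = best_split_in_region(rows, y, region)
            if found is not None:
                candidates.append((found, k))
        if not candidates:
            break
        # Implement the split with the largest decrease in SSE.
        (decrease, j, s, left, right), k = max(candidates, key=lambda c: c[0][0])
        regions[k:k + 1] = [left, right]
        splits.append((j, s))
    return regions, splits
```

Each pass examines one best split per region and predictor, which corresponds to the $r(p - 1)$ combinations mentioned above.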

This process is illustrated for two predictors in Figure 11.11. We first consider splitting
the rectangular X space either on the basis of $X_1$ or $X_2$. We find the best split point for
$X_1$ and the best split point for $X_2$, and then we base our next partition on the split point
that leads to the greatest decrease in SSE. According to Figure 11.11a, the first split is based
on $X_1$, resulting in two rectangular regions $R_{21}$ and $R_{22}$. For each of these two regions, we
determine the best predictor upon which to split and the associated split point, and choose
the combination that leads to the largest decrease in SSE. Figure 11.11b indicates that one of
these regions was partitioned in this step on the basis of $X_2$. Finally, in the third split, one of
the three resulting regions is partitioned on the basis of $X_1$, resulting in a 4-region tree, as
shown in Figure 11.11c.


(a) Branch 1 (to 2 regions): best split based on $X_1$.
(b) Branch 2 (to 3 regions): best split based on $X_2$.
(c) Branch 3 (to 4 regions): best split based on $X_1$.




Determining the Number of Regions, $r$. If the tree-growing process is allowed to con-
tinue indefinitely, there will eventually be $n$ regions, with each region containing a single
observation, and further partitioning will be impossible. A "best" number of regions will
generally fall between 1 and $n$, and is usually chosen through validation studies. For exam-
ple, for each split we determine, in addition to SSE, the mean square for prediction error
MSPR for data in a hold-out or validation sample. We then choose the tree that minimizes
MSPR.
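The selection step can be sketched as follows. The helper names `mspr` and `choose_regions` are hypothetical, and the MSPR values in the usage lines are made-up numbers for illustration, not results from any data set in the text.

```python
def mspr(y_valid, y_pred):
    """Mean squared prediction error over the n* validation cases."""
    n_star = len(y_valid)
    return sum((obs - pred) ** 2 for obs, pred in zip(y_valid, y_pred)) / n_star


def choose_regions(mspr_by_r):
    """Return the number of regions r whose tree has the smallest MSPR."""
    return min(mspr_by_r, key=mspr_by_r.get)


# Hypothetical validation results for trees with r = 2, ..., 6 regions.
example_mspr = {2: 0.350, 3: 0.335, 4: 0.325, 5: 0.318, 6: 0.321}
print(choose_regions(example_mspr))  # prints 5
```

SSE on the training data keeps falling as regions are added, so it cannot play this role; only the hold-out error turns upward once the tree starts to overfit.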


Example

We illustrate the use of regression trees with the University admissions data set in Ap-
pendix C.4. We fit GPA at the end of freshman year ($Y$) as a function of ACT entrance test
score ($X_1$) and high school rank ($X_2$). The data consist of 705 cases, and a random sample
of $n^* = 353$ records was selected for the validation set. Figure 11.12a provides a plot of
MSPR versus the number of regions, or terminal nodes. The plot shows that the ability to
predict improves as nodes are added until $r = 5$, for which MSPR = .318 (MSE for this


FIGURE 11.12 S-Plus Regression Tree Results—University Admissions Example. 


(a) MSPR and MSE versus number of regions. (b) Fitted regression tree surface over ACT and HSRank.
(c) Tree diagram, with splits on HSRank and ACT. (d) Residuals versus predicted GPA.




model is .322). For $r > 5$, the ability to predict responses in the validation set deteriorates
as the number of regions increases. A plot of MSE is also included, and as expected, MSE
decreases monotonically with the size of the tree. The fitted regression tree surface is shown
in Figure 11.12b and the corresponding tree diagram is shown in Figure 11.12c.

A plot of residuals versus predicted values is shown for this tree in Figure 11.12d. Note
that the variance of the residuals appears to be somewhat constant, an indication that
further partitions may not be required.

It is instructive to compare qualitatively the fit of the regression tree to the fit obtained
using standard regression methods. Using a full second-order model leads to the equation:

$$\hat{Y} = 1.77 - .0223X_1 + .0780X_2 + .000187X_1^2 - .00133X_2^2 + .000342X_1X_2$$

MSPR for the second-order regression model is .296, which is slightly better than the value
obtained by the regression tree (.318). Interestingly, the MSE value obtained by the second-
order regression model (.333) is about the same as that obtained by the regression tree
(.322).

In summary, the regression tree surface suggests as expected that college GPA increases 
with both ACT score and high school rank. Overall, high school rank seems to have a 
slightly more pronounced effect than ACT score. For this tree, $R^2$ is .256 for the training
data set, and .157 for the validation data set. We conclude that GPA following freshman 
year is related to high school rank and ACT score, but the fraction of variation in GPA 
explained by these predictors is quite small. 


Comments 


1. The number of regions $r$ is sometimes chosen by minimizing the cost complexity criterion:

$$C_\lambda(r) = \sum_{k=1}^{r} SSE(R_{rk}) + \lambda r$$

The cost complexity criterion has two components: the sum of squared residuals plus a penalty, $\lambda r$,
for the number of regions $r$ employed. The tuning parameter $\lambda \ge 0$ determines the balance between
the size of the tree (complexity) and the goodness of fit. Larger values of $\lambda$ lead to smaller trees.
Note that this criterion is a form of penalized least squares, which, as we commented in Section 11.2,
can be used to obtain ridge regression estimates. Penalized least squares is also used in connection
with neural networks as described in Section 13.6. A "best" value for $\lambda$ is generally chosen through
validation studies.
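A small sketch may help show how the penalty trades fit against size. The function names are hypothetical. In the example values, the SSE totals for $r = 1$ and $r = 2$ are the SSTO and SSE reported earlier for the steroid level data, while the totals for $r = 3, 4, 5$ are invented for illustration.

```python
def cost_complexity(region_sses, lam):
    """C_lambda(r): sum of the regional SSEs plus lam times the number of regions."""
    return sum(region_sses) + lam * len(region_sses)


def best_size(total_sse_by_r, lam):
    """Number of regions r minimizing total SSE + lam * r."""
    return min(total_sse_by_r, key=lambda r: total_sse_by_r[r] + lam * r)


# r = 1 and r = 2 use values from the steroid level example;
# r = 3, 4, 5 are hypothetical totals chosen only to illustrate the trade-off.
total_sse_by_r = {1: 1284.8, 2: 406.35, 3: 300.0, 4: 250.0, 5: 240.0}
for lam in (0.0, 60.0, 2000.0):
    print(lam, best_size(total_sse_by_r, lam))
```

With these numbers the selected size shrinks as $\lambda$ grows, which is the behavior the comment describes.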

2. Regression trees are often used when the response $Y$ is qualitative. In such cases, predicting
a response at $X_h$ is equivalent to determining to which response category $X_h$ belongs. This is a
classification problem, and the resulting tree is referred to as a classification tree. Details are provided
in References 11.11 and 11.15.


11.5 Remedial Measures for Evaluating Precision
in Nonstandard Situations—Bootstrapping


For standard fitted regression models, methods described in earlier chapters are available for 
evaluating the precision of estimated regression coefficients, fitted values, and predictions of 
new observations. However, in many nonstandard situations, such as when nonconstant error 




variances are estimated by iteratively reweighted least squares or when robust regression 
estimation is used, standard methods for evaluating the precision may not be available or 
may only be approximately applicable when the sample size is large. Bootstrapping was 
developed by Efron (Ref. 11.16) to provide estimates of the precision of sample estimates 
for these complex cases. A number of bootstrap methods have now been developed. The 
bootstrap method that we shall explain is simple in principle and nonparametric in nature.
Like all bootstrap methods, it requires extensive computer calculations. 


General Procedure 

We shall explain the bootstrap method in terms of evaluating the precision of an estimated
regression coefficient. The explanation applies identically to any other estimate, such as a
fitted value. Suppose that we have fitted a regression model (simple or multiple) by some
procedure and obtained the estimated regression coefficient $b_1$; we now wish to evaluate the
precision of this estimate by the bootstrap method. In essence, the bootstrap method calls for
the selection from the observed sample data of a random sample of size $n$ with replacement.
Sampling with replacement implies that the bootstrap sample may contain some duplicate
data from the original sample and omit some other data in the original sample. Next, the
bootstrap method calculates the estimated regression coefficient from the bootstrap sample,
using the same fitting procedure as employed for the original fitting. This leads to the first
bootstrap estimate $b_1^*$. This process is repeated a large number of times; each time a bootstrap
sample of size $n$ is selected with replacement from the original sample and the estimated
regression coefficient is obtained for the bootstrap sample. The estimated standard deviation
of all of the bootstrap estimates $b_1^*$, denoted by $s^*\{b_1^*\}$, is an estimate of the variability of
the sampling distribution of $b_1$ and therefore is a measure of the precision of $b_1$.


Bootstrap Sampling 
Bootstrap sampling for regression can be done in two basic ways. When the regression
function being fitted is a good model for the data, the error terms have constant variance,
and the predictor variable(s) can be regarded as fixed, fixed X sampling is appropriate. Here
the residuals $e_i$ from the original fitting are regarded as the sample data to be sampled with
replacement. After a bootstrap sample of the residuals of size $n$ has been obtained, denoted
by $e_1^*, \ldots, e_n^*$, the bootstrap sample residuals are added to the fitted values from the original
fitting to obtain new bootstrap $Y$ values, denoted by $Y_1^*, \ldots, Y_n^*$:

$$Y_i^* = \hat{Y}_i + e_i^* \qquad (11.57)$$

These bootstrap $Y^*$ values are then regressed on the original X variable(s) by the same
procedure used initially to obtain the bootstrap estimate $b_1^*$.

When there is some doubt about the adequacy of the regression function being fitted, the
error variances are not constant, and/or the predictor variables cannot be regarded as fixed,
random X sampling is appropriate. For simple regression, the pairs of $X$ and $Y$ data in the
original sample are considered to be the data to be sampled with replacement. Thus, this
second procedure samples cases with replacement $n$ times, yielding a bootstrap sample of
$n$ pairs of $(X^*, Y^*)$ values. This bootstrap sample is then used for obtaining the bootstrap
estimate $b_1^*$, as with fixed X sampling.
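The two sampling schemes can be sketched for simple linear regression fitted by ordinary least squares. This is a minimal illustration under stated assumptions: the function names are hypothetical, the fitting procedure is taken to be ordinary least squares, and the redraw of a degenerate resample (all X values identical) is a practical safeguard added here, not part of the method as described.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def fixed_x_slopes(x, y, n_boot, rng):
    """Fixed X sampling: resample residuals and add them to the fitted values."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        e_star = rng.choices(resid, k=len(resid))             # sample with replacement
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]  # as in (11.57)
        slopes.append(fit_line(x, y_star)[1])
    return slopes


def random_x_slopes(x, y, n_boot, rng):
    """Random X sampling: resample (X, Y) cases with replacement."""
    cases = list(zip(x, y))
    slopes = []
    while len(slopes) < n_boot:
        xs, ys = zip(*rng.choices(cases, k=len(cases)))
        if len(set(xs)) < 2:  # slope undefined for this resample; draw again
            continue
        slopes.append(fit_line(xs, ys)[1])
    return slopes
```

In either case, `statistics.stdev(slopes)` plays the role of $s^*\{b_1^*\}$. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.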

The number of bootstrap samples to be selected for evaluating the precision of an 
estimate depends on the special circumstances of each application. Sometimes, as few 




as 50 bootstrap samples are sufficient. Often, 200 to 500 bootstrap samples are adequate. One
can observe the variability of the bootstrap estimates by calculating $s^*\{b_1^*\}$ as the number
of bootstrap samples is increased. When $s^*\{b_1^*\}$ stabilizes fairly reasonably, bootstrapping
can be terminated.


Bootstrap Confidence Intervals

Bootstrapping can also be used to arrive at approximate confidence intervals. Much research
is ongoing on different procedures for obtaining bootstrap confidence intervals (see, for
example, References 11.17 and 11.18). A relatively simple procedure for setting up a $1 - \alpha$
confidence interval is the reflection method. This procedure often produces a reasonable
approximation, but not always. The reflection method confidence interval for $\beta_1$ is based
on the $(\alpha/2)100$ and $(1 - \alpha/2)100$ percentiles of the bootstrap distribution of $b_1^*$. These
percentiles are denoted by $b_1^*(\alpha/2)$ and $b_1^*(1 - \alpha/2)$, respectively. The distances of these
percentiles from $b_1$, the estimate of $\beta_1$ from the original sample, are denoted by $d_1$ and $d_2$:

$$d_1 = b_1 - b_1^*(\alpha/2) \qquad (11.58a)$$
$$d_2 = b_1^*(1 - \alpha/2) - b_1 \qquad (11.58b)$$

The approximate $1 - \alpha$ confidence interval for $\beta_1$ then is:

$$b_1 - d_2 \le \beta_1 \le b_1 + d_1 \qquad (11.59)$$


Bootstrap confidence intervals by the reflection method require a larger number of boot-
strap samples than do bootstrap estimates of precision because tail percentiles are required.
About 500 bootstrap samples may be a reasonable minimum number for reflection bootstrap
confidence intervals.
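The reflection calculation in (11.58) and (11.59) is short enough to sketch directly. The function names are hypothetical, and the linear-interpolation percentile rule is an assumption; other percentile conventions give slightly different tail values.

```python
def percentile(sorted_vals, p):
    """Percentile by linear interpolation between order statistics, p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def reflect(b1, lower_pct, upper_pct):
    """Reflection limits from the estimate and two bootstrap percentiles."""
    d1 = b1 - lower_pct       # (11.58a)
    d2 = upper_pct - b1       # (11.58b)
    return b1 - d2, b1 + d1   # (11.59)


def reflection_interval(b1, boot_estimates, alpha=0.05):
    """Approximate 1 - alpha reflection interval from a list of bootstrap estimates."""
    vals = sorted(boot_estimates)
    return reflect(b1, percentile(vals, alpha / 2), percentile(vals, 1 - alpha / 2))
```

For instance, `reflect(3.5702, 2.940, 4.211)` returns limits of roughly 2.93 and 4.20. Note that the upper percentile sets the lower limit and the lower percentile sets the upper limit.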


Examples

We illustrate the bootstrap method by two examples. In the first one, standard analytical
methods are available and bootstrapping is used simply to show that it produces similar
results. In the second example, the estimation procedure is complex, and bootstrapping
provides a means for assessing the precision of the estimate.


Example 1—Toluca Company

We use the Toluca Company example of Table 1.1 to illustrate how the bootstrap method
approximates standard analytical results. We found in Chapter 2 that the estimate of the
slope $\beta_1$ is $b_1 = 3.5702$, that the estimated precision of this estimate is $s\{b_1\} = .3470$, and
that the 95 percent confidence interval for $\beta_1$ is $2.85 \le \beta_1 \le 4.29$.

To evaluate the precision of the estimate $b_1 = 3.5702$ by the bootstrap method, we shall
use fixed X sampling. Here, the simple linear regression function fits the data well, the error
variance appears to be constant, and it is reasonable to consider a repetition of the study
with the same lot sizes. A portion of the data on lot size ($X$) and work hours ($Y$) is repeated
in Table 11.9, columns 1 and 2. The fitted values and residuals obtained from the original
sample are repeated from Table 1.2 in columns 3 and 4. Column 5 of Table 11.9 shows the
first bootstrap sample of $n$ residuals $e_i^*$, selected from column 4 with replacement. Finally,
column 6 shows the first bootstrap sample $Y_i^*$ observations. For example, by (11.57), we
obtain $Y_1^* = \hat{Y}_1 + e_1^* = 347.98 - 19.88 = 328.1$.

When the $Y_i^*$ values in column 6 are regressed against the $X$ values in column 1, based
on simple linear regression model (2.1), we obtain $b_1^* = 3.7564$. In the same way, 999 other
bootstrap samples were selected and $b_1^*$ obtained for each. Figure 11.13 contains a histogram


TABLE 11.9 Bootstrapping with Fixed X Sampling—Toluca Company Example.

        (1)    (2)     (3)       (4)        (5)       (6)
        Original Sample                     Bootstrap Sample 1
  i     X_i    Y_i     Ŷ_i       e_i        e_i*      Y_i*
  1     80     399     347.98     51.02    -19.88     328.1
  2     30     121     169.47    -48.47     10.72     180.2
  3     50     221     240.88    -19.88     -6.68     234.2
 ...
 23     40     244     205.17     38.83      4.02     209.2
 24     80     342     347.98     -5.98    -45.17     302.8
 25     70     323     312.28     10.72     51.02     363.3

FIGURE 11.13 Histogram of Bootstrap Estimates $b_1^*$—Toluca Company Example.
Relative frequency is plotted against the bootstrap $b_1^*$. The figure reports
$b_1^*(.025) = 2.940$, $s^*\{b_1^*\} = .3251$, and $b_1^*(.975) = 4.211$.
of the 1,000 bootstrap $b_1^*$ estimates. Note that this bootstrap sampling distribution is fairly
symmetrical and appears to be close to a normal distribution. We also see in Figure 11.13
that the standard deviation of the 1,000 $b_1^*$ estimates is $s^*\{b_1^*\} = .3251$, which is quite close
to the analytical estimate $s\{b_1\} = .3470$.

To obtain an approximate 95 percent confidence interval for $\beta_1$ by the bootstrap reflection
method, we note in Figure 11.13 that the 2.5th and 97.5th percentiles of the bootstrap sam-
pling distribution are $b_1^*(.025) = 2.940$ and $b_1^*(.975) = 4.211$, respectively. Using (11.58),
we obtain:

$$d_1 = 3.5702 - 2.940 = .630$$
$$d_2 = 4.211 - 3.5702 = .641$$

Finally, we use (11.59) to obtain the confidence limits $3.5702 + .630 = 4.20$ and
$3.5702 - .641 = 2.93$, so that the approximate 95 percent confidence interval for $\beta_1$ is:

$$2.93 \le \beta_1 \le 4.20$$

Note that these limits are quite close to the confidence limits 2.85 and 4.29 obtained by
analytical methods.




Example 2—Blood Pressure

TABLE 11.10 Bootstrapping with Random X Sampling—Blood Pressure Example.


For the blood pressure example in Table 11.1, the analyst used weighted least squares
in order to recognize the unequal error variances and fitted a standard deviation function
to estimate the unknown weights. The standard inference procedures employed by the
analyst for estimating the precision of the estimated regression coefficient $b_{w1} = .59634$
and for obtaining a confidence interval for $\beta_1$ are therefore only approximate. To examine
whether the approximation is good here, we shall evaluate the precision of the estimated
regression coefficient in a way that recognizes the impreciseness of the weights by using
bootstrapping. The $X$ variable (age) probably should be regarded as random and the error
variance varies with the level of $X$, so we shall use random X sampling. Table 11.10 repeats
from Table 11.1 the original data for age ($X$) and diastolic blood pressure ($Y$) in columns 1
and 2. Columns 3 and 4 contain the $(X_i^*, Y_i^*)$ observations for the first bootstrap sample
selected with replacement from columns 1 and 2. When we now regress $Y^*$ on $X^*$ by
ordinary least squares, we obtain the fitted regression function:

$$\hat{Y}^* = 50.384 + .7432X^*$$

The residuals for this fitted function are shown in column 5. When the absolute values of
these residuals are regressed on $X^*$, the fitted standard deviation function obtained is:

$$\hat{s}^* = -5.409 + .32745X^*$$

The fitted values $\hat{s}_i^*$ are shown in column 6, and the weights $w_i^* = 1/(\hat{s}_i^*)^2$ are shown in
column 7. For example, $w_1^* = 1/(10.64)^2 = .0088$. Finally, $Y^*$ is regressed on $X^*$ by using
the weights in column 7, to yield the bootstrap estimate $b_1^* = .838$.
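The four steps for a single bootstrap sample (ordinary least squares fit, standard deviation function, weights, weighted fit) can be sketched as below. The function names are hypothetical. The floor applied to the fitted standard deviations is an assumption added so that a nonpositive fitted value cannot produce an undefined or negative-looking weight; the text does not discuss that case.

```python
import random


def wls_line(x, y, w):
    """Weighted least squares intercept and slope for one predictor."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def weighted_slope(x, y):
    """Slope from weighted least squares with weights estimated from the data."""
    ones = [1.0] * len(x)
    b0, b1 = wls_line(x, y, ones)                    # step 1: ordinary least squares
    abs_res = [abs(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]
    c0, c1 = wls_line(x, abs_res, ones)              # step 2: standard deviation function
    # Assumption: keep fitted standard deviations above a small positive floor.
    floor = max(0.1 * sum(abs_res) / len(abs_res), 1e-8)
    s_hat = [max(c0 + c1 * xi, floor) for xi in x]
    w = [1.0 / s ** 2 for s in s_hat]                # step 3: weights 1 / s_hat^2
    return wls_line(x, y, w)[1]                      # step 4: weighted fit


def bootstrap_weighted_slopes(x, y, n_boot, rng):
    """Random X sampling, re-estimating the weights within every bootstrap sample."""
    cases = list(zip(x, y))
    slopes = []
    while len(slopes) < n_boot:
        xs, ys = zip(*rng.choices(cases, k=len(cases)))
        if len(set(xs)) < 2:  # slope undefined for this resample; draw again
            continue
        slopes.append(weighted_slope(xs, ys))
    return slopes
```

Re-estimating the weights inside each bootstrap sample is what lets the resulting standard deviation reflect the imprecision of the weights themselves.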

This process was repeated 1,000 times. The histogram of the 1,000 bootstrap values $b_1^*$
is shown in Figure 11.14 and appears to approximate a normal distribution. The standard
deviation of the 1,000 bootstrap values is shown in Figure 11.14; it is $s^*\{b_1^*\} = .0825$. When
we compare this precision with that obtained by the approximate use of (11.13), .0825 versus
.07924, we see that recognition of the use of estimated weights has led here only to a small
increase in the estimated standard deviation. Hence, the variability in $b_1$ associated with
the use of estimated variances in the weights is not substantial and the standard inference
procedures therefore provide a good approximation here.

A 95 percent bootstrap confidence interval for $\beta_1$ can be obtained from (11.59) by
using the percentiles $b_1^*(.025) = .4375$ and $b_1^*(.975) = .7583$ shown in Figure 11.14. The


        (1)    (2)     (3)     (4)     (5)       (6)      (7)
        Original Sample        Bootstrap Sample 1
  i     X_i    Y_i     X_i*    Y_i*    e_i*      ŝ_i*     w_i*
  1     27      73     49      101      14.20    10.64    .0088
  2     21      66     34       73      -2.65     5.72    .0305
  3     22      63     49      101      14.20    10.64    .0088
 ...
 52     52     100     46       89       4.43     9.65    .0107
 53     58      80     27       73       2.55     3.43    .0850
 54     57     109     40       70     -10.11     7.69    .0169




[Figure 11.14: histogram of the 1,000 bootstrap estimates $b_1^*$ (relative frequency versus
bootstrap $b_1^*$), with $b_1^*(.025) = .4375$, $s^*\{b_1^*\} = .0825$, and $b_1^*(.975) = .7583$.]


approximate 95 percent confidence limits are [recall from (11.20) that $b_{w1} = .59634$]:

$$b_{w1} - d_2 = .59634 - (.7583 - .59634) = .4344$$
$$b_{w1} + d_1 = .59634 + (.59634 - .4375) = .7552$$

and the confidence interval for $\beta_1$ is:

$$.434 \le \beta_1 \le .755$$


Note that this confidence interval is almost the same as that obtained earlier by standard
inference procedures ($.437 \le \beta_1 \le .755$). This again confirms that it is appropriate to use
standard inference procedures here even though the weights were estimated.


Comment 

The reason why $d_1$ is associated with the upper confidence limit in (11.59) and $d_2$ with the lower
limit is that the upper $(1 - \alpha/2)100$ percentile in the sampling distribution of $b_1$ identifies the lower
confidence limit for $\beta_1$, whereas the lower $(\alpha/2)100$ percentile identifies the upper confidence limit.
To see this, consider the sampling distribution for $b_1$, for which we can state with probability $1 - \alpha$
that $b_1$ will fall between:

$$b_1(\alpha/2) \le b_1 \le b_1(1 - \alpha/2) \qquad (11.60)$$

where $b_1(\alpha/2)$ and $b_1(1 - \alpha/2)$ denote the $(\alpha/2)100$ and $(1 - \alpha/2)100$ percentiles of the sampling
distribution of $b_1$. We now express these percentiles in terms of distances from the mean of the
sampling distribution, $E\{b_1\} = \beta_1$:

$$D_1 = \beta_1 - b_1(\alpha/2)$$
$$D_2 = b_1(1 - \alpha/2) - \beta_1 \qquad (11.61)$$

and obtain:

$$b_1(\alpha/2) = \beta_1 - D_1$$
$$b_1(1 - \alpha/2) = \beta_1 + D_2 \qquad (11.62)$$




Substituting (11.62) into (11.60) and rearranging the inequalities so that $\beta_1$ is in the middle leads to
the limits:

$$b_1 - D_2 \le \beta_1 \le b_1 + D_1$$

The confidence interval in (11.59) is obtained by replacing $D_1$ and $D_2$ by $d_1$ and $d_2$, which involves
using the percentiles of the bootstrap sampling distribution as estimates of the corresponding per-
centiles of the sampling distribution of $b_1$ and using $b_1$ as the estimate of the mean $\beta_1$ of the sampling
distribution.


11.6 Case Example—MNDOT Traffic Estimation 


Traffic monitoring involves the collection of many types of data, such as traffic volume, 
traffic composition, vehicle speeds, and vehicle weights. These data provide information for 
highway planning, engineering design, and traffic control, as well as for legislative decisions
concerning budget allocation, selection of state highway routes, and the setting of speed
limits. One of the most important traffic monitoring variables is the average annual daily
traffic (AADT) for a section of road or highway. AADT is defined as the average, over a
year, of the number of vehicles that pass through a particular section of a road each day.
Information on AADT is often collected by means of automatic traffic recorders (ATRs).
Since it is not possible to install these recorders on all state road segments because of
the expense involved, Cheng (Ref. 11.19) investigated the use of regression analysis for
estimating AADT for road sections that are not monitored in the state of Minnesota.


The AADT Database 


Seven potential predictors of traffic volume were chosen from the Minnesota Department
of Transportation (MNDOT) road-log database, including type of road section, population
density in the vicinity of road section, number of lanes in road section, and road section's
width. Four of the seven variables were qualitative, requiring 19 indicator variables.
Preliminary regression analysis indicated that the large number of levels of two of the
qualitative variables was not helpful. Consequently, judgment and statistical information about
marginal reductions in the error sum of squares were used to collapse the categories, so
only 10 instead of 19 indicator variables remained in the AADT database.
The variables included in the initial analysis were as follows: 


CTYPOP ($X_1$): population of county in which road section is located (best proxy
available for population density in immediate vicinity of road section)

LANES ($X_2$): number of lanes in road section

WIDTH ($X_3$): width of road section (in feet)

CONTROL ($X_4$): two-category qualitative variable indicating whether or not there is
control of access to road section (1 = access control; 2 = no access control)

CLASS ($X_5, X_6, X_7$): four-category qualitative variable indicating road section
function (1 = rural interstate; 2 = rural noninterstate; 3 = urban interstate;
4 = urban noninterstate)

TRUCK ($X_8, X_9, X_{10}, X_{11}$): five-category qualitative variable indicating availability
status of road section to trucks (e.g., tonnage and time-of-year restrictions)


TABLE 11.11 Data—MNDOT Traffic Estimation Example.

                                       Access     Function      Truck
           County                      Control    Class         Route          Locale
 AADT      Population  Lanes   Width   Category   Category      Category       Category
 Y_i       X_i1        X_i2    X_i3    X_i4       (X_i5-X_i7)   (X_i8-X_i11)   (X_i12, X_i13)

  1,616     13,404     2       52      2          2             5              1
  1,329     52,314     2       60      2          2             5              1
  3,933     30,982     2       57      2          4             5              2
 14,905    459,784     4       68      2          4             5              2
 15,408    459,784     2       40      2          4             5              3
  1,266     43,784     2       44      2          4             5              2

Source: C. Cheng, "Optimal Sampling for Traffic Volume Estimation," unpublished Ph.D. dissertation, University of Minnesota, Carlson School of Management, 1992.


LOCALE ($X_{12}, X_{13}$): three-category qualitative variable indicating type of locale
(1 = rural; 2 = urban, population < 50,000; 3 = urban, population > 50,000)


A portion of the data is shown in Table 11.11. Altogether, complete records for 121 ATRs 
were available. For conciseness, only the category is shown for a qualitative variable and 
not the coding of the indicator variables. 


Model Development 


A SYSTAT scatter plot matrix of the data set, with lowess fits added, is presented in
Figure 11.15. We see from the first row of the matrix that several of the predictor variables
are related to AADT. The lowess fits suggest a potentially curvilinear relationship between
LANES and AADT. Although the lowess fits of AADT to the qualitative categories
designated 1, 2, 3, etc., are meaningless, they do highlight the average traffic volume for each
category. For example, the lowess fit of AADT to CLASS shows that average AADT for
the third category of CLASS is higher than for the other three categories. The scatter plot
matrix also suggests that the variability of AADT may be increasing with some predictor
variables, for instance, with CTYPOP.

An initial regression fit of a first-order model with ordinary least squares, using all
predictor variables, indicated that CTYPOP and LANES are important variables. Regression
diagnostics for this initial fit suggested two potential problems. First, the residual plot
against predicted values revealed that the error variance might not be constant. Also, the
maximum variance inflation factor (10.41) was 24.55, suggesting a severe degree of
multicollinearity. The maximum Cook's distance measure (10.33) was .2076, indicating that
none of the individual cases is particularly influential. Since many of the variables appeared
to be unimportant, we next considered the use of subset selection procedures to identify
promising initial models.

The SAS all-possible-regressions procedure, PROC RSQUARE, was used for subset
selection. To reduce the volume of computation, CTYPOP and LANES were forced to be
included. The SAS output is given in Figure 11.16. The left column indicates the number
of X variables in the model, i.e., $p - 1$. The names of the qualitative variables identify the




FIGURE 11.15  SYSTAT Scatter Plot Matrix—MNDOT Traffic Estimation Example. 




predictor variable and the category for which the indicator variable is coded 1. For example,
CLASS1 refers to the first indicator variable for the predictor variable CLASS; i.e., it refers
to $X_5$, which is coded 1 for category 1 (rural interstate). Two simple models look particularly
promising. The three-variable model consisting of $X_1$ (CTYPOP), $X_2$ (LANES), and $X_7$
(CLASS = 3) stands out as the best three-variable model, with $R^2 = .805$ and $C_p = 5.23$.
Since $p = 4$ for this model, the $C_p$ statistic suggests that this model contains little bias.
The best four-variable model includes $X_1$ (CTYPOP), $X_2$ (LANES), $X_4$ (CONTROL = 1),
and $X_5$ (CLASS = 1). With this model, some improvements in the selection criteria are
realized: $R^2 = .812$ and $C_p = 2.65$. On the basis of these results, it was decided to investigate


FIGURE 11.16 SAS All-Possible-Regressions Output—MNDOT Traffic Estimation Example.

N = 121     Regression Models for Dependent Variable: AADT

In   R-square    C(p)       Variables in Model

2    0.694589    69.7231    CTYPOP LANES

NOTE: The above variables are included in all models to follow

3    0.804522    5.2315     CLASS3
3    0.751353    37.
3    0.725755
3    0.704495
3    0.704250

4    0.812099               CONTROL1 CLASS1
4    0.810364               CLASS3 LOCALE2
4    0.808001               CLASS3 LOCALE1
4    0.807122               CLASS2 CLASS3
4    0.806300               CLASS3 TRUCK4

5    0.816245               CONTROL1 CLASS1 LOCALE2
5    0.815842               CONTROL1 CLASS1 LOCALE1
5    0.814362               CONTROL1 CLASS1 CLASS2
5    0.813901               CONTROL1 CLASS1 TRUCK4
5    0.812788               CONTROL1 CLASS1 TRUCK2

6    0.818304               WIDTH CONTROL1 CLASS1 LOCALE1
6    0.817992               CONTROL1 CLASS1 TRUCK4 LOCALE2
6    0.817915               CONTROL1 CLASS1 TRUCK2 LOCALE2
6    0.817741               CONTROL1 CLASS1 TRUCK2 LOCALE1
6    0.817738               WIDTH CONTROL1 CLASS1 LOCALE2

7    0.820443               WIDTH CONTROL1 CLASS1 TRUCK4 LOCALE1
7    0.819942               WIDTH CONTROL1 CLASS1 TRUCK4 LOCALE2
7    0.819473               WIDTH CONTROL1 CLASS1 TRUCK2 LOCALE1
7    0.819180               CONTROL1 CLASS1 TRUCK2 TRUCK4 LOCALE2
7    0.819007               WIDTH CONTROL1 CLASS1 CLASS2 LOCALE1


a model based on the five predictor variables included in these two models: $X_1$ (CTYPOP),
$X_2$ (LANES), $X_4$ (CONTROL = 1), $X_5$ (CLASS = 1), and $X_7$ (CLASS = 3). Note that be-
cause $X_6$ (CLASS = 2) has been dropped from further consideration, the rural noninterstate
(CLASS = 2) and urban noninterstate (CLASS = 4) categories of the CLASS variable have
been collapsed into one category.

Figure 11.17a contains a plot of the studentized residuals against the fitted values for the 
five-variable model. The plot reveals two potential problems: (1) The residuals tend to be 
positive for small and large values of $\hat{Y}$ and negative for intermediate values, suggesting a
curvilinearity in the response function. (2) The variability of the residuals tends to increase
with increasing $\hat{Y}$, indicating nonconstancy of the error variance.




FIGURE 11.17 Plots of Studentized Residuals versus Fitted Values—MNDOT Traffic Estimation Example, 


(a) Ordinary Least Squares Fit of Initial (First-Order) Model. (b) Ordinary Least Squares Fit of Final
(Second-Order) Model. In both panels the studentized residual is plotted against the fitted value.


Curvilinearity was investigated next, together with possible interaction effects. A squared 
term for each of the two quantitative variables (CTYPOP and LANES) was added to the 
pool of potential X variables. To reduce potential multicollinearity problems, each of these 
variables was first centered. In addition, nine cross-product terms were added to the pool 
of potential X variables, consisting of the cross products of the X variables for the four 
predictor variables. 

The SAS all-possible-regressions procedure was run again for this enlarged pool of 
potential X variables (output not shown). Analysis of the results suggested a model with five 
X variables: CTYPOP, LANES, LANES², CONTROL1, and CTYPOP × CONTROL1. For
this model, $R_a^2$ is .925, and all P-values for the regression coefficients are 0+. Although this
model does not have the largest $R^2$ value among five-term models, it is desirable because it is
easy to interpret and does not differ substantially from other models favorably identified by
the $C_p$ or $R_a^2$ criteria. A plot of the studentized residuals against $\hat{Y}$, shown in Figure 11.17b,
indicates that curvilinearity is no longer present. Also, neither Cook's distance measure
(maximum = .47) nor the variance inflation factors (maximum = 2.5) revealed serious
problems at this stage. Nonconstancy of the error term variance has persisted, however, as
confirmed by the Breusch-Pagan test.


Weighted Least Squares Estimation 


To remedy the problem with nonconstancy of the error term variance, weighted least squares 
was implemented by developing a standard deviation function. Residual plots indicated that 
the absolute residuals vary with CTYPOP and LANES. A fit of a first-order model where 
the absolute residuals are regressed on CTYPOP and LANES yielded an estimated standard 
deviation function for which $R^2 = .386$ and the P-values for the regression coefficients for
CTYPOP and LANES are .001 and 0+. Note that, as is often the case, the $R^2$ value for


Chapter 11 Building the Regression Model III: Remedial Measures 469 


The regression equation is 
AADT = 9602 + 0.0146 CTYPOP + 6162 LANES + 16556 CONTROL1 + 2250 LANES2 
+ 0.0637 POPXCTL1 


Predictor Coef Stdev t-ratio р 
Constant 9602 1432 6.71 0.000 
CTYPUP 0.014567 0.003047 4.78 0.000 
LANES 6161.8 933.9 6.60 0.000 
CONTROL1 16556 2966 5.58 0.000 
LANES2 2249.7 755.8 2.98 0.004 
POPXCTL1 0.063696 0.008421 7.56 0.000 


ug 


Analysis of Variance 


SOURCE DF SS MS F P 
Regression 5 919.55 183.91 93.13 0.000 , 
Error 115 227.10 1.97 

Total 120 1146.65 


the estimated standard deviation function (.386) is substantially smaller than that for the 
estimated response function (.925). 

Using the weights obtained from the standard deviation function, weighted least squares 
estimates of the regression coefficients were obtained. Since some of the estimated regres- 
sion coefficients differed substantially from those obtained with unweighted least squares, 
the residuals from the weighted least squares fit were used to reestimate the standard devi- 
ation function, and revised weights were obtained. Two more iterations of this iteratively 
reweighted least squares process led to stable estimated coefficients. 
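
The iteration described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis actually run for the MNDOT data: the data are synthetic, the function names (`wls_fit`, `sd_function_weights`, `iteratively_reweighted_ls`) are hypothetical, the standard deviation function is regressed on the same X matrix as the response, and a small floor on the fitted standard deviations is added as a guard so that the weights stay finite.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy for diagonal W."""
    XtW = X.T * w                       # multiplies column i of X' by w[i]
    return np.linalg.solve(XtW @ X, XtW @ y)

def sd_function_weights(X, resid):
    """Regress |residuals| on X, then set w_i = 1 / (fitted sd_i)^2."""
    abs_res = np.abs(resid)
    gamma = np.linalg.lstsq(X, abs_res, rcond=None)[0]
    s_hat = X @ gamma
    # Assumed guard: keep fitted standard deviations positive.
    s_hat = np.maximum(s_hat, 0.05 * abs_res.mean())
    return 1.0 / s_hat ** 2

def iteratively_reweighted_ls(X, y, n_iter=3):
    """Start from ordinary least squares, then alternate weights and WLS fits."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    history = [b]
    for _ in range(n_iter):
        w = sd_function_weights(X, y - X @ b)
        b = wls_fit(X, y, w)
        history.append(b)
    return b, w, history

# Synthetic data (hypothetical): the error standard deviation grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(2.0, 10.0, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 5.0 + 2.0 * x + rng.normal(scale=0.5 * x)

b_final, w_final, history = iteratively_reweighted_ls(X, y)
for k, b in enumerate(history):
    print(k, np.round(b, 4))            # iteration 0 is the unweighted fit
```

Printing the coefficient history makes it easy to judge whether the estimates have stabilized or whether another reweighting pass is warranted.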

MINITAB regression results for the weighted least squares fit based on the final weights 
are shown in Figure 11.18. Note that the signs of the regression coefficients are all positive, 
as might be expected: 


CTYPOP: Traffic increases with local population density
LANES: Traffic increases with number of lanes
CONTROL1: Traffic is highest for road sections under access control
LANES²: An upward-curving parabola is consistent with the shape of the lowess fit of
AADT to LANES in Figure 11.15
CTYPOP × CONTROL1: Traffic increase with access control is more pronounced for
higher population density


Figure 11.19a contains a plot of the studentized residuals against the fitted values, and 
Figure 11.19b contains a normal probability plot of the studentized residuals. Notice that 
the variability of the studentized residuals is now approximately constant. While the nor- 
mal probability plot in Figure 11.19b indicates some departure from normality (this was 
confirmed by the correlation test for normality), the departure does not appear to be serious, 
particularly in view of the large sample size. 

FIGURE 11.19 Residual Plots for Final Weighted Least Squares Regression Fit: MNDOT Traffic Estimation Example.
(a) Residual Plot against $\hat{Y}$ (horizontal axis: Fitted Value)
(b) Normal Probability Plot (horizontal axis: Expected)


TABLE 11.12 95 Percent Approximate Confidence Limits for Mean Responses: MNDOT Traffic Estimation Example.

            (1)        (2)      (3)         (4)            (5)                  (6)        (7)
Road                                                                            Confidence Limits
Section     CTYPOP     LANES    CONTROL1    $\hat{Y}_h$    $s\{\hat{Y}_h\}$     Lower      Upper
Rural       113,571    2        0           3,365          354                  2,663      4,066
Suburban    222,229    4        0           16,379         1,827                12,758     19,999
Urban       941,411    6        1           116,024        6,597                102,953    129,095


To assess the usefulness of the model for estimating AADT, approximate 95 percent
confidence intervals for mean traffic for typical rural, suburban, and urban road sections
were constructed. The levels of the predictor variables for these road sections are given in
Table 11.12, columns 1-3. The estimated mean traffic is given in column 4. The approximate
estimated standard deviations of the estimated mean responses for each of these road
sections, shown in column 5, were obtained by using $\mathbf{s}^2\{\mathbf{b}_w\}$ from (11.13) in (6.58):


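$$s^2\{\hat{Y}_h\} = \mathbf{X}_h' \mathbf{s}^2\{\mathbf{b}_w\} \mathbf{X}_h = MSE_w \mathbf{X}_h' (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{X}_h \qquad (11.63)$$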


where the vector $\mathbf{X}_h$ is defined in (6.53). The estimated standard deviations in column 5
are only approximations, because the least squares weights were estimated by means of
a standard deviation function. Bootstrapping with random X sampling was therefore employed to
assess the precision of the fitted values. The standard deviations of the bootstrap sampling
distributions were close to the estimated standard deviations in column 5. The consistency
of the results shows that the iterative estimation of the weights by means of the standard
deviation function did not have any substantial effect here on the precision of the fitted
values.
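
The comparison between the formula-based standard deviation in (11.63) and a random X bootstrap can be sketched as follows. This is an illustration under stated assumptions only: the data are synthetic rather than the MNDOT data, each fit uses a single reweighting step, the helper names (`one_step_wls`, `sd_mean_response`, `bootstrap_sd_fitted`) are hypothetical, and the floor on the fitted standard deviations is an added guard.

```python
import numpy as np

def one_step_wls(X, y):
    """OLS fit, sd function from |residuals|, then one WLS fit. Returns (b_w, w)."""
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    abs_res = np.abs(y - X @ b_ols)
    s_hat = X @ np.linalg.lstsq(X, abs_res, rcond=None)[0]
    s_hat = np.maximum(s_hat, 0.05 * abs_res.mean())   # assumed floor, keeps weights finite
    w = 1.0 / s_hat ** 2
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y), w

def sd_mean_response(X, y, w, b_w, x_h):
    """s{Yhat_h} = sqrt(MSE_w * x_h' (X'WX)^{-1} x_h), following (11.63)."""
    n, p = X.shape
    mse_w = np.sum(w * (y - X @ b_w) ** 2) / (n - p)
    XtWX_inv = np.linalg.inv((X.T * w) @ X)
    return np.sqrt(mse_w * x_h @ XtWX_inv @ x_h)

def bootstrap_sd_fitted(X, y, x_h, n_boot=500, seed=1):
    """Random X sampling: resample whole cases with replacement and refit."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fits = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)
        b_star, _ = one_step_wls(X[idx], y[idx])
        fits[k] = x_h @ b_star
    return fits.std(ddof=1)

# Synthetic data (hypothetical): error standard deviation grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(2.0, 10.0, size=150)
X = np.column_stack([np.ones_like(x), x])
y = 5.0 + 2.0 * x + rng.normal(scale=0.5 * x)

x_h = np.array([1.0, 6.0])              # predictor levels of interest
b_w, w = one_step_wls(X, y)
s_formula = sd_mean_response(X, y, w, b_w, x_h)
s_boot = bootstrap_sd_fitted(X, y, x_h)
print(f"formula (11.63): {s_formula:.3f}   bootstrap: {s_boot:.3f}")
```

Because each bootstrap sample re-estimates the weights, the bootstrap standard deviation reflects the extra variability that (11.63) ignores; close agreement between the two numbers suggests that this extra variability is minor.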

The approximate 95 percent confidence limits for $E\{Y_h\}$, computed using (6.59), are presented
in columns 6 and 7 of Table 11.12. The precision of these estimates was considered to
be sufficient for planning purposes. However, because the suburban and rural road estimates
have the poorest relative precision, it was recommended that better records be developed
for population density in the immediate vicinity of a road section, since county population
does not always reflect local population density. The improved information could lead to a
better regression model, with more precise estimates for road sections in rural and suburban
settings.

The approach for developing the regression model described here is not, of course, the
only approach that can lead to a useful regression model, nor is the analysis complete as
described. For example, the residual plot in Figure 11.19a suggests the presence of at least
one outlier (studentized residual = 5.02). Possible remedial measures for this case should be considered.
In addition, the departure from normality might be remedied by a transformation of the
response variable. This transformation might also stabilize the variance of the error terms
sufficiently so that weighted least squares would not be needed. In fact, subsequent analysis
using the Box-Cox transformation approach found that a cube root transformation of the
response is very effective in this instance. A final choice between the model fit obtained by
weighted least squares and a model fit developed by an alternative approach can be made
on the basis of model validation studies.


Cited References


11.1. Davidian, M., and R. J. Carroll. "Variance Function Estimation," Journal of the American
Statistical Association 82 (1987), pp. 1079-91.
11.2. Greene, W. H. Econometric Analysis, 5th ed. Upper Saddle River, New Jersey: Prentice Hall,
2003.
11.3. Belsley, D. A. Conditioning Diagnostics: Collinearity and Weak Data in Regression. New
York: John Wiley & Sons, 1991.
11.4. Frank, I. E., and J. H. Friedman. "A Statistical View of Some Chemometrics Regression Tools,"
Technometrics 35 (1993), pp. 109-35.
11.5. Hoaglin, D. C., F. Mosteller, and J. W. Tukey. Exploring Data Tables, Trends, and Shapes.
New York: John Wiley & Sons, 1985.
11.6. Rousseeuw, P. J., and A. M. Leroy. Robust Regression and Outlier Detection. New York: John
Wiley & Sons, 1987.
11.7. Kennedy, W. J., Jr., and J. E. Gentle. Statistical Computing. New York: Marcel Dekker, 1980.
11.8. ETS Policy Information Center. America's Smallest School: The Family. Princeton, N.J.:
Educational Testing Service, 1992.
11.9. Haerdle, W. Applied Nonparametric Regression. Cambridge: Cambridge University Press,
1992.
11.10. Cleveland, W. S., and S. J. Devlin. "Locally Weighted Regression: An Approach to Regression
Analysis by Local Fitting," Journal of the American Statistical Association 83 (1988), pp. 596-610.
11.11. Breiman, L.; J. H. Friedman; R. A. Olshen; and C. J. Stone. Classification and Regression
Trees. Belmont, Calif.: Wadsworth, 1984.
11.12. Friedman, J. H., and W. Stuetzle. "Projection Pursuit Regression," Journal of the American
Statistical Association 76 (1981), pp. 817-23.
11.13. Eubank, R. L. Spline Smoothing and Nonparametric Regression, 2nd ed. New York: Marcel
Dekker, 1999.
11.14. Hastie, T., and C. Loader. "Local Regression: Automatic Kernel Carpentry" (with discussion),
Statistical Science 8 (1993), pp. 120-43.
11.15. Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. New York: Springer-Verlag, 2001.
11.16. Efron, B. The Jackknife, The Bootstrap, and Other Resampling Plans. Philadelphia, Penn.:
Society for Industrial and Applied Mathematics, 1982.
11.17. Efron, B., and R. Tibshirani. "Bootstrap Methods for Standard Errors, Confidence Intervals,
and Other Measures of Statistical Accuracy," Statistical Science 1 (1986), pp. 54-77.
11.18. Efron, B. "Better Bootstrap Confidence Intervals" (with discussion), Journal of the American
Statistical Association 82 (1987), pp. 171-200.
11.19. Cheng, C. "Optimal Sampling for Traffic Volume Estimation," unpublished Ph.D. dissertation,
University of Minnesota, Carlson School of Management, 1992.


Problems 


11.1. One student remarked to another: "Your residuals show that nonconstancy of error variance
is clearly present. Therefore, your regression results are completely invalid." Comment.

11.2. An analyst suggested: "One nice thing about robust regression is that you need not worry
about outliers and influential observations." Comment.

11.3. Lowess smoothing becomes difficult when there are many predictors and the sample size is
small. This is sometimes referred to as the "curse of dimensionality." Discuss the nature of
this problem.

11.4. Regression trees become difficult to utilize when there are many predictors and the sample
size is small. Discuss the nature of this problem.

11.5. Describe how bootstrapping might be used to obtain confidence intervals for regression
coefficients when ridge regression is employed.

11.6. Computer-assisted learning. Data from a study of computer-assisted learning by 12 students,
showing the total number of responses in completing a lesson (X) and the cost of computer
time (Y, in cents), follow.


i:      1    2    3    4    5    6    7    8    9    10   11   12

$X_i$:  16   14   22   10   14   17   10   13   19   12   18   11
$Y_i$:  77   70   85   50   62   70   55   63   88   57   81   51


a. Fit a linear regression function by ordinary least squares, obtain the residuals, and plot the
residuals against X. What does the residual plot suggest?

b. Divide the cases into two groups, placing the six cases with the smallest fitted values $\hat{Y}_i$
into group 1 and the other six cases into group 2. Conduct the Brown-Forsythe test for
constancy of the error variance, using $\alpha = .05$. State the decision rule and conclusion.

c. Plot the absolute values of the residuals against X. What does this plot suggest about the
relation between the standard deviation of the error term and X?

d. Estimate the standard deviation function by regressing the absolute values of the residuals
against X, and then calculate the estimated weight for each case using (11.16a). Which
case receives the largest weight? Which case receives the smallest weight?

e. Using the estimated weights, obtain the weighted least squares estimates of $\beta_0$ and $\beta_1$. Are
these estimates similar to the ones obtained with ordinary least squares in part (a)?

f. Compare the estimated standard deviations of the weighted least squares estimates $b_{w0}$
and $b_{w1}$ in part (e) with those for the ordinary least squares estimates in part (a). What do
you find?

g. Iterate the steps in parts (d) and (e) one more time. Is there a substantial change in the
estimated regression coefficients? If so, what should you do?




*11.7. Machine speed. The number of defective items produced by a machine (Y) is known to be
linearly related to the speed setting of the machine (X). The data below were collected from
recent quality control records.


i:      1    2    3    4    5    6    7    8    9    10   11   12

$X_i$:  200  400  300  400  200  300  300  400  200  400  200  300
$Y_i$:  28   75   37   53   22   58   40   96   46   52   30   69


a. Fit a linear regression function by ordinary least squares, obtain the residuals, and plot the
residuals against X. What does the residual plot suggest?

b. Conduct the Breusch-Pagan test for constancy of the error variance, assuming
$\log_e \sigma_i^2 = \gamma_0 + \gamma_1 X_i$; use $\alpha = .10$. State the alternatives, decision rule, and conclusion.

c. Plot the squared residuals against X. What does the plot suggest about the relation between
the variance of the error term and X?

d. Estimate the variance function by regressing the squared residuals against X, and then
calculate the estimated weight for each case using (11.16b).

e. Using the estimated weights, obtain the weighted least squares estimates of $\beta_0$ and $\beta_1$.
Are the weighted least squares estimates similar to the ones obtained with ordinary least
squares in part (a)?

f. Compare the estimated standard deviations of the weighted least squares estimates $b_{w0}$ and
$b_{w1}$ in part (e) with those for the ordinary least squares estimates in part (a). What do you
find?

g. Iterate the steps in parts (d) and (e) one more time. Is there a substantial change in the
estimated regression coefficients? If so, what should you do?


11.8. Employee salaries. A group of high-technology companies agreed to share employee salary
information in an effort to establish salary ranges for technical positions in research and
development. Data obtained for each employee included current salary (Y), a coded variable
indicating highest academic degree obtained (1 = bachelor's degree, 2 = master's degree,
3 = doctoral degree), years of experience since last degree ($X_3$), and the number of persons
currently supervised ($X_4$). The data follow.


Employee
i      $Y_i$    Degree   $X_{i3}$   $X_{i4}$
1      58.8     3        4.49       0
2      34.8     1        2.92       0
3      163.7    3        29.54      42
...    ...      ...      ...        ...
63     40.0     2        44         0
64     60.5     3        2.10       0
65     104.8    3        19.81      24


a. Create two indicator variables for highest degree attained:


Degree       $X_1$   $X_2$
Bachelor's   0       0
Master's     1       0
Doctoral     0       1


b. Regress Y on $X_1$, $X_2$, $X_3$, and $X_4$, using a first-order model and ordinary least squares,
obtain the residuals, and plot them against $\hat{Y}$. What does the residual plot suggest?

c. Divide the cases into two groups, placing the 33 cases with the smallest fitted values $\hat{Y}_i$
into group 1 and the other 32 cases into group 2. Conduct the Brown-Forsythe test for
constancy of the error variance, using $\alpha = .01$. State the decision rule and conclusion.

d. Plot the absolute residuals against $X_3$ and against $X_4$. What do these plots suggest about
the relation between the standard deviation of the error term and $X_3$ and $X_4$?

e. Estimate the standard deviation function by regressing the absolute residuals against
$X_3$ and $X_4$ in first-order form, and then calculate the estimated weight for each case
using (11.16a).

f. Using the estimated weights, obtain the weighted least squares fit of the regression model.
Are the weighted least squares estimates of the regression coefficients similar to the ones
obtained with ordinary least squares in part (b)?

g. Compare the estimated standard deviations of the weighted least squares coefficient
estimates in part (f) with those for the ordinary least squares estimates in part (b). What do
you find?

h. Iterate the steps in parts (e) and (f) one more time. Is there a substantial change in the
estimated regression coefficients? If so, what should you do?


11.9. Refer to Cosmetics sales Problem 10.13. Given below are the estimated ridge standardized
regression coefficients, the variance inflation factors, and $R^2$ for selected biasing constants c.


c:           .00     .01     .02     .04     .06     .08     .09     .10
$b_1^R$:     .490    .461    .443    .463    .410    .401    .398    .394
$b_2^R$:     .296    .322    .336    .349    .354    .356    .356    .356
$b_3^R$:     .169    .167    .167    .166    .165    .164    .164    .164
$(VIF)_1$:   20.07   10.36   6.37    3.20    1.98    1.38    1.20    1.05
$(VIF)_2$:   20.72   10.67   6.55    3.27    2.07    1.40    1.21    1.06
$(VIF)_3$:   1.22    1.17    1.14    1.08    1.02    .98     .95     .93
$R^2$:       .7417   .7416   .7415   .7412   .7409   .7405   .7402   .7399


a. Make a ridge trace plot for the given c values. Do the ridge regression coefficients exhibit
substantial changes near c = 0?

b. Suggest a reasonable value for the biasing constant c based on the ridge trace, the VIF
values, and $R^2$.

c. Transform the estimated standardized regression coefficients selected in part (b) back to
the original variables and obtain the fitted values for the 44 cases. How similar are these
fitted values to those obtained with the ordinary least squares fit in Problem 10.13a?


*11.10. Chemical shipment. The data to follow, taken on 20 incoming shipments of chemicals
in drums arriving at a warehouse, show number of drums in shipment ($X_1$), total weight
of shipment ($X_2$, in hundred pounds), and number of minutes required to handle
shipment (Y).


i:          1      2       3      ...   18      19     20
$X_{i1}$:   7      18      5      ...   21      6      11
$X_{i2}$:   5.11   16.72   3.20   ...   15.21   3.64   9.57
$Y_i$:      58     152     41     ...   155     39     90


Given below are the estimated ridge standardized regression coefficients, the variance inflation
factors, and $R^2$ for selected biasing constants c.


c:                       .000    .005    .01     .05     .07     .09     .10     .20
$b_1^R$:                 .451    .453    .455    .460    .460    .459    .458    .444
$b_2^R$:                 .561    .556    .552    .526    .517    .508    .504    .473
$(VIF)_1 = (VIF)_2$:     7.03    6.20    5.51    2.65    2.03    1.61    1.46    .71
$R^2$:                   .9869   .9869   .9869   .9862   .9856   .9852   .9844   .9780


a. Fit regression model (6.1) to the data and find the fitted values.

b. Make a ridge trace plot for the given c values. Do the ridge regression coefficients exhibit
substantial changes near c = 0?

c. Why are the $(VIF)_1$ values the same as the $(VIF)_2$ values here?

d. Suggest a reasonable value for the biasing constant c based on the ridge trace, the VIF
values, and $R^2$.

e. Transform the estimated standardized regression coefficients selected in part (d) back to
the original variables and obtain the fitted values for the 20 cases. How similar are these
fitted values to those obtained with the ordinary least squares fit in part (a)?
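
Tables like the ones in Problems 11.9 and 11.10 can be produced with a short routine. The sketch below is a minimal illustration, assuming the standardized ridge estimator solves $(\mathbf{r}_{XX} + c\mathbf{I})\mathbf{b}^R = \mathbf{r}_{YX}$ and that the ridge VIF values are the diagonal elements of $(\mathbf{r}_{XX} + c\mathbf{I})^{-1}\mathbf{r}_{XX}(\mathbf{r}_{XX} + c\mathbf{I})^{-1}$. The data are synthetic and the function name `ridge_standardized` is hypothetical.

```python
import numpy as np

def ridge_standardized(X, y, c):
    """Ridge coefficients and VIFs for standardized (correlation-transformed) variables."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    r_xx = Xs.T @ Xs / (n - 1)              # correlation matrix of the predictors
    r_yx = Xs.T @ ys / (n - 1)              # correlations between Y and each predictor
    A_inv = np.linalg.inv(r_xx + c * np.eye(p))
    b_ridge = A_inv @ r_yx                  # standardized ridge coefficients
    vif = np.diag(A_inv @ r_xx @ A_inv)     # ridge variance inflation factors
    return b_ridge, vif

# Two strongly correlated synthetic predictors (hypothetical data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=40)
x2 = x1 + rng.normal(scale=0.3, size=40)
X = np.column_stack([x1, x2])
y = 3.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=40)

for c in [0.0, 0.005, 0.01, 0.05, 0.10, 0.20]:
    b, vif = ridge_standardized(X, y, c)
    print(f"c={c:<6} b={np.round(b, 3)}  VIF={np.round(vif, 2)}")
```

With c = 0 the routine reproduces the ordinary least squares standardized coefficients and the usual VIF values, which gives a convenient check before other biasing constants are tried.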

*11.11. Refer to Copier maintenance Problem 1.20. Two cases had been held out of the original data
set because special circumstances led to unusually long service times:


Case
i      $X_i$   $Y_i$
46     6       132
47     5       166


a. Using the enlarged (47-case) data set, fit a simple linear regression model using ordinary
least squares and plot the data together with the fitted regression function. What is the
effect of adding cases 46 and 47 on the fitted response function?

b. Obtain the scaled residuals in (11.47) and use the Huber weight function (11.44) to obtain
the case weights for a first iteration of IRLS robust regression. Which cases receive the
smallest Huber weights? Why?

c. Using the weights calculated in part (b), obtain the weighted least squares estimates of the
regression coefficients. How do these estimates compare to those found in part (a) using
ordinary least squares?

d. Continue the IRLS procedure for two more iterations. Which cases receive the smallest
weights in the final iteration? How do the final IRLS robust regression estimates compare
to the ordinary least squares estimates obtained in part (a)?

e. Plot the final IRLS estimated regression function, obtained in part (d), on the graph
constructed in part (a). Does the robust fit differ substantially from the ordinary least squares
fit? If so, which fit is preferred here?
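
The IRLS steps in parts (b) through (d) can be sketched as follows. This is an illustration under stated assumptions: the Huber weight is taken to be 1 when the absolute scaled residual is at most 1.345 and 1.345 divided by the absolute scaled residual otherwise, the residuals are scaled by MAD = median|e_i - median(e_i)| / .6745, and the data are synthetic rather than the copier maintenance data. The function names are hypothetical.

```python
import numpy as np

def huber_weights(resid, tuning=1.345):
    """Huber case weights computed from MAD-scaled residuals."""
    mad = np.median(np.abs(resid - np.median(resid))) / 0.6745
    abs_u = np.abs(resid / mad)                       # absolute scaled residuals
    # np.maximum avoids dividing by values below the tuning constant.
    return np.where(abs_u <= tuning, 1.0, tuning / np.maximum(abs_u, tuning))

def irls_huber(X, y, n_iter=3):
    """Start from OLS, then repeat: compute Huber weights, refit by WLS."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = huber_weights(y - X @ b)
        XtW = X.T * w
        b = np.linalg.solve(XtW @ X, XtW @ y)
    return b, w

# Synthetic data (hypothetical) with two unusually large responses.
rng = np.random.default_rng(0)
x = np.arange(1.0, 21.0)
y = 10.0 + 15.0 * x + rng.normal(scale=5.0, size=x.size)
y[[4, 5]] += 80.0                                     # the two outlying cases
X = np.column_stack([np.ones_like(x), x])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_rob, w = irls_huber(X, y)
print("OLS:  ", np.round(b_ols, 3))
print("IRLS: ", np.round(b_rob, 3))
print("cases with smallest weights:", np.argsort(w)[:2])
```

Inspecting the final weights is usually more informative than the coefficients alone, since the weights show which cases the robust fit has discounted.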


11.12. Weight and height. The weights and heights of twenty male students in a freshman class are
recorded in order to see how well weight (Y, in pounds) can be predicted from height (X, in
inches). The data are given below. Assume that first-order regression model (1.1) is appropriate.


i:      1     2     3     ...   18    19    20

$X_i$:  74    65    72    ...   69    68    67
$Y_i$:  185   195   216   ...   177   145   137


a. Fit a simple linear regression model using ordinary least squares, and plot the data together
with the fitted regression function. Also, obtain an index plot of Cook's distance (10.33).
What do these plots suggest?

b. Obtain the scaled residuals in (11.47) and use the Huber weight function (11.44) to obtain
case weights for a first iteration of IRLS robust regression. Which cases receive the smallest
Huber weights? Why?

c. Using the weights calculated in part (b), obtain the weighted least squares estimates of the
regression coefficients. How do these estimates compare to those found in part (a) using
ordinary least squares?

d. Continue the IRLS procedure for two more iterations. Which cases receive the smallest
weights in the final iteration? How do the final IRLS robust regression estimates compare
to the ordinary least squares estimates obtained in part (a)?


Exercises 


11.13. (Calculus needed.) Derive the weighted least squares normal equations for fitting a simple
linear regression function when $\sigma_i^2 = kX_i$, where k is a proportionality constant.

11.14. Express the weighted least squares estimator $b_{w1}$ in (11.26a) in terms of the centered variables
$Y_i - \bar{Y}_w$ and $X_i - \bar{X}_w$, where $\bar{Y}_w$ and $\bar{X}_w$ are the weighted means.

11.15. Refer to Computer-assisted learning Problem 11.6. Demonstrate numerically that the
weighted least squares estimates obtained in part (e) are identical to those obtained using
transformation (11.23) and ordinary least squares.

11.16. Refer to Machine speed Problem 11.7. Demonstrate numerically that the weighted least
squares estimates obtained in part (e) are identical to those obtained when using
transformation (11.23) and ordinary least squares.

11.17. Consider the weighted least squares criterion (11.6) with weights given by $w_i = .3/X_i$. Set up
the variance-covariance matrix for the error terms when i = 1, ..., 4. Assume $\sigma\{\varepsilon_i, \varepsilon_j\} = 0$
for $i \neq j$.

11.18. Derive the variance-covariance matrix $\sigma^2\{\mathbf{b}_w\}$ in (11.10) for the weighted least squares
estimators when the variance-covariance matrix of the observations $Y_i$ is $k\mathbf{W}^{-1}$, where $\mathbf{W}$ is given
in (11.7) and k is a proportionality constant.

11.19. Derive the mean squared error in (11.29).

11.20. Refer to the body fat example of Table 7.1. Employing least absolute residuals regression, the
LAR estimates of the regression coefficients are $b_0 = -17.027$, $b_1 = .4173$, and $b_2 = .5203$.

a. Find the sum of the absolute residuals based on the LAR fit.

b. For the least squares estimated regression coefficients $b_0 = -19.174$, $b_1 = .2224$, and
$b_2 = .6594$, find the sum of the absolute residuals. Is this sum larger than the sum obtained
in part (a)? Is this to be expected?


Projects 


11.21. Observations on Y are to be taken when X = 10, 20, 30, 40, and 50, respectively. The true
regression function is $E\{Y\} = 20 + 10X$. The error terms are independent and normally
distributed, with $E\{\varepsilon_i\} = 0$ and $\sigma^2\{\varepsilon_i\} = .8X_i$.

a. Generate a random Y observation for each X level and calculate both the ordinary and
weighted least squares estimates of the regression coefficient $\beta_1$ in the simple linear
regression function.

b. Repeat part (a) 200 times, generating new random numbers each time.

c. Calculate the mean and variance of the 200 ordinary least squares estimates of $\beta_1$ and do
the same for the 200 weighted least squares estimates.

d. Do both the ordinary least squares and weighted least squares estimators appear to be
unbiased? Explain. Which estimator appears to be more precise here? Comment.
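
One possible way to organize the simulation in parts (a) through (c) is sketched below. It is only a starting point under stated assumptions: the weights are taken proportional to the reciprocal of the known error variance .8X, the random seed is arbitrary, and the function name `simulate_once` is hypothetical. The interpretation asked for in part (d) is left to the reader.

```python
import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed (an assumption)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X = np.column_stack([np.ones_like(x), x])
w = 1.0 / (0.8 * x)                             # weights proportional to 1 / variance
XtW = X.T * w

def simulate_once():
    """Generate one Y per X level; return the OLS and WLS slope estimates."""
    y = 20.0 + 10.0 * x + rng.normal(scale=np.sqrt(0.8 * x))
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    b_wls = np.linalg.solve(XtW @ X, XtW @ y)
    return b_ols[1], b_wls[1]

# Columns: OLS slope, WLS slope; one row per replication.
slopes = np.array([simulate_once() for _ in range(200)])
print("mean (OLS, WLS):    ", slopes.mean(axis=0))
print("variance (OLS, WLS):", slopes.var(axis=0, ddof=1))
```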


11.22. Refer to Patient satisfaction Problem 6.15.

a. Obtain the estimated ridge standardized regression coefficients, variance inflation factors,
and $R^2$ for the following biasing constants: c = .000, .005, .01, .02, .03, .04, .05.

b. Make a ridge trace plot for the given c values. Do the ridge regression coefficients exhibit
substantial changes near c = 0?

c. Suggest a reasonable value for the biasing constant c based on the ridge trace, the VIF
values, and $R^2$.

d. Transform the estimated standardized regression coefficients selected in part (c) back to
the original variables and obtain the fitted values for the 46 cases. How similar are these
fitted values to those obtained with the ordinary least squares fit in Problem 6.15c?


11.23. Cement composition. Data on the effect of composition of cement on heat evolved during
hardening are given below. The variables collected were the amount of tricalcium aluminate
($X_1$), the amount of tricalcium silicate ($X_2$), the amount of tetracalcium alumino ferrite
($X_3$), the amount of dicalcium silicate ($X_4$), and the heat evolved in calories per gram of
cement (Y).


i:          1      2      3       ...   11     12      13
$X_{i1}$:   7      1      11      ...   1      11      10
$X_{i2}$:   26     29     56      ...   40     66      68
$X_{i3}$:   6      15     8       ...   23     9       8
$X_{i4}$:   60     52     20      ...   34     12      12
$Y_i$:      78.5   74.3   104.3   ...   83.8   113.3   109.4


Adapted from H. Woods, H. H. Steinour, and H. R. Starke, "Effect of Composition of Portland Cement on Heat
Evolved During Hardening," Industrial and Engineering Chemistry, 24, 1932, 1207-1214.


a. Fit regression model (6.5) for four predictor variables to the data. State the estimated
regression function.

b. Obtain the estimated ridge standardized regression coefficients, variance inflation factors,
and $R^2$ for the following biasing constants: c = .000, .002, .004, .006, .008, .02, .04, .06,
.08, .10.

c. Make a ridge trace plot for the biasing constants listed in part (b). Do the ridge regression
coefficients exhibit substantial changes near c = 0?

d. Suggest a reasonable value for the biasing constant c based on the ridge trace, VIF values,
and $R^2$ values.

e. Transform the estimated standardized ridge regression coefficients selected in part (d) to
the original variables and obtain the fitted values for the 13 cases. How similar are these
fitted values to those obtained with the ordinary least squares fit in part (a)?

11.24. Refer to Commercial properties Problem 6.18.

a. Use least absolute residuals regression to obtain estimates of the parameters $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$,
and $\beta_4$.

b. Find the sum of the absolute residuals based on the LAR fit in part (a).

c. For the least squares estimated regression function in Problem 6.18c, find the sum of
the absolute residuals. Is this sum larger than the sum obtained in part (b)? Is this to be
expected?


11.25. Crop yield. An agronomist studied the effects of moisture ($X_1$, in inches) and temperature
($X_2$, in °C) on the yield of a new hybrid tomato (Y). The experimental data follow.


i:          1      2      3      ...   23     24     25
$X_{i1}$:   6      6      6      ...   14     14     14
$X_{i2}$:   20     21     22     ...   22     23     24
$Y_i$:      49.2   48.1   48.0   ...   42.1   43.9   40.5


The agronomist expects that second-order polynomial regression model (8.7) with independent
normal error terms is appropriate here.

a. Fit a second-order polynomial regression model omitting the interaction term and the
quadratic effect term for temperature.

b. Construct a contour plot of the fitted surface obtained in part (a).

c. Use the lowess method to obtain a nonparametric estimate of the yield response surface
as a function of moisture and temperature. Employ weight function (11.53), q = 9/25,
and a Euclidean distance measure with unscaled variables. Obtain fitted values $\hat{Y}_h$ for the
9 × 9 rectangular grid of ($X_{h1}$, $X_{h2}$) values where $X_{h1}$ = 6, 7, ..., 13, 14 and $X_{h2}$ =
20, 20.5, ..., 23.5, 24, using a local first-order model.

d. Construct a contour plot of the resulting lowess surface. Are the lowess contours consistent
with the contours in part (b) for the polynomial model? Discuss.


11.26. Refer to Computer-assisted learning Problem 11.6.

a. Based on the weighted least squares fit in Problem 11.6e, construct an approximate 95 percent
confidence interval for $\beta_1$ by means of (6.50), using the estimated standard deviation
$s\{b_{w1}\}$.

b. Using random X sampling, obtain 750 bootstrap samples of size 12. For each bootstrap
sample, (1) use ordinary least squares to regress Y on X and obtain the residuals, (2)
estimate the standard deviation function by regressing the absolute residuals on X and
then use the fitted standard deviation function and (11.16a) to obtain weights, and (3) use
weighted least squares to regress Y on X and obtain the bootstrap estimated regression
coefficient $b_1^*$. (Note that for each bootstrap sample, only one iteration of the iteratively
reweighted least squares procedure is to be used.)

c. Construct a histogram of the 750 bootstrap estimates $b_1^*$. Does the bootstrap sampling
distribution of $b_1^*$ appear to approximate a normal distribution?

d. Calculate the sample standard deviation of the 750 bootstrap estimates $b_1^*$. How does this
value compare to the estimated standard deviation $s\{b_{w1}\}$ used in part (a)?

e. Construct a 95 percent bootstrap confidence interval for $\beta_1$ using reflection method (11.59).
How does this confidence interval compare with that obtained in part (a)? Does the
approximate interval in part (a) appear to be useful for this data set?


11.27. Refer to Machine speed Problem 11.7.

a. On the basis of the weighted least squares fit in Problem 11.7e, construct an approximate
90 percent confidence interval for $\beta_1$ by means of (6.50), using the estimated standard
deviation $s\{b_{w1}\}$.

b. Using random X sampling, obtain 800 bootstrap samples of size 12. For each bootstrap
sample, (1) use ordinary least squares to regress Y on X and obtain the residuals, (2) estimate
the standard deviation function by regressing the absolute residuals on X and then use the
fitted standard deviation function and (11.16a) to obtain weights, and (3) use weighted
least squares to regress Y on X and obtain the bootstrap estimated regression coefficient $b_1^*$.
(Note that for each bootstrap sample, only one iteration of the iteratively reweighted least
squares procedure is to be used.)

c. Construct a histogram of the 800 bootstrap estimates $b_1^*$. Does the bootstrap sampling
distribution of $b_1^*$ appear to approximate a normal distribution?

d. Calculate the sample standard deviation of the 800 bootstrap estimates $b_1^*$. How does this
value compare to the estimated standard deviation $s\{b_{w1}\}$ used in part (a)?

e. Construct a 90 percent bootstrap confidence interval for $\beta_1$ using reflection method (11.59).
How does this confidence interval compare with that obtained in part (a)? Does the
approximate interval in part (a) appear to be useful for this data set?


11.28. Mileage study. The effectiveness of a new experimental overdrive gear in reducing gasoline
consumption was studied in 12 trials with a light truck equipped with this gear. In the data
that follow, $X_i$ denotes the constant speed (in miles per hour) on the test track in the ith trial
and $Y_i$ denotes miles per gallon obtained.


i:      1    2    3    4    5    6    7    8    9    10   11   12

$X_i$:  35   35   40   40   45   45   50   50   55   55   60   60
$Y_i$:  22   20   28   31   37   38   41   39   34   37   27   30


Second-order regression model (8.2) with independent normal error terms is expected to be
appropriate.

a. Fit regression model (8.2). Plot the fitted regression function and the data. Does the
quadratic regression function appear to be a good fit here?

b. Automotive engineers would like to estimate the speed $X_{\max}$ at which the average mileage
$E\{Y\}$ is maximized. It can be shown for second-order model (8.2) that $X_{\max} =
\bar{X} - (.5\beta_1/\beta_{11})$, provided that $\beta_{11}$ is negative. Estimate the speed $X_{\max}$ at which the average
mileage is maximized, using $\hat{X}_{\max} = \bar{X} - (.5b_1/b_{11})$. What is the estimated mean mileage
at the estimated optimum speed?

c. Using fixed X sampling, obtain 1,000 bootstrap samples of size 12. For each bootstrap
sample, fit regression model (8.2) and obtain the bootstrap estimate $\hat{X}^*_{\max}$.

d. Construct a histogram of the 1,000 bootstrap estimates $\hat{X}^*_{\max}$. Does the bootstrap sampling
distribution of $\hat{X}^*_{\max}$ appear to approximate a normal distribution?

e. Construct a 90 percent bootstrap confidence interval for $X_{\max}$ using reflection method
(11.56). How precisely has $X_{\max}$ been estimated?


11.29. Refer to Muscle mass Problem 1.27.

a. Fit a two-region regression tree. What is the first split point based on age? What is SSE for
this two-region tree?

b. Find the second split point given the two-region tree in part (a). What is SSE for the resulting
three-region tree?

c. Find the third split point given the three-region tree in part (b). What is SSE for the resulting
four-region tree?

d. Prepare a scatter plot of the data with the four-region tree in part (c) superimposed. How
well does the tree fit the data? What does the tree suggest about the change in muscle mass
with age?

e. Prepare a residual plot of $e_i$ versus $\hat{Y}_i$ for the four-region tree in part (d). State your findings.


480 PartTwo Multiple Linear Regression 


11.30. Refer to Patient satisfaction Problem 6.15. Consider only the first two predictors (patient's
age, $X_1$, and severity of illness, $X_2$).

a. Fit a two-region regression tree. What is the first split point, and on which predictor is it
based? What is SSE for the resulting two-region tree?

b. Find the second split point given the two-region tree in part (a). Is it based on $X_1$ or $X_2$?
What is SSE for the resulting three-region tree?

c. Find the third split point given the three-region tree in part (b). Is it based on $X_1$ or $X_2$?
What is SSE for the resulting four-region tree?

d. Find the fourth split point given the four-region tree in part (c). Is it based on $X_1$ or $X_2$?
What is SSE for the resulting five-region tree?

e. Prepare a three-dimensional surface plot of the five-region tree obtained in part (d). What
does this tree suggest about the relative importance of the two predictors?

f. Prepare a residual plot of $e_i$ versus $\hat{Y}_i$ for the five-region tree in part (d). State your findings.


Case Studies

11.31. Refer to the Prostate cancer data set in Appendix C.5 and Case Study 9.30. Select a random
sample of 65 observations to use as the model-building data set.

a. Develop a regression tree for predicting PSA. Justify your choice of number of regions
(tree size), and interpret your regression tree.

b. Assess your model's ability to predict and discuss its usefulness to the oncologists.

c. Compare the performance of your regression tree model with that of the best regression
model obtained in Case Study 9.30. Which model is more easily interpreted and why?

11.32. Refer to the Real estate sales data set in Appendix C.7 and Case Study 9.31. Select a random
sample of 300 observations to use as the model-building data set.

a. Develop a regression tree for predicting sales price. Justify your choice of number of
regions (tree size), and interpret your model.

b. Assess your model's ability to predict and discuss its usefulness as a tool for predicting
sales prices.

c. Compare the performance of your regression tree model with that of the best regression
model obtained in Case Study 9.31. Which model is more easily interpreted and why?


Chapter 12

Autocorrelation in Time Series Data


The basic regression models considered so far have assumed that the random error terms
$\varepsilon_i$ are either uncorrelated random variables or independent normal random variables. In
business and economics, many regression applications involve time series data. For such 
data, the assumption of uncorrelated or independent error terms is often not appropriate; 
rather, the error terms are frequently correlated positively over time. Error terms correlated 
over time are said to be autocorrelated or serially correlated. 

A major cause of positively autocorrelated error terms in business and economic regres- 
sion applications involving time series data is the omission of one or several key variables 
from the model. When time-ordered effects of such “missing” key variables are positively 
correlated, the error terms in the regression model will tend to be positively autocorrelated 
since the error terms include effects of missing variables. Consider, for example, the regres- 
sion of annual sales of a product against average yearly price of the product over a period 
of 30 years. If population size has an important effect on sales, its omission from the model 
may lead to the error terms being positively autocorrelated because the effect of population 
size on sales likely is positively correlated over time. 

Another cause of positively autocorrelated error terms in economic data is the presence 
of systematic coverage errors in the response variable time series, which errors often tend 
to be positively correlated over time. 


12.1 Problems of Autocorrelation 


When the error terms in the regression model are positively autocorrelated, the use of
ordinary least squares procedures has a number of important consequences. We summarize
these first, and then discuss them in more detail:

1. The estimated regression coefficients are still unbiased, but they no longer have the
minimum variance property and may be quite inefficient.

2. MSE may seriously underestimate the variance of the error terms.

3. $s\{b_k\}$ calculated according to ordinary least squares procedures may seriously
underestimate the true standard deviation of the estimated regression coefficient.

4. Confidence intervals and tests using the t and F distributions, discussed earlier, are no
longer strictly applicable.

To illustrate these problems intuitively, we consider the simple linear regression model
with time series data:

$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$$

Here, $Y_t$ and $X_t$ are observations for period $t$. Let us assume that the error terms $\varepsilon_t$ are
positively autocorrelated as follows:

$$\varepsilon_t = \varepsilon_{t-1} + u_t$$

The $u_t$, called disturbances, are independent normal random variables. Thus, any error term
$\varepsilon_t$ is the sum of the previous error term $\varepsilon_{t-1}$ and a new disturbance term $u_t$. We shall assume
here that the $u_t$ have mean 0 and variance 1.

In Table 12.1, column 1, we show 10 random observations on the normal variable $u_t$
with mean 0 and variance 1, obtained from a standard normal random numbers generator.
Suppose now that $\varepsilon_0 = 3.0$; we obtain then:

$$\varepsilon_1 = \varepsilon_0 + u_1 = 3.0 + .5 = 3.5$$
$$\varepsilon_2 = \varepsilon_1 + u_2 = 3.5 - .7 = 2.8$$

etc.

The error terms $\varepsilon_t$ are shown in Table 12.1, column 2, and they are plotted in Figure 12.1.
Note the systematic pattern in these error terms. Their positive relation over time is shown
by the fact that adjacent error terms tend to be of the same sign and magnitude.

TABLE 12.1 Example of Positively Autocorrelated Error Terms.

          (1)      (2)                                       (3)
   t      $u_t$    $\varepsilon_{t-1} + u_t = \varepsilon_t$      $Y_t = 2 + .5X_t + \varepsilon_t$
   0      --                       3.0                       5.0
   1       .5      3.0 +  .5 =  3.5                          6.0
   2      -.7      3.5 -  .7 =  2.8                          5.8
   3       .3      2.8 +  .3 =  3.1                          6.6
   4       0       3.1 +  0  =  3.1                          7.1
   5     -2.3      3.1 - 2.3 =   .8                          5.3
   6     -1.9       .8 - 1.9 = -1.1                          3.9
   7       .2     -1.1 +  .2 =  -.9                          4.6
   8      -.3      -.9 -  .3 = -1.2                          4.8
   9       .2     -1.2 +  .2 = -1.0                          5.5
  10      -.1     -1.0 -  .1 = -1.1                          5.9

Suppose that $X_t$ in the regression model represents time, such that $X_1 = 1$, $X_2 = 2$,
etc. Further, suppose we know that $\beta_0 = 2$ and $\beta_1 = .5$ so that the true regression
function is $E\{Y\} = 2 + .5X$. The observed Y values based on the error terms in column 2
of Table 12.1 are shown in column 3. For example, $Y_0 = 2 + .5(0) + 3.0 = 5.0$, and
$Y_1 = 2 + .5(1) + 3.5 = 6.0$. Figure 12.2a contains the true regression line
$E\{Y\} = 2 + .5X$ and the observed Y values shown in Table 12.1, column 3. Figure 12.2b
contains the estimated regression line, fitted by ordinary least squares methods, and repeats


Chapter 12 Autocorrelation in Time Series Data 483 


FIGURE 12.2 Regression with Positively Autocorrelated Error Terms.
(a) True Regression Line and Observations when $\varepsilon_0 = 3$
(b) Fitted Regression Line and Observations when $\varepsilon_0 = 3$: $\hat{Y} = 5.85 - .070X$
(c) Fitted Regression Line and Observations with $\varepsilon_0 = -.2$ and Different Disturbances: $\hat{Y} = .200 + .779X$


the observed Y values. Notice that the fitted regression line differs sharply from the true
regression line because the initial $\varepsilon_0$ value was large and the succeeding positively autocorrelated
error terms tended to be large for some time. This persistency pattern in the positively
autocorrelated error terms leads to a fitted regression line far from the true one. Had the
initial $\varepsilon_0$ value been small, say, $\varepsilon_0 = -.2$, and the disturbances different, a sharply different
fitted regression line might have been obtained because of the persistency pattern, as shown
in Figure 12.2c. This variation from sample to sample in the fitted regression lines due to
the positively autocorrelated error terms may be so substantial as to lead to large variances
of the estimated regression coefficients when ordinary least squares methods are used.

Another key problem with applying ordinary least squares methods when the error terms
are positively autocorrelated, as mentioned before, is that MSE may seriously underestimate
the variance of the $\varepsilon_t$. Figure 12.2 makes this clear. Note that the variability of the Y
values around the fitted regression line in Figure 12.2b is substantially smaller than the
variability of the Y values around the true regression line in Figure 12.2a. This is one of
the factors leading to an indication of greater precision of the regression coefficients than is
actually the case when ordinary least squares methods are used in the presence of positively
autocorrelated errors.
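
The illustration can be reproduced with a short computation. The sketch below, in plain Python, rebuilds the error terms and Y values of Table 12.1 from the tabled disturbances, fits the least squares line, and compares the squared deviations around the fitted line with those around the true line $E\{Y\} = 2 + .5X$. The helper name `ols_line` and the variable names are illustrative choices, not part of the text.

```python
# Disturbances u_1, ..., u_10 from Table 12.1, column 1
u = [0.5, -0.7, 0.3, 0.0, -2.3, -1.9, 0.2, -0.3, 0.2, -0.1]

# Error terms: eps_t = eps_{t-1} + u_t, starting from eps_0 = 3.0
eps = [3.0]
for u_t in u:
    eps.append(eps[-1] + u_t)

# Observations: Y_t = 2 + .5 X_t + eps_t with X_t = t, t = 0, ..., 10
x = list(range(len(eps)))
y = [2.0 + 0.5 * x_t + e_t for x_t, e_t in zip(x, eps)]


def ols_line(x, y):
    """Return (b0, b1) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = ols_line(x, y)

# Squared deviations around the fitted line and around the true line
sse_fitted = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_true = sum(e_t ** 2 for e_t in eps)

print(f"fitted line: Y-hat = {b0:.2f} + ({b1:.3f})X")
print(f"sum of squares around fitted line: {sse_fitted:.2f}")
print(f"sum of squares around true line:   {ss_true:.2f}")
```

The fitted coefficients agree with the line reported for Figure 12.2b, and the sum of squares around the fitted line is much smaller than the sum of squares around the true line, which is the sense in which MSE understates the variability of the error terms here.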

In view of the seriousness of the problems created by autocorrelated errors, it is important
that their presence be detected. A plot of residuals against time is an effective, though
subjective, means of detecting autocorrelated errors. Formal statistical tests have also been
developed. A widely used test is based on the first-order autoregressive error model, which
we take up next. This model is a simple one, yet experience suggests that it is frequently
applicable in business and economics when the error terms are serially correlated.


12.2 First-Order Autoregressive Error Model 


Simple Linear Regression 
The generalized simple linear regression model for one predictor variable when the random 
error terms follow a first-order autoregressive, or AR(1), process is: 


$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t \tag{12.1}$$

where:

$\rho$ is a parameter such that $|\rho| < 1$
$u_t$ are independent $N(0, \sigma^2)$

Note that generalized regression model (12.1) is identical to the simple linear regression
model (2.1) except for the structure of the error terms. Each error term in model (12.1)
consists of a fraction of the previous error term (when $\rho > 0$) plus a new disturbance
term $u_t$. The parameter $\rho$ is called the autocorrelation parameter.


Multiple Regression 
The generalized multiple regression model when the random error terms follow a first-order 
autoregressive process is: 
$$Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \cdots + \beta_{p-1} X_{t,p-1} + \varepsilon_t$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t \tag{12.2}$$

where:

$|\rho| < 1$
$u_t$ are independent $N(0, \sigma^2)$

Thus, we see that generalized multiple regression model (12.2) is identical to the earlier 
multiple regression model (6.7) except for the structure of the error terms. 


Properties of Error Terms 
Regression models (12.1) and (12.2) are generalized regression models because the error 
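terms $\varepsilon_t$ in these models are correlated. However, the error terms still have mean zero and
constant variance:

$$E\{\varepsilon_t\} = 0 \tag{12.3}$$

$$\sigma^2\{\varepsilon_t\} = \frac{\sigma^2}{1 - \rho^2} \tag{12.4}$$

Note that the variance of the error terms here is a function of the autocorrelation parameter $\rho$.
The covariance between adjacent error terms $\varepsilon_t$ and $\varepsilon_{t-1}$ is:

$$\sigma\{\varepsilon_t, \varepsilon_{t-1}\} = \rho\left(\frac{\sigma^2}{1 - \rho^2}\right) \tag{12.5}$$

The coefficient of correlation between $\varepsilon_t$ and $\varepsilon_{t-1}$, denoted by $\rho\{\varepsilon_t, \varepsilon_{t-1}\}$, is defined as
follows:

$$\rho\{\varepsilon_t, \varepsilon_{t-1}\} = \frac{\sigma\{\varepsilon_t, \varepsilon_{t-1}\}}{\sigma\{\varepsilon_t\}\,\sigma\{\varepsilon_{t-1}\}} \tag{12.6}$$

Since the variance of each error term according to (12.4) is $\sigma^2/(1 - \rho^2)$, the coefficient of
correlation using (12.5) is:

$$\rho\{\varepsilon_t, \varepsilon_{t-1}\} = \frac{\rho\left(\dfrac{\sigma^2}{1 - \rho^2}\right)}{\sqrt{\dfrac{\sigma^2}{1 - \rho^2}}\,\sqrt{\dfrac{\sigma^2}{1 - \rho^2}}} = \rho \tag{12.6a}$$

Thus, the autocorrelation parameter $\rho$ is the coefficient of correlation between adjacent
error terms.

The covariance between error terms that are $s$ periods apart can be shown to be:

$$\sigma\{\varepsilon_t, \varepsilon_{t-s}\} = \rho^s\left(\frac{\sigma^2}{1 - \rho^2}\right) \qquad s \neq 0 \tag{12.7}$$

and is called the autocovariance function. The coefficient of correlation between $\varepsilon_t$ and $\varepsilon_{t-s}$
therefore is:

$$\rho\{\varepsilon_t, \varepsilon_{t-s}\} = \rho^s \qquad s \neq 0 \tag{12.8}$$

Note that (12.8) is called the autocorrelation function. Thus, when $\rho$ is positive, all error
terms are correlated, but the further apart they are, the less is the correlation between them.
The only time the error terms for the autoregressive error models (12.1) and (12.2) are
uncorrelated is when $\rho = 0$.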


From the results for the variances and covariances of the error terms in (12.4) and (12.7),
we can now state the variance-covariance matrix of the error terms for the first-order
autoregressive generalized regression models (12.1) and (12.2):

$$\underset{n \times n}{\sigma^2\{\boldsymbol{\varepsilon}\}} =
\begin{bmatrix}
\kappa & \kappa\rho & \kappa\rho^2 & \cdots & \kappa\rho^{n-1} \\
\kappa\rho & \kappa & \kappa\rho & \cdots & \kappa\rho^{n-2} \\
\vdots & \vdots & \vdots & & \vdots \\
\kappa\rho^{n-1} & \kappa\rho^{n-2} & \kappa\rho^{n-3} & \cdots & \kappa
\end{bmatrix} \tag{12.9}$$

where:

$$\kappa = \frac{\sigma^2}{1 - \rho^2} \tag{12.9a}$$

Note again that the variance-covariance matrix (12.9) reflects the generalized nature of
regression models (12.1) and (12.2) by containing nonzero covariance terms.
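
As a small illustration of (12.9), the following sketch builds the matrix for a short series. The function name `ar1_covariance_matrix` and the values n = 4, rho = .5, and sigma^2 = 1 are hypothetical choices made only for the example; the entry in row i and column j is kappa times rho raised to the number of periods separating the two error terms.

```python
def ar1_covariance_matrix(n, rho, sigma2):
    """Variance-covariance matrix (12.9): entry (i, j) is kappa * rho**|i - j|."""
    if abs(rho) >= 1:
        raise ValueError("models (12.1) and (12.2) require |rho| < 1")
    kappa = sigma2 / (1.0 - rho ** 2)  # (12.9a)
    return [[kappa * rho ** abs(i - j) for j in range(n)] for i in range(n)]


# Hypothetical values chosen only for illustration
cov = ar1_covariance_matrix(n=4, rho=0.5, sigma2=1.0)
for row in cov:
    print("  ".join(f"{value:.4f}" for value in row))
```

Each diagonal entry equals the common variance (12.4), and dividing any off-diagonal entry by the diagonal entry gives the autocorrelation $\rho^s$ of (12.8) for the corresponding lag.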




Comments 


1. It is instructive to expand the definition of the first-order autoregressive error term $\varepsilon_t$:

$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t$$

Since this definition holds for all $t$, we have $\varepsilon_{t-1} = \rho\varepsilon_{t-2} + u_{t-1}$. When we substitute this expression
above, we obtain:

$$\varepsilon_t = \rho(\rho\varepsilon_{t-2} + u_{t-1}) + u_t = \rho^2\varepsilon_{t-2} + \rho u_{t-1} + u_t$$

Replacing now $\varepsilon_{t-2}$ by $\rho\varepsilon_{t-3} + u_{t-2}$, we obtain:

$$\varepsilon_t = \rho^3\varepsilon_{t-3} + \rho^2 u_{t-2} + \rho u_{t-1} + u_t$$

Continuing in this fashion, we find:

$$\varepsilon_t = \sum_{s=0}^{\infty} \rho^s u_{t-s} \tag{12.10}$$

Thus, the error term $\varepsilon_t$ in period $t$ is a linear combination of the current and preceding disturbance
terms. When $0 < \rho < 1$, (12.10) indicates that the further the period $t - s$ is in the past, the smaller
is the weight of disturbance term $u_{t-s}$ in determining $\varepsilon_t$.

2. The derivation of (12.3), that the error terms have expectation zero, follows directly from taking
the expectation of $\varepsilon_t$ in (12.10) and using the fact that $E\{u_t\} = 0$ for all $t$ according to models (12.1)
and (12.2).

3. To derive the variance of the error terms in (12.4), we utilize the assumption of models (12.1)
and (12.2) that the $u_t$ are independent with variance $\sigma^2$. It then follows from (12.10) that:

$$\sigma^2\{\varepsilon_t\} = \sum_{s=0}^{\infty} \rho^{2s}\sigma^2\{u_{t-s}\} = \sigma^2 \sum_{s=0}^{\infty} \rho^{2s}$$

Now for $|\rho| < 1$ it is known that:

$$\sum_{s=0}^{\infty} \rho^{2s} = \frac{1}{1 - \rho^2}$$

Hence, we have:

$$\sigma^2\{\varepsilon_t\} = \frac{\sigma^2}{1 - \rho^2}$$


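4. To derive the covariance of $\varepsilon_t$ and $\varepsilon_{t-1}$ in (12.5), we need to recognize that:

$$\sigma^2\{\varepsilon_t\} = E\{\varepsilon_t^2\}$$
$$\sigma\{\varepsilon_t, \varepsilon_{t-1}\} = E\{\varepsilon_t\varepsilon_{t-1}\}$$

These results follow from (A.15a) and (A.21a), respectively, since $E\{\varepsilon_t\} = 0$ by (12.3) for all $t$.
By (12.10), we have:

$$E\{\varepsilon_t\varepsilon_{t-1}\} = E\{(u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \cdots)(u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \cdots)\}$$

which can be rewritten:

$$E\{\varepsilon_t\varepsilon_{t-1}\} = E\{[u_t + \rho(u_{t-1} + \rho u_{t-2} + \cdots)](u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \cdots)\}$$
$$= E\{u_t(u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \cdots)\} + E\{\rho(u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \cdots)^2\}$$

Since $E\{u_t u_{t-s}\} = 0$ for all $s \neq 0$ by the assumed independence of the $u_t$ and the fact that $E\{u_t\} = 0$
for all $t$, the first term drops out and we obtain:

$$E\{\varepsilon_t\varepsilon_{t-1}\} = \rho E\{\varepsilon_{t-1}^2\} = \rho\sigma^2\{\varepsilon_{t-1}\}$$

Hence, by (12.4), which holds for all $t$, we have:

$$\sigma\{\varepsilon_t, \varepsilon_{t-1}\} = \rho\left(\frac{\sigma^2}{1 - \rho^2}\right)$$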


5. The first-order autoregressive error process in models (12.1) and (12.2) is the simplest kind. A
second-order process would be:

$$\varepsilon_t = \rho_1\varepsilon_{t-1} + \rho_2\varepsilon_{t-2} + u_t \tag{12.11}$$

Still higher-order processes could be postulated. Specialized approaches have been developed for
complex autoregressive error processes. These are discussed in treatments of time series procedures
and forecasting, such as in Reference 12.1.
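
The expansion in Comment 1 and the geometric sum used in Comment 3 can be checked numerically. The sketch below is only a finite illustration: it assumes a hypothetical value rho = .6, draws disturbances with a fixed seed, and starts the recursion from $\varepsilon_0 = 0$ so that the sum in (12.10) has finitely many terms. None of these choices come from the text.

```python
import random

rho = 0.6        # hypothetical autocorrelation parameter
random.seed(12)  # fixed seed so the run is repeatable
n = 40

# u[1], ..., u[n] are disturbances with mean 0 and variance 1; u[0] is a placeholder
u = [0.0] + [random.gauss(0.0, 1.0) for _ in range(n)]

# Recursive definition eps_t = rho * eps_{t-1} + u_t, started from eps_0 = 0
eps = [0.0]
for t in range(1, n + 1):
    eps.append(rho * eps[t - 1] + u[t])

# Expanded form: eps_n = sum of rho**s * u_{n-s} for s = 0, ..., n - 1
expanded = sum(rho ** s * u[n - s] for s in range(n))

# The weight rho**s attached to u_{t-s} shrinks as s grows
weights = [rho ** s for s in range(5)]

# Partial sums of rho**(2s) approach 1 / (1 - rho**2)
partial = sum(rho ** (2 * s) for s in range(200))
limit = 1.0 / (1.0 - rho ** 2)

print(abs(eps[n] - expanded) < 1e-9)
print(weights)
print(partial, limit)
```

The recursive and expanded values of the last error term agree up to rounding, the weights decline geometrically, and the partial sum is numerically indistinguishable from $1/(1 - \rho^2)$.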


12.3 Durbin-Watson Test for Autocorrelation


The Durbin-Watson test for autocorrelation assumes the first-order autoregressive error
models (12.1) or (12.2), with the values of the predictor variable(s) fixed. The test consists
of determining whether or not the autocorrelation parameter $\rho$ in (12.1) or (12.2) is zero.
Note that if $\rho = 0$, then $\varepsilon_t = u_t$. Hence, the error terms $\varepsilon_t$ are independent when $\rho = 0$
since the disturbance terms $u_t$ are independent.

Because correlated error terms in business and economic applications tend to show
positive serial correlation, the usual test alternatives considered are:

$$H_0: \rho = 0$$
$$H_a: \rho > 0 \tag{12.12}$$


The Durbin-Watson test statistic $D$ is obtained by using ordinary least squares to fit the
regression function, calculating the ordinary residuals:

$$e_t = Y_t - \hat{Y}_t \tag{12.13}$$

and then calculating the statistic:

$$D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \tag{12.14}$$

where $n$ is the number of cases.

Exact critical values are difficult to obtain, but Durbin and Watson have obtained lower
and upper bounds $d_L$ and $d_U$ such that a value of $D$ outside these bounds leads to a definite
decision. The decision rule for testing between the alternatives in (12.12) is:

If $D > d_U$, conclude $H_0$
If $D < d_L$, conclude $H_a$                         (12.15)
If $d_L \le D \le d_U$, the test is inconclusive


Small values of $D$ lead to the conclusion that $\rho > 0$ because the adjacent error terms $\varepsilon_t$
and $\varepsilon_{t-1}$ tend to be of the same magnitude when they are positively autocorrelated. Hence,
the differences in the residuals, $e_t - e_{t-1}$, would tend to be small when $\rho > 0$, leading to a
small numerator in $D$ and hence to a small test statistic $D$.

Table B.7 contains the bounds $d_L$ and $d_U$ for various sample sizes ($n$), for two levels of
significance (.05 and .01), and for various numbers of X variables ($p - 1$) in the regression
model.
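
A minimal sketch of the computation in (12.14) follows. The function name `durbin_watson` and the two residual series are made up for illustration; the first series stays on one side of zero for several periods, as positively autocorrelated residuals tend to do, and the second alternates in sign.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic D in (12.14) for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals with long runs of the same sign
persistent = [1.0, 1.2, 0.9, 0.4, -0.3, -0.8, -1.1, -0.7, -0.6]
# Made-up residuals that alternate in sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)
print(round(d_persistent, 3), round(d_alternating, 3))
```

The persistent series gives a value of D well below 2 because successive differences are small relative to the residuals themselves, while the alternating series gives a value well above 2.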


Example

The Blaisdell Company wished to predict its sales by using industry sales as a predictor
variable. (Accurate predictions of industry sales are available from the industry's trade
association.) A portion of the seasonally adjusted quarterly data on company sales and
industry sales for the period 1998-2002 is shown in Table 12.2, columns 1 and 2. A scatter
plot (not shown) suggested that a linear regression model is appropriate. The market research
analyst was, however, concerned whether or not the error terms are positively autocorrelated.

The results of using ordinary least squares to fit a regression line to the data in Table 12.2
are shown at the bottom of Table 12.2. The residuals $e_t$ are shown in column 3 of Table 12.2
and are plotted against time in Figure 12.3. Note how the residuals consistently are above
or below the zero line for extended periods. Positive autocorrelation in the error terms is
suggested by such a pattern when an appropriate regression function has been employed.

The analyst wished to confirm this graphic diagnosis by using the Durbin-Watson test
for the alternatives:

$$H_0: \rho = 0$$
$$H_a: \rho > 0$$

Columns 4, 5, and 6 of Table 12.2 contain the necessary calculations for the test statistic
$D$. The analyst then obtained:

$$D = \frac{\sum_{t=2}^{20} (e_t - e_{t-1})^2}{\sum_{t=1}^{20} e_t^2} = \frac{.09794}{.13330} = .735$$


TABLE 12.2 Data, Regression Results, and Durbin-Watson Test Calculations: Blaisdell Company Example
(Company and Industry Sales Data Are Seasonally Adjusted).

         (1)            (2)            (3)          (4)               (5)                   (6)
         Company        Industry
         Sales          Sales
         ($ millions)   ($ millions)   Residual
   t     $Y_t$          $X_t$          $e_t$        $e_t - e_{t-1}$   $(e_t - e_{t-1})^2$   $e_t^2$
   1     20.96          127.3          -.026052     --                --                    .0006787
   2     21.40          130.0          -.062015     -.035963          .0012933              .0038459
   3     21.96          132.7           .022021      .084036          .0070620              .0004849
   4     21.52          129.4           .163754      .141733          .0200882              .0268154
  ...    ...            ...            ...          ...               ...                   ...
  17     27.52          164.2           .029112     -.076990          .0059275              .0008475
  18     27.78          165.6           .042316      .013204          .0001743              .0017906
  19     28.24          168.7          -.044160     -.086476          .0074781              .0019501
  20     28.78          171.7          -.033009      .011151          .0001243              .0010896
  Total                                                               .0979400              .1333018

$\hat{Y} = -1.4548 + .17628X$
$s\{b_0\} = .21415$     $s\{b_1\} = .00144$
MSE = .00741


FIGURE 12.3 Residuals Plotted against Time: Blaisdell Company Example.
(Plot of the residuals against time for periods 1 through 20; not reproduced.)


For level of significance of .01, we find in Table B.7 for $n = 20$ and $p - 1 = 1$:

$$d_L = .95 \qquad d_U = 1.15$$

Since $D = .735$ falls below $d_L = .95$, decision rule (12.15) indicates that the appropriate
conclusion is $H_a$, namely, that the error terms are positively autocorrelated.
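
The test for the Blaisdell Company example can be retraced in a few lines. In this sketch the function name `durbin_watson_decision` is an illustrative choice; the column totals are those of Table 12.2 and the bounds are the Table B.7 values quoted above.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Decision rule (12.15) for H0: rho = 0 versus Ha: rho > 0."""
    if d > d_upper:
        return "conclude H0"
    if d < d_lower:
        return "conclude Ha"
    return "inconclusive"


# Column totals from Table 12.2
sum_sq_diff = 0.0979400    # sum of (e_t - e_{t-1})^2 for t = 2, ..., 20
sum_sq_resid = 0.1333018   # sum of e_t^2 for t = 1, ..., 20
d_stat = sum_sq_diff / sum_sq_resid

# Bounds for level of significance .01, n = 20, p - 1 = 1
decision = durbin_watson_decision(d_stat, d_lower=0.95, d_upper=1.15)
print(round(d_stat, 3), decision)
```

Keeping the decision rule in its own function makes the inconclusive region explicit: a value of D between the two bounds returns neither hypothesis.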


Comments

1. If a test for negative autocorrelation is required, the test statistic to be used is $4 - D$, where $D$
is defined as above. The test is then conducted in the same manner described for testing for positive
autocorrelation. That is, if the quantity $4 - D$ falls below $d_L$, we conclude $\rho < 0$, that negative
autocorrelation exists, and so on.

2. A two-sided test for $H_0$: $\rho = 0$ versus $H_a$: $\rho \neq 0$ can be made by employing both one-sided
tests separately. The Type I risk with the two-sided test is $2\alpha$, where $\alpha$ is the Type I risk for each
one-sided test.

3. When the Durbin-Watson test employing the bounds $d_L$ and $d_U$ gives indeterminate results, in
principle more cases are required. Of course, with time series data it may be impossible to obtain
more cases, or additional cases may lie in the future and be obtainable only with great delay. Durbin
and Watson (Ref. 12.2) do give an approximate test which may be used when the bounds test is
indeterminate, but the degrees of freedom should be larger than about 40 before this approximate test
will give more than a rough indication of whether autocorrelation exists.

A reasonable procedure is to treat indeterminate results as suggesting the presence of autocorrelated
errors and employ one of the remedial actions to be discussed next. When remedial action does not
lead to regression results substantially different from those of ordinary least squares, the assumption of
uncorrelated error terms would appear to be satisfactory. When the remedial action does lead to substantially
different regression results (such as larger estimated standard errors for the regression coefficients
or the elimination of autocorrelated errors), the results obtained by means of the remedial action are
probably the more useful ones.

4. The Durbin-Watson test is not robust against misspecifications of the model. For example, the
Durbin-Watson test may not disclose the presence of autocorrelated errors that follow the second-order
autoregressive pattern in (12.11).

5. The Durbin-Watson test is widely used; however, other tests for autocorrelation are available.
One such test, due to Theil and Nagar, is found in Reference 12.3.


12.4 Remedial Measures for Autocorrelation 


The two principal remedial measures when autocorrelated error terms are present are to add 
one or more predictor variables to the regression model or to use transformed variables, 


Addition of Predictor Variables 


As noted earlier, one major cause of autocorrelated error terms is the omission from the 
model of one or more key predictor variables that have time-ordered effects on the response 
variable. When autocorrelated error terms are found to be present, the first remedial action 
should always be to search for missing key predictor variables. In an earlier illustration, we 
mentioned population size as a missing variable in a regression of annual sales of a product 
on average yearly price of the product during a 30-year period. 

When the long-term persistent effects in a response variable cannot be captured by one 
or several predictor variables, a trend component can be added to the regression model, such 
as a linear trend or an exponential trend. Use of indicator variables for seasonal effects, as 
discussed on pages 319—321, can be helpful in eliminating or reducing autocorrelation in 
the error terms when the response variable is subject to seasonal effects (e.g., quarterly sales 
data). 


Use of Transformed Variables 
Only when use of additional predictor variables is not helpful in eliminating the problem of 
autocorrelated errors should a remedial action based on transformed variables be employed. 
A number of remedial procedures that rely on transformations of the variables have been 
developed. We shall explain three of these methods. Our explanation will be in terms of 
simple linear regression, but the extension to multiple regression is direct. 


The three methods to be described are each based on an interesting property of the
first-order autoregressive error term regression model (12.1). Consider the transformed
dependent variable:

$$Y'_t = Y_t - \rho Y_{t-1}$$

Substituting in this expression for $Y_t$ and $Y_{t-1}$ according to regression model (12.1),
we obtain:

$$Y'_t = (\beta_0 + \beta_1 X_t + \varepsilon_t) - \rho(\beta_0 + \beta_1 X_{t-1} + \varepsilon_{t-1})$$
$$= \beta_0(1 - \rho) + \beta_1(X_t - \rho X_{t-1}) + (\varepsilon_t - \rho\varepsilon_{t-1})$$

But, by (12.1), $\varepsilon_t - \rho\varepsilon_{t-1} = u_t$. Hence:

$$Y'_t = \beta_0(1 - \rho) + \beta_1(X_t - \rho X_{t-1}) + u_t \tag{12.16}$$

where the $u_t$ are the independent disturbance terms. Thus, when we use the transformed
variable $Y'_t$, the regression model contains error terms that are independent. Further, model
(12.16) is still a simple linear regression model with new X variable $X'_t = X_t - \rho X_{t-1}$, as
may be seen by rewriting (12.16) as follows:

$$Y'_t = \beta'_0 + \beta'_1 X'_t + u_t \tag{12.17}$$

where:

$$Y'_t = Y_t - \rho Y_{t-1}$$
$$X'_t = X_t - \rho X_{t-1}$$
$$\beta'_0 = \beta_0(1 - \rho)$$
$$\beta'_1 = \beta_1$$

Hence, by use of the transformed variables $X'_t$ and $Y'_t$, we obtain a standard simple linear
regression model with independent error terms. This means that ordinary least squares
methods have their usual optimum properties with this model.
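
The algebra leading to (12.16) can be verified numerically. The sketch below assumes hypothetical parameter values ($\beta_0 = 2$, $\beta_1 = .5$, $\rho = .6$) and reuses the disturbances of Table 12.1 purely as convenient numbers; it generates data from model (12.1) and confirms that the error term of the transformed model is the disturbance $u_t$.

```python
beta0, beta1, rho = 2.0, 0.5, 0.6   # hypothetical parameter values
u = [0.5, -0.7, 0.3, 0.0, -2.3, -1.9, 0.2, -0.3, 0.2, -0.1]  # u_1, ..., u_10
x = [float(t) for t in range(11)]   # X_t = t for t = 0, ..., 10

# AR(1) error terms and responses from model (12.1), starting at eps_0 = 3.0
eps = [3.0]
for u_t in u:
    eps.append(rho * eps[-1] + u_t)
y = [beta0 + beta1 * x_t + e_t for x_t, e_t in zip(x, eps)]

# Transformed variables for t = 1, ..., 10
y_prime = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
x_prime = [x[t] - rho * x[t - 1] for t in range(1, len(x))]

# Error term of transformed model (12.17): Y' - beta0 (1 - rho) - beta1 X'
recovered = [yp - beta0 * (1.0 - rho) - beta1 * xp
             for yp, xp in zip(y_prime, x_prime)]

for u_t, r_t in zip(u, recovered):
    print(f"{u_t:6.2f}  {r_t:6.2f}")
```

The two printed columns coincide, and one case is lost in the transformation because the first observation has no predecessor.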

In order to be able to use the transformed model (12.17), one generally needs to estimate
the autocorrelation parameter $\rho$ since its value is usually unknown. The three methods to
be described differ in how this is done. Often, however, the results obtained with the three
methods are quite similar.

Once an estimate of $\rho$ has been obtained, to be denoted by $r$, transformed variables are
obtained using this estimate of $\rho$:

$$Y'_t = Y_t - rY_{t-1} \tag{12.18a}$$
$$X'_t = X_t - rX_{t-1} \tag{12.18b}$$

Regression model (12.17) is then fitted to these transformed data, yielding an estimated
regression function:

$$\hat{Y}' = b'_0 + b'_1 X' \tag{12.19}$$

If this fitted regression function has eliminated the autocorrelation in the error terms, we
can transform back to a fitted regression model in the original variables as follows:

$$\hat{Y} = b_0 + b_1 X \tag{12.20}$$

where:

$$b_0 = \frac{b'_0}{1 - r} \tag{12.20a}$$
$$b_1 = b'_1 \tag{12.20b}$$

The estimated standard deviations of the regression coefficients for the original variables
can be obtained from those for the regression coefficients for the transformed variables as
follows:

$$s\{b_0\} = \frac{s\{b'_0\}}{1 - r} \tag{12.21a}$$
$$s\{b_1\} = s\{b'_1\} \tag{12.21b}$$


Cochrane-Orcutt Procedure 
The Cochrane-Orcutt procedure involves an iteration of three steps. 


1. Estimation of $\rho$. This is accomplished by noting that the autoregressive error process
assumed in model (12.1) can be viewed as a regression through the origin:

$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t$$

where $\varepsilon_t$ is the response variable, $\varepsilon_{t-1}$ the predictor variable, $u_t$ the error term, and $\rho$ the slope
of the line through the origin. Since the $\varepsilon_t$ and $\varepsilon_{t-1}$ are unknown, we use the residuals $e_t$ and
$e_{t-1}$ obtained by ordinary least squares as the response and predictor variables, and estimate
$\rho$ by fitting a straight line through the origin. From our previous discussion of regression
through the origin, we know by (4.14) that the estimate of the slope $\rho$, denoted by $r$, is:

$$r = \frac{\sum_{t=2}^{n} e_{t-1}e_t}{\sum_{t=2}^{n} e_{t-1}^2} \tag{12.22}$$

2. Fitting of transformed model (12.17). Using the estimate $r$ in (12.22), we next obtain
the transformed variables $Y'_t$ and $X'_t$ in (12.18) and use ordinary least squares with these
transformed variables to yield the fitted regression function (12.19).

3. Test for need to iterate. The Durbin-Watson test is then employed to test whether the
error terms for the transformed model are uncorrelated. If the test indicates that they are
uncorrelated, the procedure terminates. The fitted regression model in the original variables
is then obtained by transforming the regression coefficients back according to (12.20).

If the Durbin-Watson test indicates that autocorrelation is still present after the first
iteration, the parameter $\rho$ is reestimated from the new residuals for the fitted regression model
(12.20) with the original variables, which was derived from the fitted regression model
(12.19) with the transformed variables. A new set of transformed variables is then obtained
with the new $r$. This process may be continued for another iteration or two until the
Durbin-Watson test suggests that the error terms in the transformed model are uncorrelated. If
the process does not terminate after one or two iterations, a different procedure should be
employed.
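
The three steps can be sketched in code for simple linear regression. This is an illustrative outline, not a reference implementation: the function names, the returned dictionary, and the `max_iter` cap are our own choices, and the upper bound `d_upper` must be looked up by the user in a Durbin-Watson table for the $n - 1$ transformed cases. The demonstration reuses the eleven observations of Table 12.1 as convenient numbers, with a placeholder bound that is not taken from Table B.7.

```python
def ols_line(x, y):
    """Return (b0, b1) for the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(x, y, b0, b1):
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def estimate_rho(e):
    """Slope of the regression through the origin of e_t on e_{t-1}, as in (12.22)."""
    num = sum(e[t - 1] * e[t] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den


def durbin_watson(e):
    """Durbin-Watson statistic (12.14)."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v ** 2 for v in e)


def cochrane_orcutt(x, y, d_upper, max_iter=3):
    """Iterate steps 1-3; stop when D for the transformed fit exceeds d_upper."""
    b0, b1 = ols_line(x, y)          # ordinary least squares, original variables
    result = None
    for iteration in range(1, max_iter + 1):
        # Step 1: estimate rho from residuals in the original variables
        r = estimate_rho(residuals(x, y, b0, b1))
        # Step 2: fit the transformed model (12.17) on the n - 1 remaining cases
        y_p = [y[t] - r * y[t - 1] for t in range(1, len(y))]
        x_p = [x[t] - r * x[t - 1] for t in range(1, len(x))]
        b0_p, b1_p = ols_line(x_p, y_p)
        # Step 3: Durbin-Watson statistic for the transformed fit
        d = durbin_watson(residuals(x_p, y_p, b0_p, b1_p))
        # Transform back to the original variables, (12.20)
        b0, b1 = b0_p / (1.0 - r), b1_p
        result = {"r": r, "b0_prime": b0_p, "b1_prime": b1_p, "b0": b0,
                  "b1": b1, "D": d, "iterations": iteration,
                  "converged": d > d_upper}
        if result["converged"]:
            break
    return result


# Demonstration with the observations of Table 12.1 (X_t = t, t = 0, ..., 10)
x = [float(t) for t in range(11)]
y = [5.0, 6.0, 5.8, 6.6, 7.1, 5.3, 3.9, 4.6, 4.8, 5.5, 5.9]
r_first = estimate_rho(residuals(x, y, *ols_line(x, y)))
fit = cochrane_orcutt(x, y, d_upper=1.3)   # 1.3 is a placeholder bound
print(round(r_first, 4), fit["iterations"], fit["converged"])
```

Capping the number of iterations mirrors the advice above: if the test still signals autocorrelation after a few passes, the returned `converged` flag is false and a different procedure should be tried.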


Example

For the Blaisdell Company example, the necessary calculations for estimating the
autocorrelation parameter $\rho$, based on the residuals obtained with ordinary least squares applied
to the original variables, are illustrated in Table 12.3. Column 1 repeats the residuals from
Table 12.2. Column 2 contains the residuals $e_{t-1}$, and columns 3 and 4 contain the necessary
calculations. Hence, we estimate:

$$r = \frac{.0834478}{.1322122} = .631166$$

TABLE 12.3 Calculations for Estimating $\rho$ with the Cochrane-Orcutt Procedure: Blaisdell Company Example.

          (1)          (2)          (3)              (4)
   t      $e_t$        $e_{t-1}$    $e_{t-1}e_t$     $e_{t-1}^2$
   1      -.026052     --           --               --
   2      -.062015     -.026052      .0016156        .0006787
   3       .022021     -.062015     -.0013656        .0038459
   4       .163754      .022021      .0036060        .0004849
  ...     ...          ...          ...              ...
  17       .029112      .106102      .0030889        .0112576
  18       .042316      .029112      .0012319        .0008475
  19      -.044160      .042316     -.0018687        .0017906
  20      -.033009     -.044160      .0014577        .0019501
  Total                              .0834478        .1322122

We now obtain the transformed variables $Y'_t$ and $X'_t$ in (12.18):

$$Y'_t = Y_t - .631166Y_{t-1}$$
$$X'_t = X_t - .631166X_{t-1}$$

These are found in Table 12.4. Columns 1 and 2 repeat the original variables $Y_t$ and $X_t$,
and columns 3 and 4 contain the transformed variables $Y'_t$ and $X'_t$. Ordinary least squares
fitting of linear regression is now used with these transformed variables based on the $n - 1$
cases remaining after the transformations. The fitted regression line and other regression
results are shown at the bottom of Table 12.4.

TABLE 12.4 Transformed Variables and Regression Results for First Iteration with Cochrane-Orcutt Procedure: Blaisdell Company Example.

          (1)       (2)       (3)                                (4)
   t      $Y_t$     $X_t$     $Y'_t = Y_t - .631166Y_{t-1}$      $X'_t = X_t - .631166X_{t-1}$
   1      20.96     127.3     --                                 --
   2      21.40     130.0      8.1708                            49.653
   3      21.96     132.7      8.4530                            50.648
   4      21.52     129.4      7.6596                            45.644
  ...     ...       ...       ...                                ...
  17      27.52     164.2     10.4911                            62.772
  18      27.78     165.6     10.4103                            61.963
  19      28.24     168.7     10.7062                            64.179
  20      28.78     171.7     10.9559                            65.222

$\hat{Y}' = -.3941 + .17376X'$
$s\{b'_0\} = .1672$     $s\{b'_1\} = .002957$
MSE = .00451

The fitted regression line in the transformed variables is:

$$\hat{Y}' = -.3941 + .17376X' \tag{12.23}$$

where:

$$Y'_t = Y_t - .631166Y_{t-1}$$
$$X'_t = X_t - .631166X_{t-1}$$

Since the random term in the transformed regression model (12.17) is the disturbance
term $u_t$, MSE = .00451 is an estimate of the variance of this disturbance term; recall that
$\sigma^2\{u_t\} = \sigma^2$.

From the fitted regression function for the transformed variables in (12.23), residuals
were obtained and the Durbin-Watson statistic calculated. The result was (calculations not
shown) $D = 1.65$. From Table B.7, we find for $\alpha = .01$, $p - 1 = 1$, and $n = 19$:

$$d_L = .93 \qquad d_U = 1.13$$

Since $D = 1.65 > d_U = 1.13$, we conclude that the autocorrelation coefficient for the error
terms in the model with the transformed variables is zero.

Having successfully handled the problem of autocorrelated error terms, we now transform
the fitted model in (12.23) back to the original variables, using (12.20):

$$b_0 = \frac{b'_0}{1 - r} = \frac{-.3941}{1 - .631166} = -1.0685$$
$$b_1 = b'_1 = .17376$$

leading to the fitted regression function in the original variables:

$$\hat{Y} = -1.0685 + .17376X \tag{12.24}$$

Finally, we obtain the estimated standard deviations of the regression coefficients for the
original variables by using (12.21). From the results in Table 12.4, we find:

$$s\{b_0\} = \frac{s\{b'_0\}}{1 - r} = \frac{.1672}{1 - .631166} = .45332$$
$$s\{b_1\} = s\{b'_1\} = .002957$$
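
The arithmetic of this example can be retraced from the reported summary figures. In the sketch below the function name `back_transform` and the dictionary keys are illustrative; the inputs are the column totals of Table 12.3 and the regression results at the bottom of Table 12.4.

```python
def back_transform(b0_prime, b1_prime, s_b0_prime, s_b1_prime, r):
    """Apply (12.20) and (12.21) to results for the transformed variables."""
    scale = 1.0 - r
    return {
        "b0": b0_prime / scale,      # (12.20a)
        "b1": b1_prime,              # (12.20b)
        "s_b0": s_b0_prime / scale,  # (12.21a)
        "s_b1": s_b1_prime,          # (12.21b)
    }


# Estimate of rho from the column totals of Table 12.3, using (12.22)
r = 0.0834478 / 0.1322122

# Regression results for the transformed variables, from Table 12.4
original = back_transform(b0_prime=-0.3941, b1_prime=0.17376,
                          s_b0_prime=0.1672, s_b1_prime=0.002957, r=r)

print(round(r, 6))
print(round(original["b0"], 4), original["b1"])
print(round(original["s_b0"], 5), original["s_b1"])
```

Only the intercept and its estimated standard deviation change under the back-transformation; the slope and its estimated standard deviation carry over unchanged.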


Comments

1. The Cochrane-Orcutt approach does not always work properly. A major reason is that when
the error terms are positively autocorrelated, the estimate \(r\) in (12.22) tends to underestimate the
autocorrelation parameter \(\rho\). When this bias is serious, it can significantly reduce the effectiveness of
the Cochrane-Orcutt approach.

2. There exists an approximate relation between the Durbin-Watson test statistic \(D\) in (12.14) and
the estimated autocorrelation parameter \(r\) in (12.22):

\[ D \approx 2(1 - r) \tag{12.25} \]

This relation indicates that the Durbin-Watson statistic ranges approximately between 0 and 4
since \(r\) takes on values between \(-1\) and 1, and that \(D\) is approximately 2 when \(r = 0\). Note that
for the Blaisdell Company example ordinary least squares regression fit, \(D = .735\), \(r = .631\), and
\(2(1 - r) = .738\).

3. Under certain circumstances, it may be helpful to construct pseudotransformed values for period
1 so that the regression for the transformed variables is based on \(n\), rather than \(n - 1\), cases. Procedures
for doing this are discussed in specialized texts such as Reference 12.4.

4. The least squares properties of the residuals, such as that the sum of the residuals is zero, apply
to the residuals for the fitted regression function with the transformed variables, not to the residuals
for the fitted regression function transformed back to the original variables.
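The approximate relation in Comment 2 follows from expanding the numerator of the Durbin-Watson statistic. The sketch below assumes that (12.14) is the usual ratio of squared successive residual differences to the residual sum of squares and that (12.22) is the ratio of the lagged cross-product sum to the lagged sum of squares; the approximation step treats the three sums of squares as nearly equal, which is reasonable when \(n\) is not small.

```latex
\begin{aligned}
D &= \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
   = \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2 - 2\sum_{t=2}^{n} e_{t-1}e_t}{\sum_{t=1}^{n} e_t^2} \\
  &\approx 2 - 2\,\frac{\sum_{t=2}^{n} e_{t-1}e_t}{\sum_{t=2}^{n} e_{t-1}^2}
   = 2(1 - r)
\end{aligned}
```

For the Blaisdell Company ordinary least squares fit this gives \(2(1 - .631) = .738\), close to the computed \(D = .735\).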


Hildreth-Lu Procedure

The Hildreth-Lu procedure for estimating the autocorrelation parameter \(\rho\) for use in the
transformations (12.18) is analogous to the Box-Cox procedure for estimating the parameter
\(\lambda\) in the power transformation of \(Y\) to improve the appropriateness of the standard
regression model. The value of \(\rho\) chosen with the Hildreth-Lu procedure is the one that
minimizes the error sum of squares for the transformed regression model (12.17):

\[ SSE = \sum (Y'_t - \hat{Y}'_t)^2 = \sum (Y'_t - b'_0 - b'_1 X'_t)^2 \tag{12.26} \]

Computer programs are available to find the value of \(\rho\) that minimizes SSE. Alternatively,
one can do a numerical search, running repeated regressions with different values of \(\rho\) for
identifying the approximate magnitude of \(\rho\) that minimizes SSE. In the region of \(\rho\) that
leads to minimum SSE, a finer search can be conducted to obtain a more precise value of \(\rho\).

Once the value of \(\rho\) that minimizes SSE is found, the fitted regression function corresponding
to that value of \(\rho\) is examined to see if the transformation has successfully
eliminated the autocorrelation. If so, the fitted regression function in the original variables
can then be obtained by means of (12.20).
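The numerical search described above can be sketched in Python with NumPy. The function names `transformed_sse` and `hildreth_lu` are illustrative assumptions, and the data are simulated purely for demonstration; they are not the Blaisdell Company series.

```python
import numpy as np


def transformed_sse(y, x, rho):
    """Fit OLS to Y'_t = Y_t - rho*Y_{t-1} on X'_t = X_t - rho*X_{t-1}; return (SSE, coefficients)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    y_prime = y[1:] - rho * y[:-1]          # n - 1 transformed responses
    x_prime = x[1:] - rho * x[:-1]          # n - 1 transformed predictors
    design = np.column_stack([np.ones_like(x_prime), x_prime])
    coef, *_ = np.linalg.lstsq(design, y_prime, rcond=None)
    resid = y_prime - design @ coef
    return float(resid @ resid), coef


def hildreth_lu(y, x, grid):
    """Return (rho, SSE, coefficients) for the grid value of rho with the smallest SSE."""
    results = [(rho, *transformed_sse(y, x, rho)) for rho in grid]
    return min(results, key=lambda item: item[1])


# Simulated series with first-order autoregressive errors (assumed rho = 0.8).
rng = np.random.default_rng(0)
n = 60
x_sim = np.linspace(100.0, 160.0, n)
u = rng.normal(0.0, 0.1, n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.8 * eps[t - 1] + u[t]
y_sim = 1.0 + 0.2 * x_sim + eps

grid = [round(0.1 * k, 2) for k in range(1, 10)]
best_rho, best_sse, best_coef = hildreth_lu(y_sim, x_sim, grid)
print(best_rho, best_sse, best_coef)
```

A coarse grid like this one can be followed by a finer grid around `best_rho`, mirroring the two-stage search the text describes.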


Example

Table 12.5 contains the regression results for the Hildreth-Lu procedure when fitting the
transformed regression model (12.17) to the Blaisdell Company data for different values
of the autocorrelation parameter \(\rho\). Note that SSE is minimized when \(\rho\) is near .96, so we
shall let \(r = .96\) be the estimate of \(\rho\). The fitted regression function for the transformed
variables corresponding to \(r = .96\) and other regression results are given at the bottom of
Table 12.5. The fitted regression function in the transformed variables is:

\[ \hat{Y}' = .07117 + .16045X' \tag{12.27} \]

where:

\[ Y'_t = Y_t - .96\,Y_{t-1} \]
\[ X'_t = X_t - .96\,X_{t-1} \]

TABLE 12.5 Hildreth-Lu Results, Blaisdell Company Example.

   ρ      SSE          ρ      SSE
  .10    .1170        .94    .0718
  .30    .0938        .95    .07171
  .50    .0805        .96    .07167
  .70    .0758        .97    .07175
  .90    .0728        .98    .07197
  .92    .0723

  For ρ = .96:  Ŷ' = .07117 + .16045X'
                s{b'_0} = .05798    s{b'_1} = .006840
                MSE = .00422


The Durbin-Watson test statistic for this fitted model is \(D = 1.73\). Since for \(n = 19\),
\(p - 1 = 1\), and \(\alpha = .01\) the upper critical value is \(d_U = 1.13\), we conclude that no
autocorrelation remains in the transformed model.

Therefore, we shall transform regression function (12.27) back to the original variables.
Using (12.20), we obtain:

\[ \hat{Y} = 1.7793 + .16045X \tag{12.28} \]

The estimated standard deviations of these regression coefficients are:

\[ s\{b_0\} = 1.450 \qquad s\{b_1\} = .006840 \]
Comments

1. The Hildreth-Lu procedure, unlike the Cochrane-Orcutt procedure, does not require any iterations
once the estimate of the autocorrelation parameter \(\rho\) is obtained.

2. Note from Table 12.5 that SSE as a function of \(\rho\) is quite stable in a wide region around the
minimum, as is often the case. It indicates that the numerical search for finding the best value of \(\rho\)
need not be too fine unless there is particular interest in the intercept term \(\beta_0\), since the estimate \(b_0\) is
sensitive to the value of \(r\).


First Differences Procedure

Since the autocorrelation parameter \(\rho\) is frequently large and SSE as a function of \(\rho\) often
is quite flat for large values of \(\rho\) up to 1.0, as in the Blaisdell Company example, some
economists and statisticians have suggested use of \(\rho = 1.0\) in the transformed model (12.17).
If \(\rho = 1\), \(\beta'_0 = \beta_0(1 - \rho) = 0\), and the transformed model (12.17) becomes:

\[ Y'_t = \beta'_1 X'_t + u_t \tag{12.29} \]

where:

\[ Y'_t = Y_t - Y_{t-1} \tag{12.29a} \]
\[ X'_t = X_t - X_{t-1} \tag{12.29b} \]

Thus, again, the regression coefficient \(\beta'_1 = \beta_1\) can be directly estimated by ordinary least
squares methods, this time based on regression through the origin. Note that the transformed
variables in (12.29a) and (12.29b) are ordinary first differences. It has been found that this
first differences approach is effective in a variety of applications in reducing the autocorrelations
of the error terms, and of course it is much simpler than the Cochrane-Orcutt and
Hildreth-Lu procedures.

The fitted regression function in the transformed variables:

\[ \hat{Y}' = b'_1 X' \tag{12.30} \]

can be transformed back to the original variables as follows:

\[ \hat{Y} = b_0 + b_1 X \tag{12.31} \]

where:

\[ b_0 = \bar{Y} - b'_1 \bar{X} \tag{12.31a} \]
\[ b_1 = b'_1 \tag{12.31b} \]
Example

Table 12.6 illustrates the transformed variables \(Y'_t\) and \(X'_t\), based on the first differences
transformations in (12.29a, b) for the Blaisdell Company example. Application of ordinary
least squares for estimating a linear regression through the origin leads to the results shown
at the bottom of Table 12.6. The fitted regression function in the transformed variables is:

\[ \hat{Y}' = .16849X' \tag{12.32} \]

where:

\[ Y'_t = Y_t - Y_{t-1} \]
\[ X'_t = X_t - X_{t-1} \]


To examine whether the first differences procedure has removed the autocorrelations, 
we shall use the Durbin-Watson test. There are two points to note when using the Durbin- 
Watson test with the first differences procedure. Sometimes the first differences procedure 
can overcorrect, leading to negative autocorrelations in the error terms. Hence, it may be 
appropriate to use a two-sided Durbin-Watson test when testing for autocorrelation with 
first differences data. The second point is that the first differences model (12.29) has no 
intercept term, but the Durbin-Watson test requires a fitted regression with an intercept 
term. A valid test for autocorrelation in a no-intercept model can be carried out by fitting for 
this purpose a regression function with an intercept term. Of course, the fitted no-intercept 
model is still the model of basic interest. 

In the Blaisdell Company example, the Durbin-Watson statistic for the fitted first differences
regression model with an intercept term is \(D = 1.75\). This indicates uncorrelated
error terms for either a one-sided test (with \(\alpha = .01\)) or a two-sided test (with \(\alpha = .02\)).
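The one-sided and two-sided conclusions can be reproduced with a small decision helper. This is a sketch under assumptions: the function names are hypothetical, the decision rule is the standard one (compare \(D\) with the bounds \(d_L\) and \(d_U\)), the test for negative autocorrelation applies the same rule to \(4 - D\), and the bounds .93 and 1.13 are the Table B.7 values quoted earlier for \(n = 19\), \(p - 1 = 1\), \(\alpha = .01\), assumed here to apply to the 19 differenced cases.

```python
def one_sided_dw(stat, d_lower, d_upper):
    """Durbin-Watson decision for one direction of autocorrelation."""
    if stat > d_upper:
        return "conclude rho = 0"
    if stat < d_lower:
        return "conclude autocorrelation"
    return "inconclusive"


def two_sided_dw(d, d_lower, d_upper):
    """Run both one-sided tests: D for positive, 4 - D for negative autocorrelation.

    Each one-sided test is at level alpha, so the two-sided risk is 2 * alpha.
    """
    return one_sided_dw(d, d_lower, d_upper), one_sided_dw(4.0 - d, d_lower, d_upper)


positive, negative = two_sided_dw(1.75, 0.93, 1.13)
print(positive, "|", negative)
```

With \(D = 1.75\), both \(D\) and \(4 - D = 2.25\) exceed the upper bound, which matches the conclusion of uncorrelated error terms for either form of the test.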

With the first differences procedure successfully eliminating the autocorrelation, we
return to a fitted model in the original variables by using (12.31):

\[ \hat{Y} = -.30349 + .16849X \tag{12.33} \]

where:

\[ b_0 = 24.569 - .16849(147.62) = -.30349 \]

We know from Table 12.6 that the estimated standard deviation of \(b_1\) is \(s\{b_1\} = .005096\)
since \(b_1 = b'_1\).

TABLE 12.6 First Differences and Regression Results with First Differences Procedure, Blaisdell Company Example.

         (1)       (2)       (3)                      (4)
   t     Y_t       X_t       Y'_t = Y_t - Y_{t-1}     X'_t = X_t - X_{t-1}
   1    20.96     127.3       -                        -
   2    21.40     130.0       .44                      2.7
   3    21.96     132.7       .56                      2.7
   4    21.52     129.4      -.44                     -3.3
  ...
  17    27.52     164.2       .54                      3.5
  18    27.78     165.6       .26                      1.4
  19    28.24     168.7       .46                      3.1
  20    28.78     171.7       .54                      3.0

  Ŷ' = .16849X'
  s{b'_1} = .005096    MSE = .00482

Comparison of Three Methods

TABLE 12.7 Major Regression Results for Three Transformation Procedures, Blaisdell Company Example.

  Procedure                 b_1      s{b_1}     r      Estimate of σ² (MSE)
  Cochrane-Orcutt          .1738     .0030     .63     .0045
  Hildreth-Lu              .1605     .0068     .96     .0042
  First differences        .1685     .0051    1.0      .0048
  Ordinary least squares   .1763     .0014      -        -

Table 12.7 contains some of the main regression results for the three transformation methods
and also for the ordinary least squares regression fit to the original variables. A number of
key points stand out:

1. All of the estimates of \(\beta_1\) are quite close to each other.

2. The estimated standard deviations of \(b_1\) based on Hildreth-Lu and first differences
transformation methods are quite close to each other; that with the Cochrane-Orcutt
procedure is somewhat smaller. The estimated standard deviation of \(b_1\) based on ordinary
least squares regression with the original variables is still smaller. This is as expected,
since we noted earlier that the estimated standard deviations \(s\{b_1\}\) calculated according
to ordinary least squares may seriously underestimate the true standard deviations \(\sigma\{b_1\}\)
when positive autocorrelation is present.

3. All three transformation methods provide essentially the same estimate of \(\sigma^2\), the
variance of the disturbance terms \(u_t\).


The three transformation methods do not always work equally well, as happens to be the
case here for the Blaisdell Company example. The Cochrane-Orcutt procedure may fail to
remove autocorrelation in one or two iterations, in which case the Hildreth-Lu or the first 
differences procedures may be preferable. When several of the transformation methods are 
effective in removing autocorrelation, then simplicity of calculations may be considered in 
choosing from among these procedures. 


Comment 


Further discussions of the Cochrane-Orcutt, Hildreth-Lu, and first differences procedures, as well as
of other remedial procedures for autocorrelated errors, may be found in specialized texts, such as
Reference 12.4.


12.5 Forecasting with Autocorrelated Error Terms


One important use of autoregressive error regression models is to make forecasts. With these 
models, information about the error term in the most recent period n can be incorporated 
into the forecast for period n + 1. This provides a more accurate forecast because, when 
autoregressive error regression models are appropriate, the error terms in successive periods 
are correlated. Thus, if sales in period n are above their expected value and successive error 
terms are positively correlated, it follows that sales in period n + 1 will likely be above their 
expected value also. 

We shall explain the basic ideas underlying the development of forecasts using the 
presence of autocorrelated error terms by again employing the simple linear autoregressive 
error term regression model (12.1). The extension to multiple regression model (12.2) is 
direct. First, we consider forecasting when either the Cochrane-Orcutt or the Hildreth-Lu 
procedure has been utilized for estimating the regression parameters. 

When we express regression model (12.1):

\[ Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t \]

by using the structure of the error terms:

\[ \varepsilon_t = \rho\varepsilon_{t-1} + u_t \]

we obtain:

\[ Y_t = \beta_0 + \beta_1 X_t + \rho\varepsilon_{t-1} + u_t \]

For period \(n + 1\), we obtain:

\[ Y_{n+1} = \beta_0 + \beta_1 X_{n+1} + \rho\varepsilon_n + u_{n+1} \tag{12.34} \]

Thus, \(Y_{n+1}\) is made up of three components:

1. The expected value \(\beta_0 + \beta_1 X_{n+1}\).
2. A multiple \(\rho\) of the preceding error term \(\varepsilon_n\).
3. An independent, random disturbance term with \(E\{u_{n+1}\} = 0\).

The forecast for next period \(n + 1\), to be denoted by \(F_{n+1}\), is constructed by dealing with
each of the three components in (12.34):

1. Given \(X_{n+1}\), we estimate the expected value \(\beta_0 + \beta_1 X_{n+1}\) as usual from the fitted
regression function:

\[ \hat{Y}_{n+1} = b_0 + b_1 X_{n+1} \]

where \(b_0\) and \(b_1\) are the estimated regression coefficients for the original variables
obtained from \(b'_0\) and \(b'_1\) for the transformed variables according to (12.20).

2. \(\rho\) is estimated by \(r\) in (12.22), and \(\varepsilon_n\) is estimated by the residual \(e_n\):

\[ e_n = Y_n - (b_0 + b_1 X_n) = Y_n - \hat{Y}_n \]

Thus, \(\rho\varepsilon_n\) is estimated by \(re_n\).

3. The disturbance term \(u_{n+1}\) has expected value zero and is independent of earlier
information. Hence, we use its expected value of zero in the forecast.

Thus, the forecast for period \(n + 1\) is:

\[ F_{n+1} = \hat{Y}_{n+1} + re_n \tag{12.35} \]


An approximate \(1 - \alpha\) prediction interval for \(Y_{n+1(\text{new})}\), the new observation on the
response variable, may be obtained by employing the usual prediction limits for a new observation
in (2.36), but based on the transformed observations. Thus, \(Y_i\) and \(X_i\) in formula (2.38a)
for the estimated variance \(s^2\{\text{pred}\}\) are replaced by \(Y'_t\) and \(X'_t\) as defined in (12.18).

The approximate \(1 - \alpha\) prediction limits for \(Y_{n+1(\text{new})}\) with simple linear regression
therefore are:

\[ F_{n+1} \pm t(1 - \alpha/2;\, n - 3)\,s\{\text{pred}\} \tag{12.36} \]

where \(s\{\text{pred}\}\), defined in (2.38a), is here based on the transformed observations. Note the
use of \(n - 3\) degrees of freedom for the \(t\) multiple, since there are only \(n - 1\) transformed
cases and two degrees of freedom are lost for estimating the two parameters in the simple
linear regression function.

When forecasts are based on the first differences procedure, the forecast in (12.35) is
still applicable, but \(r = 1\) now. The estimated standard deviation \(s\{\text{pred}\}\) now is calculated
according to formula (4.20) in Table 4.1 for one predictor variable, using the transformed
variables. Finally, the degrees of freedom for the \(t\) multiple in (12.36) will be \(n - 2\), since
only one parameter has to be estimated in the no-intercept regression model (12.29).


Example

For the Blaisdell Company example, the trade association has projected that deseasonalized
industry sales in the first quarter of 2003 (i.e., quarter 21) will be \(X_{21}\) = $175.3 million.
To forecast Blaisdell Company sales for quarter 21, we shall use the Cochrane-Orcutt fitted
regression function (12.24):

\[ \hat{Y} = -1.0685 + .17376X \]

First, we need to obtain the residual \(e_{20}\):

\[ e_{20} = Y_{20} - \hat{Y}_{20} = 28.78 - [-1.0685 + .17376(171.7)] = .0139 \]

The fitted value when \(X_{21} = 175.3\) is:

\[ \hat{Y}_{21} = -1.0685 + .17376(175.3) = 29.392 \]

The forecast for period 21 then is:

\[ F_{21} = \hat{Y}_{21} + re_{20} = 29.392 + .631166(.0139) = 29.40 \]

Note how the fact that company sales in quarter 20 were slightly above their estimated mean
has a small positive influence on the forecast for company sales for quarter 21.
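The three steps of the forecast in (12.35) translate directly into code. The sketch below uses a hypothetical helper name, `forecast_next`, and plugs in the Cochrane-Orcutt estimates and quarter 20 and 21 values quoted in the example.

```python
def forecast_next(b0, b1, r, x_next, y_last, x_last):
    """One-period-ahead forecast: fitted value for n + 1 plus r times the last residual.

    Returns (forecast, fitted value for period n + 1, residual for period n).
    """
    e_last = y_last - (b0 + b1 * x_last)     # residual in the original variables
    fitted_next = b0 + b1 * x_next           # estimated mean response for n + 1
    return fitted_next + r * e_last, fitted_next, e_last


# Blaisdell Company example, Cochrane-Orcutt fit (12.24) with r = .631166.
f21, yhat21, e20 = forecast_next(
    b0=-1.0685, b1=0.17376, r=0.631166, x_next=175.3, y_last=28.78, x_last=171.7
)
print(round(e20, 4), round(yhat21, 3), round(f21, 2))
```

Setting `r=1.0` in the same function gives the first differences form of the forecast, provided the coefficients come from the first differences fit.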

We wish to set up a 95 percent prediction interval for \(Y_{21(\text{new})}\). Using the data for the
transformed variables in Table 12.4, we calculate \(s\{\text{pred}\}\) by (2.38) for:

\[ X'_{n+1} = X_{n+1} - .631166X_n = 175.3 - .631166(171.7) = 66.929 \]

We obtain \(s\{\text{pred}\} = .0757\) (calculations not shown). We require \(t(.975; 17) = 2.110\). We
therefore obtain the prediction limits \(29.40 \pm 2.110(.0757)\) and the prediction interval:

\[ 29.24 \le Y_{21(\text{new})} \le 29.56 \]

Given quarter 20 seasonally adjusted company sales of $28.78 million and other past sales
and given quarter 21 industry sales of $175.3 million, we predict with approximately 95 percent
confidence that seasonally adjusted Blaisdell Company sales in quarter 21 will be
between $29.24 and $29.56 million.
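The interval arithmetic can be sketched as follows. Both function names are hypothetical. `s_pred` assumes that (2.38a) has the usual simple linear regression form, MSE times \(1 + 1/n + (X_h - \bar{X})^2 / \sum (X_i - \bar{X})^2\), applied to the transformed predictor values; since Table 12.4 is not reproduced here, the worked call uses the reported \(s\{\text{pred}\} = .0757\) directly, and the \(t\) multiple is passed in rather than computed.

```python
import math


def s_pred(x_prime, mse, x_prime_new):
    """Estimated standard deviation of a prediction, computed from transformed X' values."""
    n = len(x_prime)
    mean = sum(x_prime) / n
    sxx = sum((value - mean) ** 2 for value in x_prime)
    variance = mse * (1.0 + 1.0 / n + (x_prime_new - mean) ** 2 / sxx)
    return math.sqrt(variance)


def prediction_limits(forecast, s_pred_value, t_multiple):
    """Approximate limits: forecast plus or minus t multiple times s{pred}."""
    half_width = t_multiple * s_pred_value
    return forecast - half_width, forecast + half_width


# Blaisdell Company example: F_21 = 29.40, s{pred} = .0757, t(.975; 17) = 2.110.
lower, upper = prediction_limits(29.40, 0.0757, 2.110)
print(round(lower, 2), round(upper, 2))
```

Passing the 19 transformed \(X'_t\) values, MSE = .00451, and \(X'_{n+1} = 66.929\) to `s_pred` would be the way to obtain the standard deviation from the data rather than from the reported figure.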




To obtain a forecast of actual sales including seasonal effects in quarter 21, the Blaisdell 
Company still needs to incorporate the first quarter seasonal effect into the forecast of 
seasonally adjusted sales. 

The forecasts with the other transformation procedures are very similar to the one with 
the Cochrane-Orcutt procedure. With the first differences estimated regression function 
(12.33), the forecast for quarter 21 is: 


\[ F_{21} = [-.30349 + .16849(175.3)] + 1.0[28.78 + .30349 - .16849(171.70)] = 29.39 \]

The estimated standard deviation \(s\{\text{pred}\}\) calculated according to (4.20) with the transformed
data in Table 12.6 is \(s\{\text{pred}\} = .0718\) (calculations not shown). For a 95 percent
prediction interval, we require \(t(.975; 18) = 2.101\). The prediction limits therefore are
\(29.39 \pm 2.101(.0718)\) and the approximate 95 percent prediction interval is:

\[ 29.24 \le Y_{21(\text{new})} \le 29.54 \]

This forecast is practically the same as that with the Cochrane-Orcutt estimates.
The approximate 95 percent prediction interval with the estimated regression function (12.28)
based on the Hildreth-Lu procedure is (calculations not shown):

\[ 29.24 \le Y_{21(\text{new})} \le 29.52 \]

This forecast is practically the same as the other two.


Comments

1. Forecasts obtained with autoregressive error regression models (12.1) and (12.2) are conditional
on the past observations \(Y_n\), \(Y_{n-1}\), etc. They are also conditional on \(X_{n+1}\), which often has to be
projected as in the Blaisdell Company example.

2. Forecasts for two or more periods ahead can also be developed, using the recursive relations of
\(\varepsilon_t\) to earlier error terms developed in Section 12.2. For example, given \(X_{n+2}\) the forecast for period
\(n + 2\), based on either Cochrane-Orcutt or Hildreth-Lu estimates, is:

\[ F_{n+2} = \hat{Y}_{n+2} + r^2 e_n \tag{12.37} \]

For the first differences estimates, the forecast in (12.37) is calculated with \(r = 1\).

3. The approximate prediction limits (12.36) assume that the value of \(r\) used in the transformations
(12.18) is the true value of \(\rho\); that is, \(r = \rho\). If that is the case, the standard regression
assumptions apply since we are then dealing with the transformed model (12.17). To see that the
prediction limits obtained from the transformed model are applicable to the forecast \(F_{n+1}\) in (12.35),
recall that \(\sigma^2\{\text{pred}\}\) in (2.37) is the variance of the difference \(Y_{h(\text{new})} - \hat{Y}_h\). In terms of the situation
here for the transformed variables, we have the following correspondences:

\(Y_{h(\text{new})}\) corresponds to \(Y'_{n+1} = Y_{n+1} - rY_n\)

\(\hat{Y}_h\) corresponds to \(\hat{Y}'_{n+1} = b'_0 + b'_1 X'_{n+1} = b_0(1 - r) + b_1(X_{n+1} - rX_n)\)

The difference \(Y'_{n+1} - \hat{Y}'_{n+1}\) is:

\[ Y'_{n+1} - \hat{Y}'_{n+1} = (Y_{n+1} - rY_n) - b_0(1 - r) - b_1(X_{n+1} - rX_n) \]
\[ = Y_{n+1} - (b_0 + b_1 X_{n+1}) - r(Y_n - b_0 - b_1 X_n) \]
\[ = Y_{n+1} - \hat{Y}_{n+1} - re_n \]
\[ = Y_{n+1} - F_{n+1} \]

Hence, \(Y_{n+1}\) plays the role of \(Y_{h(\text{new})}\) and \(F_{n+1}\) plays the role of \(\hat{Y}_h\) in (2.37). The prediction limits (12.36)
are approximate because \(r\) is only an estimate of \(\rho\).
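The two-period forecast in Comment 2 suggests a general pattern. The sketch below is an extension rather than a formula stated in the text: it assumes that the recursive error relation gives a weight of \(r^h\) on the last residual for a forecast \(h\) periods ahead, which reduces to (12.35) for \(h = 1\) and to (12.37) for \(h = 2\). The function name and the projected industry sales figure of 178.0 for quarter 22 are hypothetical.

```python
def forecast_h_steps(b0, b1, r, x_future, y_last, x_last, h):
    """Forecast h periods ahead: fitted value at x_future plus r**h times the last residual."""
    if h < 1:
        raise ValueError("h must be a positive integer")
    e_last = y_last - (b0 + b1 * x_last)
    return b0 + b1 * x_future + (r ** h) * e_last


# Cochrane-Orcutt estimates from the Blaisdell Company example.
coefficients = dict(b0=-1.0685, b1=0.17376, r=0.631166)
f_one = forecast_h_steps(x_future=175.3, y_last=28.78, x_last=171.7, h=1, **coefficients)
f_two = forecast_h_steps(x_future=178.0, y_last=28.78, x_last=171.7, h=2, **coefficients)
print(round(f_one, 2), round(f_two, 2))
```

As \(h\) grows with \(|r| < 1\), the residual term shrinks toward zero and the forecast approaches the fitted regression value; with \(r = 1\) (first differences) the full residual is carried forward at every horizon.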


Cited References

12.1. Box, G. E. P., and G. M. Jenkins. Time Series Analysis, Forecasting and Control. Rev. ed. San
      Francisco: Holden-Day, 1976.
12.2. Durbin, J., and G. S. Watson. "Testing for Serial Correlation in Least Squares Regression. II,"
      Biometrika 38 (1951), pp. 159-78.
12.3. Theil, H., and A. L. Nagar. "Testing the Independence of Regression Disturbances," Journal of
      the American Statistical Association 56 (1961), pp. 793-806.
12.4. Greene, W. H. Econometric Analysis, 5th ed. Upper Saddle River, New Jersey: Prentice Hall,
      2003.


Problems

12.1. Refer to Table 12.1.
   a. Plot \(\varepsilon_t\) against \(\varepsilon_{t-1}\) for \(t = 1, \ldots, 10\) on a graph. How is the positive first-order
      autocorrelation in the error terms shown by the plot?
   b. If you plotted \(u_t\) against \(\varepsilon_{t-1}\) for \(t = 1, \ldots, 10\), what pattern would you expect?

12.2. Refer to Plastic hardness Problem 1.22. If the same test item were measured at 12 different
   points in time, would the error terms in the regression model likely be autocorrelated? Discuss.

12.3. A student stated that the first-order autoregressive error models (12.1) and (12.2) are too simple
   for business time series data because the error term in period \(t\) in such data is also influenced
   by random effects that occurred more than one period in the past. Comment.

12.4. A student writing a term paper used ordinary least squares in fitting a simple linear regression
   model to some time series data containing positively autocorrelated errors, and found that the
   90 percent confidence interval for \(\beta_1\) was too wide to be useful. The student then decided to
   employ regression model (12.1) to improve the precision of the estimate. Comment.

12.5. For each of the following tests concerning the autocorrelation parameter \(\rho\) in regression
   model (12.2) with three predictor variables, state the appropriate decision rule based on the
   Durbin-Watson test statistic for a sample of size 38: (1) \(H_0\): \(\rho = 0\), \(H_a\): \(\rho \ne 0\), \(\alpha = .02\);
   (2) \(H_0\): \(\rho = 0\), \(H_a\): \(\rho < 0\), \(\alpha = .05\); (3) \(H_0\): \(\rho = 0\), \(H_a\): \(\rho > 0\), \(\alpha = .01\).

*12.6. Refer to Copier maintenance Problem 1.20. The observations are listed in time order. Assume
   that regression model (12.1) is appropriate. Test whether or not positive autocorrelation is
   present; use \(\alpha = .01\). State the alternatives, decision rule, and conclusion.

12.7. Refer to Grocery retailer Problem 6.9. The observations are listed in time order. Assume that
   regression model (12.2) is appropriate. Test whether or not positive autocorrelation is present;
   use \(\alpha = .05\). State the alternatives, decision rule, and conclusion.

12.8. Refer to Crop yield Problem 11.25. The observations are listed in time order. Assume that
   regression model (12.2) with first- and second-order terms for the two predictor variables and
   no interaction term is appropriate. Test whether or not positive autocorrelation is present; use
   \(\alpha = .01\). State the alternatives, decision rule, and conclusion.

*12.9. Microcomputer components. A staff analyst for a manufacturer of microcomputer components
   has compiled monthly data for the past 16 months on the value of industry production of
   processing units that use these components (\(X\), in million dollars) and the value of the firm's
   components used (\(Y\), in thousand dollars). The analyst believes that a simple linear regression
   relation is appropriate but anticipates positive autocorrelation. The data follow:

      t:      1       2       3     ...     14      15      16
      X_t:  2.052   2.026   2.002   ...   2.080   2.102   2.150
      Y_t:  102.9   101.5   100.8   ...   104.8   105.0   107.2

   a. Fit a simple linear regression model by ordinary least squares and obtain the residuals.
      Also obtain \(s\{b_0\}\) and \(s\{b_1\}\).
   b. Plot the residuals against time and explain whether you find any evidence of positive
      autocorrelation.
   c. Conduct a formal test for positive autocorrelation using \(\alpha = .05\). State the alternatives,
      decision rule, and conclusion. Is the residual analysis in part (b) in accord with the test result?


*12.10. Refer to Microcomputer components Problem 12.9. The analyst has decided to employ
   regression model (12.1) and use the Cochrane-Orcutt procedure to fit the model.
   a. Obtain a point estimate of the autocorrelation parameter. How well does the approximate
      relationship (12.25) hold here between this point estimate and the Durbin-Watson test
      statistic?
   b. Use one iteration to obtain the estimates \(b'_0\) and \(b'_1\) of the regression coefficients \(\beta'_0\) and
      \(\beta'_1\) in transformed model (12.17) and state the estimated regression function. Also obtain
      \(s\{b'_0\}\) and \(s\{b'_1\}\).
   c. Test whether any positive autocorrelation remains after the first iteration using \(\alpha = .05\).
      State the alternatives, decision rule, and conclusion.
   d. Restate the estimated regression function obtained in part (b) in terms of the original
      variables. Also obtain \(s\{b_0\}\) and \(s\{b_1\}\). Compare the estimated regression coefficients obtained
      with the Cochrane-Orcutt procedure and their estimated standard deviations with those
      obtained with ordinary least squares in Problem 12.9a.
   e. On the basis of the results in parts (c) and (d), does the Cochrane-Orcutt procedure appear
      to have been effective here?
   f. The value of industry production in month 17 will be $2.210 million. Predict the value of
      the firm's components used in month 17; employ a 95 percent prediction interval. Interpret
      your interval.
   g. Estimate \(\beta_1\) with a 95 percent confidence interval. Interpret your interval.


*12.11. Refer to Microcomputer components Problem 12.9. Assume that regression model (12.1)
   is applicable.
   a. Use the Hildreth-Lu procedure to obtain a point estimate of the autocorrelation parameter.
      Do a search at the values \(\rho = .1, .2, \ldots, 1.0\) and select from these the value of \(\rho\) that
      minimizes SSE.
   b. From your estimate in part (a), obtain an estimate of the transformed regression
      function (12.17). Also obtain \(s\{b'_0\}\) and \(s\{b'_1\}\).
   c. Test whether any positive autocorrelation remains in the transformed regression model;
      use \(\alpha = .05\). State the alternatives, decision rule, and conclusion.
   d. Restate the estimated regression function obtained in part (b) in terms of the original
      variables. Also obtain \(s\{b_0\}\) and \(s\{b_1\}\). Compare the estimated regression coefficients
      obtained with the Hildreth-Lu procedure and their estimated standard deviations with
      those obtained with ordinary least squares in Problem 12.9a.
   e. Based on the results in parts (c) and (d), has the Hildreth-Lu procedure been effective here?
   f. The value of industry production in month 17 will be $2.210 million. Predict the value of
      the firm's components used in month 17; employ a 95 percent prediction interval. Interpret
      your interval.
   g. Estimate \(\beta_1\) with a 95 percent confidence interval. Interpret your interval.


*12.12. Refer to Microcomputer components Problem 12.9. Assume that regression model (12.1)
   is applicable and that the first differences procedure is to be employed.
   a. Estimate the regression coefficient \(\beta'_1\) in the transformed regression model (12.29), and
      obtain the estimated standard deviation of this estimate. State the estimated regression
      function.
   b. Test whether or not the error terms with the first differences procedure are autocorrelated,
      using a two-sided test and \(\alpha = .10\). State the alternatives, decision rule, and conclusion.
      Why is a two-sided test meaningful here?
   c. Restate the estimated regression function obtained in part (a) in terms of the original
      variables. Also obtain \(s\{b_1\}\). Compare the estimated regression coefficients obtained with
      the first differences procedure and the estimated standard deviation \(s\{b_1\}\) with the results
      obtained with ordinary least squares in Problem 12.9a.
   d. On the basis of the results in parts (b) and (c), has the first differences procedure been
      effective here?
   e. The value of industry production in month 17 will be $2.210 million. Predict the value of
      the firm's components used in month 17; employ a 95 percent prediction interval. Interpret
      your interval.
   f. Estimate \(\beta_1\) with a 95 percent confidence interval. Interpret your interval.


12.13. Advertising agency. The managing partner of an advertising agency is interested in the
   possibility of making accurate predictions of monthly billings. Monthly data on amount of
   billings (\(Y\), in thousands of constant dollars) and on number of hours of staff time (\(X\), in
   thousand hours) for the 20 most recent months follow. A simple linear regression model is
   believed to be appropriate, but positively autocorrelated error terms may be present.

      t:      1       2       3     ...     18      19      20
      X_t:  2.521   2.171   2.234   ...   3.117   3.623   3.618
      Y_t:  220.4   203.9   207.2   ...   252.4   278.6   278.5

   a. Fit a simple linear regression model by ordinary least squares and obtain the residuals. Also
      obtain \(s\{b_0\}\) and \(s\{b_1\}\).
   b. Plot the residuals against time and explain whether you find any evidence of positive
      autocorrelation.
   c. Conduct a formal test for positive autocorrelation using \(\alpha = .01\). State the alternatives,
      decision rule, and conclusion. Is the residual analysis in part (b) in accord with the test
      result?


12.14. Refer to Advertising agency Problem 12.13. Assume that regression model (12.1) is
   applicable and that the Cochrane-Orcutt procedure is to be employed.
   a. Obtain a point estimate of the autocorrelation parameter. How well does the approximate
      relationship (12.25) hold here between the point estimate and the Durbin-Watson test
      statistic?
   b. Use one iteration to obtain the estimates \(b'_0\) and \(b'_1\) of the regression coefficients \(\beta'_0\) and
      \(\beta'_1\) in transformed model (12.17) and state the estimated regression function. Also obtain
      \(s\{b'_0\}\) and \(s\{b'_1\}\).
   c. Test whether any positive autocorrelation remains after the first iteration using \(\alpha = .01\).
      State the alternatives, decision rule, and conclusion.
   d. Restate the estimated regression function obtained in part (b) in terms of the original
      variables. Also obtain \(s\{b_0\}\) and \(s\{b_1\}\). Compare the estimated regression coefficients obtained
      with the Cochrane-Orcutt procedure and their estimated standard deviations with those
      obtained with ordinary least squares in Problem 12.13a.
   e. Based on the results in parts (c) and (d), does the Cochrane-Orcutt procedure appear to
      have been effective here?
   f. Staff time in month 21 is expected to be 3.625 thousand hours. Predict the amount of
      billings in constant dollars for month 21, using a 99 percent prediction interval. Interpret
      your interval.
   g. Estimate \(\beta_1\) with a 99 percent confidence interval. Interpret your interval.


12.15. Refer to Advertising agency Problem 12.13. Assume that regression model (12.1) is
   applicable.
   a. Use the Hildreth-Lu procedure to obtain a point estimate of the autocorrelation parameter.
      Do a search at the values \(\rho = .1, .2, \ldots, 1.0\) and select from these the value of \(\rho\) that
      minimizes SSE.
   b. Based on your estimate in part (a), obtain an estimate of the transformed regression
      function (12.17). Also obtain \(s\{b'_0\}\) and \(s\{b'_1\}\).
   c. Test whether any positive autocorrelation remains in the transformed regression model;
      use \(\alpha = .01\). State the alternatives, decision rule, and conclusion.
   d. Restate the estimated regression function obtained in part (b) in terms of the original
      variables. Also obtain \(s\{b_0\}\) and \(s\{b_1\}\). Compare the estimated regression coefficients
      obtained with the Hildreth-Lu procedure and their estimated standard deviations with
      those obtained with ordinary least squares in Problem 12.13a.
   e. Based on the results in parts (c) and (d), has the Hildreth-Lu procedure been effective here?
   f. Staff time in month 21 is expected to be 3.625 thousand hours. Predict the amount of
      billings in constant dollars for month 21, using a 99 percent prediction interval. Interpret
      your interval.
   g. Estimate \(\beta_1\) with a 99 percent confidence interval. Interpret your interval.


12.16. Refer to Advertising agency Problem 12.13. Assume that regression model (12.1) is
   applicable and that the first differences procedure is to be employed.
   a. Estimate the regression coefficient \(\beta'_1\) in the transformed regression model (12.29) and
      obtain the estimated standard deviation of this estimate. State the estimated regression
      function.
   b. Test whether or not the error terms with the first differences procedure are autocorrelated,
      using a two-sided test and \(\alpha = .02\). State the alternatives, decision rule, and conclusion.
      Why is a two-sided test meaningful here?
   c. Restate the estimated regression function obtained in part (a) in terms of the original
      variables. Also obtain \(s\{b_1\}\). Compare the estimated regression coefficients obtained with
      the first differences procedure and the estimated standard deviation \(s\{b_1\}\) with the results
      obtained with ordinary least squares in Problem 12.13a.
   d. Based on the results in parts (b) and (c), has the first differences procedure been effective
      here?
   e. Staff time in month 21 is expected to be 3.625 thousand hours. Predict the amount of
      billings in constant dollars for month 21, using a 99 percent prediction interval. Interpret
      your interval.
   f. Estimate \(\beta_1\) with a 99 percent confidence interval. Interpret your interval.


12.17. McGill Company sales. The data below show seasonally adjusted quarterly sales for the
   McGill Company (\(Y\), in million dollars) and for the entire industry (\(X\), in million dollars) for
   the most recent 20 quarters.

      t:      1       2       3     ...     18      19      20
      X_t:  127.3   130.0   132.7   ...   165.6   168.7   172.0
      Y_t:  20.96   21.40   21.96   ...   27.78   28.24   28.78

   a. Would you expect the autocorrelation parameter \(\rho\) to be positive, negative, or zero here?
   b. Fit a simple linear regression model by ordinary least squares and obtain the residuals.
      Also obtain \(s\{b_0\}\) and \(s\{b_1\}\).
   c. Plot the residuals against time and explain whether you find any evidence of positive
      autocorrelation.
   d. Conduct a formal test for positive autocorrelation using \(\alpha = .01\). State the alternatives,
      decision rule, and conclusion. Is the residual analysis in part (c) in accord with the test
      result?


12.18. Refer to McGill Company sales Problem 12.17. Assume that regression model (12.1) is
applicable and that the Cochrane-Orcutt procedure is to be employed.

a. Obtain a point estimate of the autocorrelation parameter. How well does the approximate
relationship (12.25) hold here between the point estimate and the Durbin-Watson test
statistic?

b. Use one iteration to obtain the estimates $b_0'$ and $b_1'$ of the regression coefficients $\beta_0'$ and
$\beta_1'$ in transformed model (12.17) and state the estimated regression function. Also obtain
$s\{b_0'\}$ and $s\{b_1'\}$.

c. Test whether any positive autocorrelation remains after the first iteration; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion.

d. Restate the estimated regression function obtained in part (b) in terms of the original
variables. Also obtain $s\{b_0\}$ and $s\{b_1\}$. Compare the estimated regression coefficients
obtained with the Cochrane-Orcutt procedure and their estimated standard deviations with
those obtained with ordinary least squares in Problem 12.17b.

e. On the basis of the results in parts (c) and (d), does the Cochrane-Orcutt procedure appear
to have been effective here?

f. Industry sales for quarter 21 are expected to be $181.0 million. Predict the McGill Company
sales for quarter 21, using a 90 percent prediction interval. Interpret your interval.

g. Estimate $\beta_1$ with a 90 percent confidence interval. Interpret your interval.


12.19. Refer to McGill Company sales Problem 12.17. Assume that regression model (12.1) is
applicable.

a. Use the Hildreth-Lu procedure to obtain a point estimate of the autocorrelation parameter.
Do a search at the values $\rho = .1, .2, \ldots, 1.0$ and select from these the value of $\rho$ that
minimizes SSE.

b. Based on your estimate in part (a), obtain an estimate of the transformed regression
function (12.17). Also obtain $s\{b_0'\}$ and $s\{b_1'\}$.

c. Test whether any positive autocorrelation remains in the transformed regression model;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

d. Restate the estimated regression function obtained in part (b) in terms of the original
variables. Also obtain $s\{b_0\}$ and $s\{b_1\}$. Compare the estimated regression coefficients
obtained with the Hildreth-Lu procedure and their estimated standard deviations with
those obtained with ordinary least squares in Problem 12.17b.

e. Based on the results in parts (c) and (d), has the Hildreth-Lu procedure been effective
here?

f. Industry sales for quarter 21 are expected to be $181.0 million. Predict the McGill Company
sales for quarter 21, using a 90 percent prediction interval. Interpret your interval.

g. Estimate $\beta_1$ with a 90 percent confidence interval. Interpret your interval.


12.20. Refer to McGill Company sales Problem 12.17. Assume that regression model (12.1) is
applicable and that the first differences procedure is to be employed.

a. Estimate the regression coefficient $\beta_1'$ in the transformed regression model (12.29) and
obtain the estimated standard deviation of this estimate. State the estimated regression
function.

b. Test whether or not the error terms with the first differences procedure are positively
autocorrelated using $\alpha = .01$. State the alternatives, decision rule, and conclusion.

c. Restate the estimated regression function obtained in part (a) in terms of the original
variables. Also obtain $s\{b_1\}$. Compare the estimated regression coefficients obtained with
the first differences procedure and the estimated standard deviation $s\{b_1\}$ with the results
obtained with ordinary least squares in Problem 12.17b.

d. On the basis of the results in parts (b) and (c), has the first differences procedure been
effective here?

e. Industry sales for quarter 21 are expected to be $181.0 million. Predict the McGill Company
sales for quarter 21, using a 90 percent prediction interval. Interpret your interval.

f. Estimate $\beta_1$ with a 90 percent confidence interval. Interpret your interval.

12.21. A student applying the first differences transformations in (12.29a, b) found that several $X_t'$
values equaled zero but that the corresponding $Y_t'$ values were nonzero. Does this signify that
the first differences transformations are not appropriate for the data?




Exercises

12.22. Derive (12.7) for $s = 2$.

12.23. Refer to first-order autoregressive error model (12.1). Suppose $Y_t$ is company's percent share
of the market, $X_t$ is company's selling price as a percent of average competitive selling price,
$\beta_0 = 100$, $\beta_1 = -.35$, $\rho = .6$, $\sigma^2 = 1$, and $\varepsilon_0 = 2.403$. Let $X_t$ and $u_t$ be as follows for
$t = 1, \ldots, 10$:

t:       1      2      3       4       5      6      7      8      9     10
X_t:   100    115    120      90      85     75     70     95    105    110
u_t:  .764   .509  -.242  -1.808   -.485   .501  -.539   .434  -.299   .030

a. Plot the true regression line. Generate the observations $Y_t$ ($t = 1, \ldots, 10$), and plot these
on the same graph. Fit a least squares regression line to the generated observations $Y_t$ and
plot it also on the same graph. How does your fitted regression line relate to the true line?

b. Repeat the steps in part (a) but this time let $\rho = 0$. In which of the two cases does the fitted
regression line come closer to the true line? Is this the expected outcome?

c. Generate the observations $Y_t$ for $\rho = -.7$. For each of the cases $\rho = .6$, $\rho = 0$, and
$\rho = -.7$, obtain the successive error term differences $\varepsilon_t - \varepsilon_{t-1}$ ($t = 1, \ldots, 10$).

d. For which of the three cases in part (c) is $\sum (\varepsilon_t - \varepsilon_{t-1})^2$ smallest? For which is it largest?
What generalization does this suggest?


12.24. For multiple regression model (12.2) with $p - 1 = 2$, derive the transformed model in which
the random terms are uncorrelated.

12.25. Suppose the autoregressive error process for the model $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$ is that given
by (12.11).

a. What would be the transformed variables $Y_t'$ and $X_t'$ for which the random terms in the
regression model are uncorrelated?

b. How would you estimate the parameters $\rho_1$ and $\rho_2$ for use with the Cochrane-Orcutt
procedure?

c. How would you estimate the parameters $\rho_1$ and $\rho_2$ with the Hildreth-Lu procedure?

12.26. Derive the forecast $F_{n+1}$ for a simple linear regression model with the second-order
autoregressive error process (12.11).


Projects

12.27. The true regression model is $Y_t = 10 + 24X_t + \varepsilon_t$, where $\varepsilon_t = .8\varepsilon_{t-1} + u_t$ and $u_t$ are
independent $N(0, 25)$.

a. Generate 11 independent random numbers from $N(0, 25)$. Use the first random number
as $\varepsilon_0$, obtain the 10 error terms $\varepsilon_1, \ldots, \varepsilon_{10}$, and then calculate the 10 observations
$Y_1, \ldots, Y_{10}$ corresponding to $X_1 = 1, X_2 = 2, \ldots, X_{10} = 10$. Fit a linear regression
function by ordinary least squares and calculate MSE.

b. Repeat part (a) 100 times, using new random numbers each time.

c. Calculate the mean of the 100 estimates of $b_1$. Does it appear that $b_1$ is an unbiased estimator
of $\beta_1$ despite the presence of positive autocorrelation?

d. Calculate the mean of the 100 estimates of MSE. Does it appear that MSE is a biased
estimator of $\sigma^2$? If so, does the magnitude of the bias appear to be small or large?


Case Studies

12.28. Refer to the Website developer data set in Appendix C.6 and Case Study 9.29. The
observations are listed in time order. Using the model developed in Case Study 9.29, test whether or
not positive autocorrelation is present; use $\alpha = .01$. If autocorrelation is present, revise the
model and analysis as needed.

12.29. Refer to the Heating equipment data set in Appendix C.8. The observations are listed in
time order. Develop a reasonable predictor model for the monthly heating equipment orders.
Potential predictors include new homes for sale, current monthly deviation of temperature from
historical average temperature, the prime lending rate, current distributor inventory levels, the
amount of distributor sell through, and the level of discounting being offered. Your analysis
should determine whether or not autocorrelation is present using $\alpha = .05$. If autocorrelation
is present, revise the model and analysis as needed.


Part Three
Nonlinear Regression


Chapter 13


Introduction to Nonlinear Regression and Neural Networks


The linear regression models considered up to this point are generally satisfactory approxi- 
mations for most regression applications. There are occasions, however, when an empirically 
indicated or a theoretically justified nonlinear regression model is more appropriate. For 
example, growth from birth to maturity in human subjects typically is nonlinear in nature,
characterized by rapid growth shortly after birth, pronounced growth during puberty, and 
a leveling off sometime before adulthood. In another example, dose-response relationships 
tend to be nonlinear with little or no change in response for low dose levels of a drug, fol- 
lowed by rapid S-shaped changes occurring in the more active dose region, and finally with 
dose response leveling off as it reaches a saturated level. We shall consider in this chapter 
and the next some nonlinear regression models, how to obtain estimates of the regression 
parameters in such models, and how to make inferences about these regression parameters. 

In this chapter, we introduce exponential nonlinear regression models and present the
basic methods of nonlinear regression. We also introduce neural network models, which are 
now widely used in data mining applications. In Chapter 14, we present logistic regression 
models and consider their uses when the response variable is binary or categorical with 
more than two levels. 


13.1 Linear and Nonlinear Regression Models


Linear Regression Models


In previous chapters, we considered linear regression models, i.e., models that are linear in
the parameters. Such models can be represented by the general linear regression model (6.7):

$$
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i \tag{13.1}
$$


Linear regression models, as we have seen, include not only first-order models in $p - 1$
predictor variables but also more complex models. For instance, a polynomial regression
model in one or more predictor variables is linear in the parameters, such as the following
model in two predictor variables with linear, quadratic, and interaction terms:

$$
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}^2 + \beta_3 X_{i2} + \beta_4 X_{i2}^2 + \beta_5 X_{i1} X_{i2} + \varepsilon_i \tag{13.2}
$$

Also, models with transformed variables that are linear in the parameters belong to the class
of linear regression models, such as the following model:

$$
\log_{10} Y_i = \beta_0 + \beta_1 \sqrt{X_{i1}} + \beta_2 \exp(X_{i2}) + \varepsilon_i \tag{13.3}
$$

In general, we can state a linear regression model in the form:

$$
Y_i = f(\mathbf{X}_i, \boldsymbol{\beta}) + \varepsilon_i \tag{13.4}
$$

where $\mathbf{X}_i$ is the vector of the observations on the predictor variables for the ith case:

$$
\mathbf{X}_i = \begin{bmatrix} 1 \\ X_{i1} \\ \vdots \\ X_{i,p-1} \end{bmatrix} \tag{13.4a}
$$

$\boldsymbol{\beta}$ is the vector of the regression coefficients in (6.18c), and $f(\mathbf{X}_i, \boldsymbol{\beta})$ represents the expected
value $E\{Y_i\}$, which for linear regression models equals, according to (6.54):

$$
f(\mathbf{X}_i, \boldsymbol{\beta}) = \mathbf{X}_i' \boldsymbol{\beta} \tag{13.4b}
$$
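A minimal sketch of what "linear in the parameters" means in practice: model (13.3) involves transformed variables, yet its coefficients can still be estimated by ordinary least squares once the columns $1$, $\sqrt{X_{i1}}$, and $\exp(X_{i2})$ are formed. The data and the coefficient values below are hypothetical and chosen only for illustration; the responses are generated without error so that the fit should reproduce the assumed coefficients.

```python
import numpy as np

# Hypothetical predictor values (not from the text).
X1 = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])
X2 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])

# Assumed coefficients, used only to generate illustrative responses.
beta_true = np.array([2.0, 0.5, -0.1])

# Design matrix for model (13.3): columns 1, sqrt(X1), exp(X2).
X = np.column_stack([np.ones_like(X1), np.sqrt(X1), np.exp(X2)])

# Error-free responses on the log10 scale, i.e. f(X_i, beta) = X_i' beta as in (13.4b).
log10_Y = X @ beta_true

# Ordinary least squares applied to the transformed variables.
b, *_ = np.linalg.lstsq(X, log10_Y, rcond=None)
print(b)
```

The point of the sketch is that the nonlinearity sits in the variables, not in the parameters, so no iterative search is needed.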


Nonlinear Regression Models 


Nonlinear regression models are of the same basic form as that in (13.4) for linear regression
models:

$$
Y_i = f(\mathbf{X}_i, \boldsymbol{\gamma}) + \varepsilon_i \tag{13.5}
$$

An observation $Y_i$ is still the sum of a mean response $f(\mathbf{X}_i, \boldsymbol{\gamma})$ given by the nonlinear
response function $f(\mathbf{X}, \boldsymbol{\gamma})$ and the error term $\varepsilon_i$. The error terms usually are assumed to
have expectation zero, constant variance, and to be uncorrelated, just as for linear regression
models. Often, a normal error model is utilized which assumes that the error terms are
independent normal random variables with constant variance.

The parameter vector in the response function $f(\mathbf{X}, \boldsymbol{\gamma})$ is now denoted by $\boldsymbol{\gamma}$ rather than
$\boldsymbol{\beta}$ as a reminder that the response function here is nonlinear in the parameters. We present
now two examples of nonlinear regression models that are widely used in practice.


Exponential Regression Models. One widely used nonlinear regression model is the
exponential regression model. When there is only a single predictor variable, one form of
this regression model with normal error terms is:

$$
Y_i = \gamma_0 \exp(\gamma_1 X_i) + \varepsilon_i \tag{13.6}
$$

where:

$\gamma_0$ and $\gamma_1$ are parameters
$X_i$ are known constants
$\varepsilon_i$ are independent $N(0, \sigma^2)$


FIGURE 13.1 Plots of Exponential and Logistic Response Functions.


The response function for this model is:

$$
f(X, \boldsymbol{\gamma}) = \gamma_0 \exp(\gamma_1 X) \tag{13.7}
$$

Note that this model is not linear in the parameters $\gamma_0$ and $\gamma_1$.

A more general nonlinear exponential regression model in one predictor variable with
normal error terms is:

$$
Y_i = \gamma_0 + \gamma_1 \exp(\gamma_2 X_i) + \varepsilon_i \tag{13.8}
$$

where the error terms are independent normal with constant variance $\sigma^2$. The response
function for this regression model is:

$$
f(X, \boldsymbol{\gamma}) = \gamma_0 + \gamma_1 \exp(\gamma_2 X) \tag{13.9}
$$

Exponential regression model (13.8) is commonly used in growth studies where the rate
of growth at a given time X is proportional to the amount of growth remaining as time
increases, with $\gamma_0$ representing the maximum growth value. Another use of this regression
model is to relate the concentration of a substance (Y) to elapsed time (X). Figure 13.1a
shows the response function (13.9) for parameter values $\gamma_0 = 100$, $\gamma_1 = -50$, and $\gamma_2 = -2$.
We shall discuss exponential regression models (13.6) and (13.8) in more detail later in this
chapter.
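The behavior of response function (13.9) can be checked numerically. The sketch below evaluates it at the parameter values quoted for Figure 13.1a; the function name `exponential_response` and the grid of X values are illustrative choices, not part of the text.

```python
import numpy as np

def exponential_response(X, g0, g1, g2):
    """Response function (13.9): g0 + g1 * exp(g2 * X)."""
    return g0 + g1 * np.exp(g2 * np.asarray(X, dtype=float))

# Parameter values quoted for Figure 13.1a; the X grid is an arbitrary choice.
X = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
EY = exponential_response(X, 100.0, -50.0, -2.0)
print(EY)
```

With these values the mean response starts at 50 when X = 0 and rises toward, but never reaches, the maximum growth value $\gamma_0 = 100$.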


Logistic Regression Models. Another important nonlinear regression model is the logistic
regression model. This model with one predictor variable and normal error terms is:

$$
Y_i = \frac{\gamma_0}{1 + \gamma_1 \exp(\gamma_2 X_i)} + \varepsilon_i \tag{13.10}
$$

where the error terms $\varepsilon_i$ are independent normal with constant variance $\sigma^2$. The response
function here is:

$$
f(X, \boldsymbol{\gamma}) = \frac{\gamma_0}{1 + \gamma_1 \exp(\gamma_2 X)} \tag{13.11}
$$

Note again that this response function is not linear in the parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$.

[Figure 13.1 panels: (a) Exponential Model (13.8): $E\{Y\} = 100 - 50 \exp(-2X)$; (b) Logistic Model (13.10): $E\{Y\} = 10/[1 + 20 \exp(-2X)]$.]

This logistic regression model has been used in population studies to relate, for instance,
number of species (Y) to time (X). Figure 13.1b shows the logistic response function (13.11)
for parameter values $\gamma_0 = 10$, $\gamma_1 = 20$, and $\gamma_2 = -2$. Note that the parameter $\gamma_0 = 10$
represents the maximum growth value here.
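As a small numerical check of response function (13.11), the following sketch evaluates it at the parameter values quoted for Figure 13.1b. The function name `logistic_response` and the evaluation points are illustrative assumptions.

```python
import numpy as np

def logistic_response(X, g0, g1, g2):
    """Response function (13.11): g0 / (1 + g1 * exp(g2 * X))."""
    return g0 / (1.0 + g1 * np.exp(g2 * np.asarray(X, dtype=float)))

g0, g1, g2 = 10.0, 20.0, -2.0      # values quoted for Figure 13.1b

at_zero = logistic_response(0.0, g0, g1, g2)      # equals g0 / (1 + g1)
x_half = np.log(g1) / (-g2)                       # where g1 * exp(g2 * X) = 1
at_half = logistic_response(x_half, g0, g1, g2)   # half of the maximum
at_large = logistic_response(10.0, g0, g1, g2)    # close to the maximum g0
print(at_zero, at_half, at_large)
```

The curve is S-shaped: it starts near $\gamma_0/(1+\gamma_1)$, passes through $\gamma_0/2$ where $\gamma_1 \exp(\gamma_2 X) = 1$, and levels off at $\gamma_0$.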

Logistic regression model (13.10) is also widely used when the response variable is 
qualitative. An example of this use of the logistic regression model is predicting whether 
a household will purchase a new car this year (will, will not) on the basis of the predictor 
variables age of presently owned car, household income, and size of household. In this 
use of logistic regression models, the response variable (will, will not purchase car, in our 
example) is qualitative and will be represented by a 0, 1 indicator variable. Consequently, 
the error terms are not normally distributed here with constant variance. Logistic regression 
models and their use when the response variable is qualitative will be discussed in detail in 
Chapter 14.


General Form of Nonlinear Regression Models. As we have seen from the two examples
of nonlinear regression models, these models are similar in general form to linear regression
models. Each $Y_i$ observation is postulated to be the sum of a mean response $f(\mathbf{X}_i, \boldsymbol{\gamma})$ based
on the given nonlinear response function and a random error term $\varepsilon_i$. Furthermore, the
error terms $\varepsilon_i$ are often assumed to be independent normal random variables with constant
variance.

An important difference of nonlinear regression models is that the number of regression
parameters is not necessarily directly related to the number of X variables in the model.
In linear regression models, if there are $p - 1$ X variables in the model, then there are
$p$ regression coefficients in the model. For the exponential regression model in (13.8), there
is one X variable but three regression coefficients. The same is found for logistic regression
model (13.10). Hence, we now denote the number of X variables in the nonlinear regression
model by $q$, but we continue to denote the number of regression parameters in the response
function by $p$. In the exponential regression model (13.6), for instance, there are $p = 2$
regression parameters and $q = 1$ X variable.

Also, we shall define the vector $\mathbf{X}_i$ of the observations on the X variables without the
initial element 1. The general form of a nonlinear regression model is therefore expressed
as follows:

$$
Y_i = f(\mathbf{X}_i, \boldsymbol{\gamma}) + \varepsilon_i \tag{13.12}
$$

where:

$$
\underset{q \times 1}{\mathbf{X}_i} = \begin{bmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{iq} \end{bmatrix}
\qquad
\underset{p \times 1}{\boldsymbol{\gamma}} = \begin{bmatrix} \gamma_0 \\ \gamma_1 \\ \vdots \\ \gamma_{p-1} \end{bmatrix} \tag{13.12a}
$$


Comment 


Nonlinear response functions that can be linearized by a transformation are sometimes called
intrinsically linear response functions. For example, the exponential response function:

$$
f(X, \boldsymbol{\gamma}) = \gamma_0 [\exp(\gamma_1 X)]
$$

is an intrinsically linear response function because it can be linearized by the logarithmic
transformation:

$$
\log_e f(X, \boldsymbol{\gamma}) = \log_e \gamma_0 + \gamma_1 X
$$

This transformed response function can be represented in the linear model form:

$$
g(X, \boldsymbol{\gamma}) = \beta_0 + \beta_1 X
$$

where $g(X, \boldsymbol{\gamma}) = \log_e f(X, \boldsymbol{\gamma})$, $\beta_0 = \log_e \gamma_0$, and $\beta_1 = \gamma_1$.

Just because a nonlinear response function is intrinsically linear does not necessarily imply that
linear regression is appropriate. The reason is that the transformation to linearize the response function
will affect the error term in the model. For example, suppose that the following exponential regression
model with normal error terms that have constant variance is appropriate:

$$
Y_i = \gamma_0 \exp(\gamma_1 X_i) + \varepsilon_i
$$

A logarithmic transformation of Y to linearize the response function will affect the normal error term
$\varepsilon_i$ so that the error term in the linearized model will no longer be normal with constant variance. Hence,
it is important to study any nonlinear regression model that has been linearized for appropriateness;
it may turn out that the nonlinear regression model is preferable to the linearized version.
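The effect of the logarithmic transformation on the error term can be seen in a short simulation. This is only a sketch under stated assumptions: the parameter values, the error standard deviation, the two X levels, and the number of replicates are all chosen for illustration, and `log_scale_sd` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

g0, g1 = 58.6, -0.04   # illustrative parameter values (assumed)
sigma = 0.5            # constant error standard deviation on the Y scale (assumed)
n_rep = 10_000         # simulated observations per X level

def log_scale_sd(x):
    """Standard deviation of log_e(Y) - log_e(E{Y}) at a fixed X level."""
    mean = g0 * np.exp(g1 * x)
    y = mean + rng.normal(0.0, sigma, size=n_rep)   # additive normal errors
    return np.std(np.log(y) - np.log(mean))

sd_low_x = log_scale_sd(2.0)    # large mean response
sd_high_x = log_scale_sd(60.0)  # small mean response
print(sd_low_x, sd_high_x)
```

Although the errors have the same variance at both X levels on the original scale, their spread on the logarithmic scale is much larger where the mean response is small, so the linearized model does not have constant error variance.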


Estimation of Regression Parameters


Estimation of the parameters of a nonlinear regression model is usually carried out by the
method of least squares or the method of maximum likelihood, just as for linear regression
models. Also as in linear regression, both of these methods of estimation yield the
same parameter estimates when the error terms in nonlinear regression model (13.12) are
independent normal with constant variance.

Unlike linear regression, it is usually not possible to find analytical expressions for
the least squares and maximum likelihood estimators for nonlinear regression models.
Instead, numerical search procedures must be used with both of these estimation procedures,
requiring intensive computations. The analysis of nonlinear regression models is therefore
usually carried out by utilizing standard computer software programs.


Example


To illustrate the fitting and analysis of nonlinear regression models in a simple fashion,
we shall use an example where the model has only two parameters and the sample size
is reasonably small. In so doing, we shall be able to explain the concepts and procedures
without overwhelming the reader with details.

A hospital administrator wished to develop a regression model for predicting the degree
of long-term recovery after discharge from the hospital for severely injured patients.
The predictor variable to be utilized is number of days of hospitalization (X), and the
response variable is a prognostic index for long-term recovery (Y), with large values of
the index reflecting a good prognosis. Data for 15 patients were studied and are presented
in Table 13.1. A scatter plot of the data is shown in Figure 13.2. Related earlier studies
reported in the literature found the relationship between the predictor variable and the
response variable to be exponential. Hence, it was decided to investigate the appropriateness
of the two-parameter nonlinear exponential regression model (13.6):

$$
Y_i = \gamma_0 \exp(\gamma_1 X_i) + \varepsilon_i \tag{13.13}
$$


TABLE 13.1 Data—Severely Injured Patients Example.

Patient    Days Hospitalized    Prognostic Index
   i             X_i                  Y_i
   1              2                   54
   2              5                   50
   3              7                   45
   4             10                   37
   5             14                   35
   6             19                   25
   7             26                   20
   8             31                   16
   9             34                   18
  10             38                   13
  11             45                    8
  12             52                   11
  13             53                    8
  14             60                    4
  15             65                    6


FIGURE 13.2 Scatter Plot and Fitted Nonlinear Regression Function—Severely Injured Patients Example.
(Vertical axis: Prognostic Index; horizontal axis: Days Hospitalized.)


where the $\varepsilon_i$ are independent normal with constant variance. If this model is appropriate, it
is desired to estimate the regression parameters $\gamma_0$ and $\gamma_1$.


13.2 Least Squares Estimation in Nonlinear Regression


We noted in Chapter 1 that the method of least squares for simple linear regression requires
the minimization of the criterion Q in (1.8):

$$
Q = \sum_{i=1}^{n} [Y_i - (\beta_0 + \beta_1 X_i)]^2 \tag{13.14}
$$


Those values of $\beta_0$ and $\beta_1$ that minimize Q for the given sample observations $(X_i, Y_i)$ are
the least squares estimates and are denoted by $b_0$ and $b_1$.

We also noted in Chapter 1 that one method for finding the least squares estimates is
by use of a numerical search procedure. With this approach, Q in (13.14) is evaluated for
different values of $\beta_0$ and $\beta_1$, varying $\beta_0$ and $\beta_1$ systematically until the minimum value of Q
is found. The values of $\beta_0$ and $\beta_1$ that minimize Q are the least squares estimates $b_0$ and $b_1$.

A second method for finding the least squares estimates is by means of the least squares
normal equations. Here, the least squares normal equations are found analytically by
differentiating Q with respect to $\beta_0$ and $\beta_1$ and setting the derivatives equal to zero. The solution
of the normal equations yields the least squares estimates.

As we saw in Chapter 6, these procedures extend directly to multiple linear regression, for
which the least squares criterion is given in (6.22). The concepts of least squares estimation
for linear regression also extend directly to nonlinear regression models. The least squares
criterion again is:

$$
Q = \sum_{i=1}^{n} [Y_i - f(\mathbf{X}_i, \boldsymbol{\gamma})]^2 \tag{13.15}
$$

where $f(\mathbf{X}_i, \boldsymbol{\gamma})$ is the mean response for the ith case according to the nonlinear response
function $f(\mathbf{X}, \boldsymbol{\gamma})$. The least squares criterion Q in (13.15) must be minimized with respect
to the nonlinear regression parameters $\gamma_0, \gamma_1, \ldots, \gamma_{p-1}$ to obtain the least squares estimates.
The same two methods for finding the least squares estimates (numerical search and normal
equations) may be used in nonlinear regression. A difference from linear regression is that
the solution of the normal equations usually requires an iterative numerical search procedure
because analytical solutions generally cannot be found.
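To make the numerical search idea concrete, the sketch below evaluates criterion (13.15) for the two-parameter exponential model on a coarse grid and picks the grid point with the smallest Q. The data arrays are transcribed from Table 13.1; the grid ranges and spacings are assumptions made for illustration, and a grid search of this kind is only one simple way to carry out a numerical search, not necessarily the one a statistical package would use.

```python
import numpy as np

# Data transcribed from Table 13.1 (days hospitalized, prognostic index).
X = np.array([2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65], dtype=float)
Y = np.array([54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6], dtype=float)

def Q(g0, g1):
    """Least squares criterion (13.15) for f(X, gamma) = g0 * exp(g1 * X)."""
    resid = Y - g0 * np.exp(g1 * X)
    return float(np.sum(resid ** 2))

# Assumed search ranges, chosen by eye from the scale of the data.
g0_grid = np.linspace(50.0, 65.0, 151)      # spacing 0.1
g1_grid = np.linspace(-0.06, -0.02, 161)    # spacing 0.00025

# Evaluate Q at every grid point by broadcasting over the 15 cases.
G0, G1 = np.meshgrid(g0_grid, g1_grid, indexing="ij")
Q_grid = np.sum((Y - G0[..., None] * np.exp(G1[..., None] * X)) ** 2, axis=-1)

i, j = np.unravel_index(np.argmin(Q_grid), Q_grid.shape)
g0_best, g1_best, Q_best = g0_grid[i], g1_grid[j], Q_grid[i, j]
print(g0_best, g1_best, Q_best)
```

A grid search is easy to understand but scales poorly with the number of parameters, which is one reason iterative methods are preferred in practice.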


Example


The response function in the severely injured patients example is seen from (13.13) to be:

$$
f(X, \boldsymbol{\gamma}) = \gamma_0 \exp(\gamma_1 X)
$$

Hence, the least squares criterion Q here is:

$$
Q = \sum_{i=1}^{15} [Y_i - \gamma_0 \exp(\gamma_1 X_i)]^2
$$

We can see that the method of maximum likelihood leads to the same criterion here
when the error terms $\varepsilon_i$ are independent normal with constant variance by considering the
likelihood function:

$$
L(\boldsymbol{\gamma}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{15} [Y_i - \gamma_0 \exp(\gamma_1 X_i)]^2 \right]
$$

Just as for linear regression, maximizing this likelihood function with respect to the regression
parameters $\gamma_0$ and $\gamma_1$ is equivalent to minimizing the sum in the exponent, so that the
maximum likelihood estimates are the same here as the least squares estimates.
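The equivalence can be made explicit by taking logarithms of the likelihood function. The short derivation below is a sketch that writes $Q(\boldsymbol{\gamma})$ for the sum of squares in the exponent:

```latex
\log_e L(\boldsymbol{\gamma}, \sigma^2)
  = -\frac{n}{2} \log_e (2\pi\sigma^2) - \frac{1}{2\sigma^2}\, Q(\boldsymbol{\gamma}),
\qquad
Q(\boldsymbol{\gamma}) = \sum_{i=1}^{n} \left[ Y_i - \gamma_0 \exp(\gamma_1 X_i) \right]^2
```

For any fixed $\sigma^2 > 0$, the first term does not involve $\boldsymbol{\gamma}$ and the second term decreases as $Q(\boldsymbol{\gamma})$ increases, so the value of $\boldsymbol{\gamma}$ that maximizes the likelihood is the value that minimizes $Q$.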

We now discuss how to obtain the least squares estimates, first by use of the normal 
equations and then by direct numerical search procedures. 


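Solution of Normal Equations


To obtain the normal equations for a nonlinear regression model:

$$
Y_i = f(\mathbf{X}_i, \boldsymbol{\gamma}) + \varepsilon_i
$$

we need to minimize the least squares criterion Q:

$$
Q = \sum_{i=1}^{n} [Y_i - f(\mathbf{X}_i, \boldsymbol{\gamma})]^2
$$

with respect to $\gamma_0, \gamma_1, \ldots, \gamma_{p-1}$. The partial derivative of Q with respect to $\gamma_k$ is:

$$
\frac{\partial Q}{\partial \gamma_k} = \sum_{i=1}^{n} -2 [Y_i - f(\mathbf{X}_i, \boldsymbol{\gamma})] \left[ \frac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right] \tag{13.16}
$$

When the p partial derivatives are each set equal to 0 and the parameters $\gamma_k$ are replaced by
the least squares estimates $g_k$, we obtain after some simplification the p normal equations:

$$
\sum_{i=1}^{n} Y_i \left[ \frac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]_{\boldsymbol{\gamma} = \mathbf{g}}
- \sum_{i=1}^{n} f(\mathbf{X}_i, \mathbf{g}) \left[ \frac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]_{\boldsymbol{\gamma} = \mathbf{g}} = 0
\qquad k = 0, 1, \ldots, p - 1 \tag{13.17}
$$

where $\mathbf{g}$ is the vector of the least squares estimates $g_k$:

$$
\underset{p \times 1}{\mathbf{g}} = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{p-1} \end{bmatrix} \tag{13.18}
$$

Note that the terms in brackets in (13.17) are the partial derivatives in (13.16) with the
parameters $\gamma_k$ replaced by the least squares estimates $g_k$.

The normal equations (13.17) for nonlinear regression models are nonlinear in the parameter
estimates $g_k$ and are usually difficult to solve, even in the simplest of cases. Hence,
numerical search procedures are ordinarily required to obtain a solution of the normal
equations iteratively. To make things still more difficult, multiple solutions may be possible.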


Example


In the severely injured patients example, the mean response for the ith case is:

$$
f(X_i, \boldsymbol{\gamma}) = \gamma_0 \exp(\gamma_1 X_i) \tag{13.19}
$$

Hence, the partial derivatives of $f(X_i, \boldsymbol{\gamma})$ are:

$$
\frac{\partial f(X_i, \boldsymbol{\gamma})}{\partial \gamma_0} = \exp(\gamma_1 X_i) \tag{13.20a}
$$

$$
\frac{\partial f(X_i, \boldsymbol{\gamma})}{\partial \gamma_1} = \gamma_0 X_i \exp(\gamma_1 X_i) \tag{13.20b}
$$

Replacing $\gamma_0$ and $\gamma_1$ in (13.19), (13.20a), and (13.20b) by the respective least squares
estimates $g_0$ and $g_1$, the normal equations (13.17) therefore are:

$$
\sum Y_i \exp(g_1 X_i) - \sum g_0 \exp(g_1 X_i) \exp(g_1 X_i) = 0
$$

$$
\sum Y_i g_0 X_i \exp(g_1 X_i) - \sum g_0 \exp(g_1 X_i)\, g_0 X_i \exp(g_1 X_i) = 0
$$

Upon simplification, the normal equations become:

$$
\sum Y_i \exp(g_1 X_i) - g_0 \sum \exp(2 g_1 X_i) = 0
$$

$$
\sum Y_i X_i \exp(g_1 X_i) - g_0 \sum X_i \exp(2 g_1 X_i) = 0
$$

These normal equations are not linear in $g_0$ and $g_1$, and no closed-form solution exists.
Thus, numerical methods will be required to find the solution for the least squares estimates
iteratively.
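One way to solve the two simplified normal equations numerically is sketched below. It is an illustration, not the procedure used in the text: the first equation is solved for $g_0$ with $g_1$ held fixed, that expression is substituted into the second equation, and the resulting one-variable equation is solved by bisection. The bracket for $g_1$ is an assumption based on the scale of the data, and the helper names are hypothetical. The data arrays are transcribed from Table 13.1.

```python
import numpy as np

# Data transcribed from Table 13.1 (days hospitalized, prognostic index).
X = np.array([2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65], dtype=float)
Y = np.array([54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6], dtype=float)

def g0_given_g1(g1):
    """Solve the first simplified normal equation for g0 with g1 held fixed."""
    e = np.exp(g1 * X)
    return np.sum(Y * e) / np.sum(e ** 2)

def second_equation(g1):
    """Left side of the second normal equation after substituting g0(g1)."""
    e = np.exp(g1 * X)
    return np.sum(Y * X * e) - g0_given_g1(g1) * np.sum(X * e ** 2)

# Assumed bracket for g1; bisection needs a sign change between the endpoints.
lo, hi = -0.10, -0.01
assert second_equation(lo) * second_equation(hi) < 0

for _ in range(60):                  # each pass halves the bracket
    mid = 0.5 * (lo + hi)
    if second_equation(lo) * second_equation(mid) <= 0:
        hi = mid
    else:
        lo = mid

g1 = 0.5 * (lo + hi)
g0 = g0_given_g1(g1)
print(g0, g1)
```

The substitution works here because the first normal equation is linear in $g_0$; for models without that structure, a general root finder or a direct search on Q would be needed instead.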


Direct Numerical Search—Gauss-Newton Method 
In many nonlinear regression problems, it is more practical to find the least squares estimates 
by direct numerical search procedures rather than by first obtaining the normal equations 
and then using numerical methods to find the solution for these equations iteratively. The 
major statistical computer packages employ one or more direct numerical search procedures 
for solving nonlinear regression problems. We now explain one of these direct numerical 
search methods. 

The Gauss-Newton method, also called the linearization method, uses a Taylor series 
expansion to approximate the nonlinear regression model with linear terms and then employs 
ordinary least squares to estimate the parameters. Iteration of these steps generally leads to 
a solution to the nonlinear regression problem. 

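The Gauss-Newton method begins with initial or starting values for the regression
parameters $\gamma_0, \gamma_1, \ldots, \gamma_{p-1}$. We denote these by $g_0^{(0)}, g_1^{(0)}, \ldots, g_{p-1}^{(0)}$, where the superscript
in parentheses denotes the iteration number. The starting values may be obtained from
previous or related studies, theoretical expectations, or a preliminary search for parameter
values that lead to a comparatively low criterion value Q in (13.15). We shall later discuss
in more detail the choice of the starting values.

Once the starting values for the parameters have been obtained, we approximate the
mean responses $f(\mathbf{X}_i, \boldsymbol{\gamma})$ for the n cases by the linear terms in the Taylor series expansion
around the starting values $g_k^{(0)}$. We obtain for the ith case:

$$
f(\mathbf{X}_i, \boldsymbol{\gamma}) \approx f(\mathbf{X}_i, \mathbf{g}^{(0)}) + \sum_{k=0}^{p-1} \left[ \frac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]_{\boldsymbol{\gamma} = \mathbf{g}^{(0)}} \left( \gamma_k - g_k^{(0)} \right) \tag{13.21}
$$

where:

$$
\underset{p \times 1}{\mathbf{g}^{(0)}} = \begin{bmatrix} g_0^{(0)} \\ g_1^{(0)} \\ \vdots \\ g_{p-1}^{(0)} \end{bmatrix}
$$

Note that $\mathbf{g}^{(0)}$ is the vector of the parameter starting values. The terms in brackets in (13.21)
are the same partial derivatives of the regression function we encountered earlier in the
normal equations (13.17), but here they are evaluated at $\gamma_k = g_k^{(0)}$ for $k = 0, 1, \ldots, p - 1$.

Let us now simplify the notation as follows:

$$
f_i^{(0)} = f(\mathbf{X}_i, \mathbf{g}^{(0)}) \tag{13.22a}
$$

$$
\beta_k^{(0)} = \gamma_k - g_k^{(0)} \tag{13.22b}
$$

$$
D_{ik}^{(0)} = \left[ \frac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]_{\boldsymbol{\gamma} = \mathbf{g}^{(0)}} \tag{13.22c}
$$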


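The Taylor approximation (13.21) for the mean response for the ith case then becomes in
this notation:

$$
f(\mathbf{X}_i, \boldsymbol{\gamma}) \approx f_i^{(0)} + \sum_{k=0}^{p-1} D_{ik}^{(0)} \beta_k^{(0)}
$$

and an approximation to the nonlinear regression model (13.12):

$$
Y_i = f(\mathbf{X}_i, \boldsymbol{\gamma}) + \varepsilon_i
$$

is:

$$
Y_i \approx f_i^{(0)} + \sum_{k=0}^{p-1} D_{ik}^{(0)} \beta_k^{(0)} + \varepsilon_i \tag{13.23}
$$

When we shift the $f_i^{(0)}$ term to the left and denote the difference $Y_i - f_i^{(0)}$ by $Y_i^{(0)}$, we obtain
the following linear regression model approximation:

$$
Y_i^{(0)} \approx \sum_{k=0}^{p-1} D_{ik}^{(0)} \beta_k^{(0)} + \varepsilon_i \qquad i = 1, \ldots, n \tag{13.24}
$$

where:

$$
Y_i^{(0)} = Y_i - f_i^{(0)} \tag{13.24a}
$$

Note that the linear regression model approximation (13.24) is of the form:

$$
Y_i = \beta_0 X_{i0} + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i
$$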


The responses $Y_i^{(0)}$ in (13.24) are residuals, namely, the deviations of the observations
around the nonlinear regression function with the parameters replaced by the starting
estimates. The X variable observations $D_{ik}^{(0)}$ are the partial derivatives of the mean response
evaluated for each of the n cases with the parameters replaced by the starting estimates.
Each regression coefficient $\beta_k^{(0)}$ represents the difference between the true regression
parameter and the initial estimate of the parameter. Thus, the regression coefficients represent
the adjustment amounts by which the initial regression coefficients must be corrected. The
purpose of fitting the linear regression model approximation (13.24) is therefore to estimate
the regression coefficients $\beta_k^{(0)}$ and use these estimates to adjust the initial starting estimates
of the regression parameters. In fitting this linear regression approximation, note that there
is no intercept term in the model. Use of a computer multiple regression package therefore
requires a specification of no intercept.

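We shall represent the linear regression model approximation (13.24) in matrix form as
follows:

$$
\mathbf{Y}^{(0)} \approx \mathbf{D}^{(0)} \boldsymbol{\beta}^{(0)} + \boldsymbol{\varepsilon} \tag{13.25}
$$

where:

$$
\underset{n \times 1}{\mathbf{Y}^{(0)}} = \begin{bmatrix} Y_1 - f_1^{(0)} \\ \vdots \\ Y_n - f_n^{(0)} \end{bmatrix} \tag{13.25a}
\qquad
\underset{n \times p}{\mathbf{D}^{(0)}} = \begin{bmatrix} D_{10}^{(0)} & \cdots & D_{1,p-1}^{(0)} \\ \vdots & & \vdots \\ D_{n0}^{(0)} & \cdots & D_{n,p-1}^{(0)} \end{bmatrix} \tag{13.25b}
$$

$$
\underset{p \times 1}{\boldsymbol{\beta}^{(0)}} = \begin{bmatrix} \beta_0^{(0)} \\ \vdots \\ \beta_{p-1}^{(0)} \end{bmatrix} \tag{13.25c}
\qquad
\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix} \tag{13.25d}
$$

Note again that the approximation model (13.25) is precisely in the form of the general
linear regression model (6.19), with the D matrix of partial derivatives now playing the role
of the X matrix (but without a column of 1s for the intercept). We can therefore estimate
the parameters $\boldsymbol{\beta}^{(0)}$ by ordinary least squares and obtain according to (6.25):

$$
\mathbf{b}^{(0)} = \left( \mathbf{D}^{(0)\prime} \mathbf{D}^{(0)} \right)^{-1} \mathbf{D}^{(0)\prime} \mathbf{Y}^{(0)} \tag{13.26}
$$

where $\mathbf{b}^{(0)}$ is the vector of the least squares estimated regression coefficients. As we noted
earlier, an ordinary multiple regression computer program can be used to obtain the estimated
regression coefficients $b_k^{(0)}$, with a specification of no intercept.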

We then use these least squares estimates to obtain revised estimated regression
coefficients $g_k^{(1)}$ by means of (13.22b):

$$
g_k^{(1)} = g_k^{(0)} + b_k^{(0)}
$$

where $g_k^{(1)}$ denotes the revised estimate of $\gamma_k$ at the end of the first iteration. In matrix form,
we represent the revision process as follows:

$$
\mathbf{g}^{(1)} = \mathbf{g}^{(0)} + \mathbf{b}^{(0)} \tag{13.27}
$$

At this point, we can examine whether the revised regression coefficients represent
adjustments in the proper direction. We shall denote the least squares criterion measure Q
in (13.15) evaluated for the starting regression coefficients $\mathbf{g}^{(0)}$ by $SSE^{(0)}$; it is:

$$
SSE^{(0)} = \sum_{i=1}^{n} \left[ Y_i - f(\mathbf{X}_i, \mathbf{g}^{(0)}) \right]^2 = \sum_{i=1}^{n} \left( Y_i - f_i^{(0)} \right)^2 \tag{13.28}
$$

At the end of the first iteration, the revised estimated regression coefficients are $\mathbf{g}^{(1)}$, and
the least squares criterion measure evaluated at this stage, now denoted by $SSE^{(1)}$, is:

$$
SSE^{(1)} = \sum_{i=1}^{n} \left[ Y_i - f(\mathbf{X}_i, \mathbf{g}^{(1)}) \right]^2 = \sum_{i=1}^{n} \left( Y_i - f_i^{(1)} \right)^2 \tag{13.29}
$$

If the Gauss-Newton method is working effectively in the first iteration, $SSE^{(1)}$ should be
smaller than $SSE^{(0)}$ since the revised estimated regression coefficients $\mathbf{g}^{(1)}$ should be better
estimates.

Note that the nonlinear regression functions $f(\mathbf{X}_i, \mathbf{g}^{(0)})$ and $f(\mathbf{X}_i, \mathbf{g}^{(1)})$ are used in
calculating $SSE^{(0)}$ and $SSE^{(1)}$, and not the linear approximations from the Taylor series
expansion.

The revised regression coefficients $\mathbf{g}^{(1)}$ are not, of course, the least squares estimates for
the nonlinear regression problem because the fitted model (13.25) is only an approximation
of the nonlinear model. The Gauss-Newton method therefore repeats the procedure just
described, with $\mathbf{g}^{(1)}$ now used for the new starting values. This produces a new set of
revised estimates, denoted by $\mathbf{g}^{(2)}$, and a new least squares criterion measure $SSE^{(2)}$. The
iterative process is continued until the differences between successive coefficient estimates
$\mathbf{g}^{(s+1)} - \mathbf{g}^{(s)}$ and/or the difference between successive least squares criterion measures
$SSE^{(s+1)} - SSE^{(s)}$ become negligible. We shall denote the final estimates of the regression
coefficients simply by $\mathbf{g}$ and the final least squares criterion measure, which is the error sum
of squares, by SSE.

The Gauss-Newton method works effectively in many nonlinear regression applications.
In some instances, however, the method may require numerous iterations before converging,
and in a few cases it may not converge at all.


Example

In the severely injured patients example, the initial values of the parameters $\gamma_0$ and $\gamma_1$
were obtained by noting that a logarithmic transformation of the response function linearizes it:

$$\log_e \gamma_0[\exp(\gamma_1 X)] = \log_e \gamma_0 + \gamma_1 X$$


Hence, a linear regression model with a transformed Y variable was fitted as an initial
approximation to the exponential model:

$$Y_i' = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where:

$$Y_i' = \log_e Y_i \qquad \beta_0 = \log_e \gamma_0 \qquad \beta_1 = \gamma_1$$

This linear regression model was fitted by ordinary least squares and yielded the estimated
regression coefficients $b_0 = 4.0371$ and $b_1 = -.03797$ (calculations not shown). Hence,
the initial starting values are $g_0^{(0)} = \exp(b_0) = \exp(4.0371) = 56.6646$ and $g_1^{(0)} = b_1 = -.03797$.
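The starting-value calculation described above can be sketched in a few lines of Python. This is only an illustration, not the computation used for the text: the data values are assumed to be those of Table 13.1 (which lies outside this section and should be checked against it), and the function name `starting_values` is hypothetical.

```python
import math

# Assumed data from Table 13.1 (days hospitalized X, prognostic index Y);
# verify against the table before relying on these values.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def starting_values(x, y):
    """Regress log_e(Y) on X by ordinary least squares; return (g0, g1)."""
    n = len(x)
    log_y = [math.log(v) for v in y]
    x_bar = sum(x) / n
    ly_bar = sum(log_y) / n
    sxy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope of the linearized model
    b0 = ly_bar - b1 * x_bar       # intercept of the linearized model
    return math.exp(b0), b1        # g0 = exp(b0), g1 = b1

g0_start, g1_start = starting_values(X, Y)
print(g0_start, g1_start)
```

Under these assumptions the printed values should agree with the starting values quoted in the text up to rounding.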

The least squares criterion measure at this stage requires evaluation of the nonlinear
regression function (13.7) for each case, utilizing the starting parameter values $g_0^{(0)}$ and
$g_1^{(0)}$. For instance, for the first case, for which $X_1 = 2$, we obtain:

$$f(\mathbf{X}_1, \mathbf{g}^{(0)}) = f_1^{(0)} = g_0^{(0)} \exp(g_1^{(0)} X_1) = (56.6646) \exp[-.03797(2)] = 52.5208$$


TABLE 13.2 $\mathbf{Y}^{(0)}$ and $\mathbf{D}^{(0)}$ Matrices: Severely Injured Patients Example.

The entries for case $i$ are $Y_i^{(0)} = Y_i - g_0^{(0)} \exp(g_1^{(0)} X_i)$, $D_{i0}^{(0)} = \exp(g_1^{(0)} X_i)$, and $D_{i1}^{(0)} = g_0^{(0)} X_i \exp(g_1^{(0)} X_i)$.

Case i    Y_i^(0)     D_i0^(0)    D_i1^(0)
  1       1.4792      .92687      105.0416
  2       3.1337      .82708      234.3317
  3       1.5609      .76660      304.0736
  4      -1.7624      .68407      387.6236
  5       1.6996      .58768      466.2057
  6      -2.5422      .48606      523.3020
  7      -1.1139      .37261      548.9603
  8      -1.4629      .30818      541.3505
  9       2.4172      .27500      529.8162
 10       -.3871      .23625      508.7088
 11      -2.2625      .18111      461.8140
 12       3.1327      .13884      409.0975
 13        .4259      .13367      401.4294
 14      -1.8063      .10247      348.3801
 15       1.1977      .08475      312.1510

Since $Y_1 = 54$, the deviation from the mean response is:

$$Y_1^{(0)} = Y_1 - f_1^{(0)} = 54 - 52.5208 = 1.4792$$

Note again that the deviation $Y_1^{(0)}$ is the residual for case 1 at the initial fitting stage
since $f_1^{(0)}$ is the estimated mean response when the initial estimates $\mathbf{g}^{(0)}$ of the parameters
are employed. The stage 0 residuals for this and the other sample cases are presented in
Table 13.2 and constitute the $\mathbf{Y}^{(0)}$ vector.

The least squares criterion measure at this initial stage then is simply the sum of the
squared stage 0 residuals:

$$SSE^{(0)} = \sum (Y_i - f_i^{(0)})^2 = \sum (Y_i^{(0)})^2 = (1.4792)^2 + \cdots + (1.1977)^2 = 56.0869$$


To revise the initial estimates for the parameters, we require the $\mathbf{D}^{(0)}$ matrix and the
$\mathbf{Y}^{(0)}$ vector. The latter was already obtained in the process of calculating the least squares
criterion measure at stage 0. To obtain the $\mathbf{D}^{(0)}$ matrix, we need the partial derivatives of the
regression function (13.19) evaluated at $\boldsymbol{\gamma} = \mathbf{g}^{(0)}$. The partial derivatives are given in (13.20).
Table 13.2 shows the $\mathbf{D}^{(0)}$ matrix entries in symbolic form and also the numerical values.
To illustrate the calculations for case 1, we know from Table 13.1 that $X_1 = 2$. Hence,
evaluating the partial derivatives at $\mathbf{g}^{(0)}$, we find:

$$D_{10}^{(0)} = \left[\frac{\partial f(\mathbf{X}_1, \boldsymbol{\gamma})}{\partial \gamma_0}\right]_{\boldsymbol{\gamma} = \mathbf{g}^{(0)}} = \exp(g_1^{(0)} X_1) = \exp[-.03797(2)] = .92687$$

$$D_{11}^{(0)} = \left[\frac{\partial f(\mathbf{X}_1, \boldsymbol{\gamma})}{\partial \gamma_1}\right]_{\boldsymbol{\gamma} = \mathbf{g}^{(0)}} = g_0^{(0)} X_1 \exp(g_1^{(0)} X_1) = 56.6646(2) \exp[-.03797(2)] = 105.0416$$


We are now ready to obtain the least squares estimates $\mathbf{b}^{(0)}$ by regressing the response
variable $Y^{(0)}$ in Table 13.2 on the two X variables in $\mathbf{D}^{(0)}$ in Table 13.2, using regression
with no intercept. A standard multiple regression computer program yielded $b_0^{(0)} = 1.8932$
and $b_1^{(0)} = -.001563$. Hence, the vector $\mathbf{b}^{(0)}$ of the estimated regression coefficients is:

$$\mathbf{b}^{(0)} = \begin{bmatrix} 1.8932 \\ -.001563 \end{bmatrix}$$

By (13.27), we now obtain the revised least squares estimates $\mathbf{g}^{(1)}$:

$$\mathbf{g}^{(1)} = \mathbf{g}^{(0)} + \mathbf{b}^{(0)} = \begin{bmatrix} 56.6646 \\ -.03797 \end{bmatrix} + \begin{bmatrix} 1.8932 \\ -.001563 \end{bmatrix} = \begin{bmatrix} 58.5578 \\ -.03953 \end{bmatrix}$$

Hence, $g_0^{(1)} = 58.5578$ and $g_1^{(1)} = -.03953$ are the revised parameter estimates at the
end of the first iteration. Note that the estimated regression coefficients have been revised
moderately from the initial values, as can be seen from Table 13.3a, which presents the
estimated regression coefficients and the least squares criterion measures for the starting
values and the first iteration. Note also that the least squares criterion measure has been
reduced in the first iteration.
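The first iteration can be retraced with a short Python sketch. It is an illustration under stated assumptions rather than the program used for the text: the data are assumed to be those of Table 13.1, the helper name `gauss_newton_step` is hypothetical, and the no-intercept regression is solved directly from the 2 x 2 normal equations instead of a regression package.

```python
import math

# Assumed data from Table 13.1; verify against the table.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def f(x, g):
    """Exponential response function g0 * exp(g1 * x)."""
    return g[0] * math.exp(g[1] * x)

def gauss_newton_step(x, y, g):
    """One iteration: return (revised g, SSE at the incoming g, adjustments b)."""
    resid = [yi - f(xi, g) for xi, yi in zip(x, y)]       # Y^(0) vector
    d0 = [math.exp(g[1] * xi) for xi in x]                # partial w.r.t. gamma_0
    d1 = [g[0] * xi * math.exp(g[1] * xi) for xi in x]    # partial w.r.t. gamma_1
    # Normal equations (D'D) b = D'Y for the no-intercept regression.
    s00 = sum(a * a for a in d0)
    s01 = sum(a * c for a, c in zip(d0, d1))
    s11 = sum(c * c for c in d1)
    t0 = sum(a * r for a, r in zip(d0, resid))
    t1 = sum(c * r for c, r in zip(d1, resid))
    det = s00 * s11 - s01 * s01
    b = ((s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det)
    sse = sum(r * r for r in resid)
    return (g[0] + b[0], g[1] + b[1]), sse, b

g_start = (56.6646, -0.03797)                 # starting values from the text
g_rev, sse0, b = gauss_newton_step(X, Y, g_start)
sse1 = sum((yi - f(xi, g_rev)) ** 2 for xi, yi in zip(X, Y))
print(b, g_rev, sse0, sse1)
```

Note that `sse1` is computed from the nonlinear function itself at the revised estimates, not from the linear approximation, which mirrors the point made earlier about how the criterion measure is evaluated.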

Iteration 2 requires that we now revise the residuals from the exponential regression function
and the first partial derivatives, based on the revised parameter estimates $g_0^{(1)} = 58.5578$
and $g_1^{(1)} = -.03953$. For case 1, for which $Y_1 = 54$ and $X_1 = 2$, we obtain:

$$Y_1^{(1)} = Y_1 - f_1^{(1)} = 54 - (58.5578) \exp[-.03953(2)] = -.1065$$

$$D_{10}^{(1)} = \exp(g_1^{(1)} X_1) = \exp[-.03953(2)] = .92398$$

$$D_{11}^{(1)} = g_0^{(1)} X_1 \exp(g_1^{(1)} X_1) = 58.5578(2) \exp[-.03953(2)] = 108.2130$$

By comparing these results with the comparable stage 0 results for case 1 in Table 13.2,
we see that the absolute magnitude of the residual for case 1 is substantially reduced as a
result of the stage 1 revised fit and that the two partial derivatives are changed to a moderate
extent. After the revised residuals $Y_i^{(1)}$ and the partial derivatives $D_{i0}^{(1)}$ and $D_{i1}^{(1)}$ have been
obtained for all cases, the revised residuals are regressed on the revised partial derivatives,
using a no-intercept regression fit, and the estimated regression parameters are again revised
according to (13.27).

TABLE 13.3 Gauss-Newton Method Iterations and Final Nonlinear Least Squares Estimates: Severely Injured Patients Example.

(a) Estimates of Parameters and Least Squares Criterion Measure

Iteration    g_0        g_1        SSE
    0        56.6646    -.03797    56.0869
    1        58.5578    -.03953    49.4638
    2        58.6065    -.03959    49.4593
    3        58.6065    -.03959    49.4593

(b) Final Least Squares Estimates

k    g_k        s{g_k}     MSE = 49.4593/13 = 3.80456
0    58.6065    1.472
1    -.03959    .00171

(c) Estimated Approximate Variance-Covariance Matrix of Estimated Regression Coefficients

$$\mathbf{s}^2\{\mathbf{g}\} = MSE(\mathbf{D}'\mathbf{D})^{-1} = 3.80456 \begin{bmatrix} 5.696\text{E-1} & -4.682\text{E-4} \\ -4.682\text{E-4} & 7.70\text{E-7} \end{bmatrix} = \begin{bmatrix} 2.1672 & -1.781\text{E-3} \\ -1.781\text{E-3} & 2.928\text{E-6} \end{bmatrix}$$

This process was carried out for three iterations. Table 13.3a contains the estimated
regression coefficients and the least squares criterion measure for each iteration. We see
that while iteration 1 led to moderate revisions in the estimated regression coefficients and
a substantially better fit according to the least squares criterion, iteration 2 resulted only in
minor revisions of the estimated regression coefficients and little improvement in the fit.
Iteration 3 led to no change in either the estimates of the coefficients or the least squares
criterion measure.

Hence, the search procedure was terminated after three iterations. The final regression
coefficient estimates therefore are $g_0 = 58.6065$ and $g_1 = -.03959$, and the fitted regression
function is:

$$\hat{Y} = (58.6065) \exp(-.03959X) \qquad (13.30)$$

The error sum of squares for this fitted model is SSE = 49.4593. Figure 13.2
shows a plot of this estimated regression function, together with a scatter plot of the data.
The fit appears to be a good one.
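The full iteration summarized in Table 13.3a can be sketched as a loop. The code below is a minimal illustration, assuming the Table 13.1 data and a simple relative-change stopping rule; the function name `gauss_newton`, the tolerance, and the iteration cap are choices made for this sketch, not part of the method as stated in the text.

```python
import math

# Assumed data from Table 13.1; verify against the table.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def gauss_newton(x, y, g, tol=1e-8, max_iter=50):
    """Iterate Gauss-Newton steps; return (final g, history of (g, SSE))."""
    history = []
    for _ in range(max_iter):
        e = [math.exp(g[1] * xi) for xi in x]             # D_i0 column
        resid = [yi - g[0] * ei for yi, ei in zip(y, e)]  # current residuals
        history.append((g, sum(r * r for r in resid)))
        d1 = [g[0] * xi * ei for xi, ei in zip(x, e)]     # D_i1 column
        s00 = sum(a * a for a in e)
        s01 = sum(a * c for a, c in zip(e, d1))
        s11 = sum(c * c for c in d1)
        t0 = sum(a * r for a, r in zip(e, resid))
        t1 = sum(c * r for c, r in zip(d1, resid))
        det = s00 * s11 - s01 * s01
        b0 = (s11 * t0 - s01 * t1) / det                  # adjustment to g0
        b1 = (s00 * t1 - s01 * t0) / det                  # adjustment to g1
        g = (g[0] + b0, g[1] + b1)
        # Stop when both adjustments are negligible relative to the estimates.
        if abs(b0) <= tol * abs(g[0]) and abs(b1) <= tol * abs(g[1]):
            break
    return g, history

g_final, history = gauss_newton(X, Y, (56.6646, -0.03797))
sse_final = sum((yi - g_final[0] * math.exp(g_final[1] * xi)) ** 2
                for xi, yi in zip(X, Y))
for stage, (g_s, sse_s) in enumerate(history):
    print(stage, g_s, sse_s)
print(g_final, sse_final)
```

The `history` list records the estimates and criterion measure at the start of each iteration, so its first rows can be compared with the rows of Table 13.3a.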


Comments 


1. The choice of initial starting values is very important with the Gauss-Newton method because
a poor choice may result in slow convergence, convergence to a local minimum, or even divergence.
Good starting values will generally result in faster convergence, and if multiple minima exist, will
lead to a solution that is the global minimum rather than a local minimum. Fast convergence, even if
the initial estimates are far from the least squares solution, generally indicates that the linear
approximation model (13.25) is a good approximation to the nonlinear regression model. Slow convergence,
on the other hand, especially from initial estimates reasonably close to the least squares solution,
usually indicates that the linear approximation model is not a good approximation to the nonlinear
model.

2. A variety of methods are available for obtaining starting values for the regression parameters.
Often, related earlier studies can be utilized to provide good starting values for the regression parame- 
ters. Another possibility is to select p representative observations, set the regression function f (X;, y) 
equal to Y; for each of the p observations (thereby ignoring the random error), solve the p equations 
for the p parameters, and use the solutions as the starting values, provided they lead to reasonably 
good fits of the observed data. Still another possibility is to do a grid search in the parameter space 
by selecting in a grid fashion various trial choices of g, evaluating the least squares criterion Q for 
each of these choices, and using as the starting values that g vector for which Q is smallest. 

3. When using the Gauss-Newton or another direct search procedure, it is often desirable to try 
other sets of starting values after a solution has been obtained to make sure that the same solution will 
be found. 

4. Some computer packages for nonlinear regression require that the user specify the starting 
values for the regression parameters. Others do a grid search to obtain starting values. 

5. Most nonlinear computer programs have a library of commonly used regression functions. 
For nonlinear response functions not in the library and specified by the user, some computer pro- 
grams using the Gauss-Newton method require the user to input also the partial derivatives of the 
regression function, while others numerically approximate partial derivatives from the regression 
function. 

6. The Gauss-Newton method may produce iterations that oscillate widely or result in increases 
in the error sum of squares. Sometimes, these aberrations are only temporary, but occasionally serious 
convergence problems exist. Various modifications of the Gauss-Newton method have been suggested 
to improve its performance, such as the Hartley modification (Ref. 13.1). 

7. Some properties that exist for linear regression least squares do not hold for nonlinear regression
least squares. For example, the residuals do not necessarily sum to zero for nonlinear least squares.
Additionally, the error sum of squares SSE and the regression sum of squares SSR do not necessarily
sum to the total sum of squares SSTO. Consequently, the coefficient of multiple determination
$R^2 = SSR/SSTO$ is not a meaningful descriptive statistic for nonlinear regression.
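The grid search for starting values mentioned in Comment 2 can be illustrated as follows. This is a sketch only: the data are assumed to be those of Table 13.1, and the grid ranges and spacings are arbitrary choices made for the illustration.

```python
import math

# Assumed data from Table 13.1; verify against the table.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def q(g0, g1):
    """Least squares criterion Q for the exponential response function."""
    return sum((yi - g0 * math.exp(g1 * xi)) ** 2 for xi, yi in zip(X, Y))

# Hypothetical grid: g0 from 40 to 80 in steps of 5, g1 from -.10 to -.01.
grid_g0 = [40 + 5 * i for i in range(9)]
grid_g1 = [round(-0.10 + 0.01 * j, 2) for j in range(10)]

# Evaluate Q at every grid point and keep the point where Q is smallest.
best_q, best_g0, best_g1 = min(
    (q(a, c), a, c) for a in grid_g0 for c in grid_g1
)
print(best_g0, best_g1, best_q)
```

The grid point with the smallest Q would then serve as the starting vector for the Gauss-Newton iterations; a finer grid costs more evaluations of Q but generally gives starting values closer to the least squares solution.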


Other Direct Search Procedures 


Two other direct search procedures, besides the Gauss-Newton method, that are frequently
used are the method of steepest descent and the Marquardt algorithm. The method of
steepest descent searches for the minimum least squares criterion measure Q by iteratively
determining the direction in which the regression coefficients $\mathbf{g}$ should be changed. The
method of steepest descent is particularly effective when the starting values $\mathbf{g}^{(0)}$ are not
good, being far from the final values $\mathbf{g}$.

The Marquardt algorithm seeks to utilize the best features of the Gauss-Newton method 
and the method of steepest descent, and occupies a middle ground between these two 
methods. 
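One common way to realize this middle ground is to add a damping term to the Gauss-Newton normal equations. The sketch below is a simplified Levenberg-Marquardt style iteration and is not claimed to be Marquardt's algorithm as originally published: the damping rule (divide or multiply by 10), the starting values, and the name `marquardt_sketch` are all assumptions made for illustration, and the data are assumed to be those of Table 13.1.

```python
import math

# Assumed data from Table 13.1; verify against the table.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def sse(g):
    return sum((yi - g[0] * math.exp(g[1] * xi)) ** 2 for xi, yi in zip(X, Y))

def marquardt_sketch(g, lam=1e-3, tol=1e-10, max_iter=200):
    """Damped Gauss-Newton: small lam acts like Gauss-Newton, large lam
    gives short steps close to the steepest descent direction."""
    current = sse(g)
    for _ in range(max_iter):
        e = [math.exp(g[1] * xi) for xi in X]
        resid = [yi - g[0] * ei for yi, ei in zip(Y, e)]
        d1 = [g[0] * xi * ei for xi, ei in zip(X, e)]
        s00 = sum(a * a for a in e)
        s01 = sum(a * c for a, c in zip(e, d1))
        s11 = sum(c * c for c in d1)
        t0 = sum(a * r for a, r in zip(e, resid))
        t1 = sum(c * r for c, r in zip(d1, resid))
        # Damped normal equations: (D'D + lam * diag(D'D)) delta = D'r.
        a00, a11 = s00 * (1 + lam), s11 * (1 + lam)
        det = a00 * a11 - s01 * s01
        trial = (g[0] + (a11 * t0 - s01 * t1) / det,
                 g[1] + (a00 * t1 - s01 * t0) / det)
        trial_sse = sse(trial)
        if trial_sse < current:
            done = current - trial_sse <= tol * current
            g, current, lam = trial, trial_sse, lam / 10  # accept, relax damping
            if done:
                break
        else:
            lam *= 10                                     # reject, damp harder
            if lam > 1e12:
                break
    return g, current

g_lm, sse_lm = marquardt_sketch((40.0, -0.02))  # deliberately rough start
print(g_lm, sse_lm)
```

Because a trial step is accepted only when it lowers the criterion measure, this variant cannot produce the temporary increases in the error sum of squares that plain Gauss-Newton iterations sometimes show.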

Additional information about direct search procedures can be found in specialized 
sources, such as References 13.2 and 13.3. 


13.3 Model Building and Diagnostics


The model-building process for nonlinear regression models often differs somewhat from
that for linear regression models. The reason is that the functional form of many nonlinear
models is less suitable for adding or deleting predictor variables and curvature and interaction
effects in the direct fashion that is feasible for linear regression models. Some types of
nonlinear regression models do lend themselves to adding and deleting predictor variables
in a direct fashion. We shall take up two such nonlinear regression models in Chapter 14,
where we consider the logistic and Poisson multiple regression models.

Validation of the selected nonlinear regression model can be performed in the same
fashion as for linear regression models.

Use of diagnostic tools to examine the appropriateness of a fitted model plays an important
role in the process of building a nonlinear regression model. The appropriateness of a
regression model must always be considered, whether the model is linear or nonlinear.
Nonlinear regression models may not be appropriate for the same reasons as linear regression
models. For example, when nonlinear growth models are used for time series data, there
is the possibility that the error terms may be correlated. Also, unequal error variances are
often present when nonlinear growth models with asymptotes are fitted, such as exponential
models (13.6) and (13.8). Typically, the error variances for cases in the neighborhood of
the asymptote(s) differ from the error variances for cases elsewhere.

When replicate observations are available and the sample size is reasonably large, the
appropriateness of a nonlinear regression function can be tested formally by means of the lack
of fit test for linear regression models in (6.68). This test will be an approximate one for
nonlinear regression models, but the actual level of significance will be close to the specified level
when the sample size is reasonably large. Thus, we calculate the pure error sum of squares
by (3.16), obtain the lack of fit sum of squares by (3.24), and calculate test statistic (6.68b)
in the usual fashion when performing a formal lack of fit test for a nonlinear response
function.

Plots of residuals against time, against the fitted values, and against each of the predictor
variables can be helpful in diagnosing departures from the assumed model, just as for
linear regression models. In interpreting residual plots for nonlinear regression, one needs
to remember that the residuals for nonlinear regression do not necessarily sum to zero.

If unequal error variances are found to be present, weighted least squares can be used
in fitting the nonlinear regression model. Alternatively, transformations of the response
variable can be investigated that may stabilize the variance of the error terms and also
permit use of a linear regression model.


Example

In the severely injured patients example, the residuals were obtained by use of the fitted
nonlinear regression function (13.30):

$$e_i = Y_i - (58.6065) \exp(-.03959X_i)$$


A plot of the residuals against the fitted values is shown in Figure 13.3a, and a normal
probability plot of the residuals is shown in Figure 13.3b. These plots do not suggest any
serious departures from the model assumptions. The residual plot against the fitted values
in Figure 13.3a does raise the question whether the error variance may be somewhat larger
for cases with small fitted values near the asymptote. The Brown-Forsythe test (3.9) was
conducted. Its P-value is .64, indicating that the residuals are consistent with constancy of
the error variance.

[FIGURE 13.3: (a) Residual Plot against $\hat{Y}$ (horizontal axis: Fitted Value); (b) Normal Probability Plot (horizontal axis: Expected Value).]

On the basis of these, as well as some other diagnostics, it was concluded that exponential
regression model (13.13) is appropriate for the data.


13.4 Inferences about Nonlinear Regression Parameters 


Exact inference procedures about the regression parameters are available for linear regres- 
sion models with normal error terms for any sample size. Unfortunately, this is not the 
case for nonlinear regression models with normal error terms, where the least squares and 
maximum likelihood estimators for any given sample size are not normally distributed, are 
not unbiased, and do not have minimum variance. 

Consequently, inferences about the regression parameters in nonlinear regression are 
usually based on large-sample theory. This theory tells us that the least squares and maximum 
likelihood estimators for nonlinear regression models with normal error terms, when the 
sample size is large, are approximately normally distributed and almost unbiased, and 
have almost minimum variance. This large-sample theory also applies when the error terms 
are not normally distributed. 

Before presenting details about large-sample inferences for nonlinear regression, we
need to consider first how the error term variance $\sigma^2$ is estimated for nonlinear regression
models.


Estimate of Error Term Variance 


Inferences about nonlinear regression parameters require an estimate of the error term
variance $\sigma^2$. This estimate is of the same form as for linear regression, the error sum of
squares again being the sum of the squared residuals:

$$MSE = \frac{SSE}{n - p} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - p} = \frac{\sum [Y_i - f(\mathbf{X}_i, \mathbf{g})]^2}{n - p} \qquad (13.31)$$


Here $\mathbf{g}$ is the vector of the final parameter estimates, so that the residuals are the deviations
around the fitted nonlinear regression function using the final estimated regression coefficients
$\mathbf{g}$. For nonlinear regression, MSE is not an unbiased estimator of $\sigma^2$, but the bias is
small when the sample size is large.


Large-Sample Theory 
When the error terms are independent and normally distributed and the sample size is
reasonably large, the following theorem provides the basis for inferences for nonlinear
regression models:

    When the error terms $\varepsilon_i$ are independent $N(0, \sigma^2)$ and the sample size $n$
    is reasonably large, the sampling distribution of $\mathbf{g}$ is approximately
    normal. The expected value of the mean vector is approximately:        (13.32)

$$E\{\mathbf{g}\} \approx \boldsymbol{\gamma} \qquad (13.32a)$$

    The approximate variance-covariance matrix of the regression
    coefficients is estimated by:

$$\mathbf{s}^2\{\mathbf{g}\} = MSE(\mathbf{D}'\mathbf{D})^{-1} \qquad (13.32b)$$


Here $\mathbf{D}$ is the matrix of partial derivatives evaluated at the final least squares estimates $\mathbf{g}$,
just as $\mathbf{D}^{(0)}$ in (13.25b) is the matrix of partial derivatives evaluated at $\mathbf{g}^{(0)}$. Note that the
estimated approximate variance-covariance matrix $\mathbf{s}^2\{\mathbf{g}\}$ is of exactly the same form as the
one for linear regression in (6.48), with $\mathbf{D}$ again playing the role of the X matrix.

Thus, when the sample size is large and the error terms are independent normal with constant
variance, the least squares estimators in $\mathbf{g}$ for nonlinear regression are approximately
normally distributed and almost unbiased. They also have near minimum variance, since
the variance-covariance matrix in (13.32b) estimates the minimum variances. We should
add that theorem (13.32) holds even if the error terms are not normally distributed.

As a result of theorem (13.32), inferences for nonlinear regression parameters are carried
out in the same fashion as for linear regression when the sample size is reasonably large.
Thus, an interval estimate for a regression parameter is carried out by (6.50) and a test
by (6.51). The needed estimated variance is obtained from the matrix $\mathbf{s}^2\{\mathbf{g}\}$ in (13.32b).
These inference procedures when applied to nonlinear regression are only approximate, to
be sure, but the approximation often is very good. For some nonlinear regression models,
the sample size can be quite small for the large-sample approximation to be good. For other
nonlinear regression models, however, the sample size may need to be quite large.


When Is Large-Sample Theory Applicable? 
Ideally, we would like a rule that would tell us when the sample size in any given nonlinear
regression application is large enough so that the large-sample inferences based on asymptotic
theorem (13.32) are appropriate. Unfortunately, no simple rule exists that tells us when
it is appropriate to use the large-sample inference methods and when it is not appropriate.
However, a number of guidelines have been developed that are helpful in assessing the
appropriateness of using the large-sample inference procedures in a given application.


1. Quick convergence of the iterative procedure in finding the estimates of the nonlinear
regression parameters is often an indication that the linear approximation in (13.25) to
the nonlinear regression model is a good approximation and hence that the asymptotic
properties of the regression estimates are applicable. Slow convergence suggests caution
and consideration of other guidelines before large-sample inferences are employed.

2. Several measures have been developed for providing guidance about the appropriate- 
ness of the use of large-sample inference procedures. Bates and Watts (Ref. 13.4) devel- 
oped curvature measures of nonlinearity. These indicate the extent to which the nonlinear 
regression function fitted to the data can be reasonably approximated by the linear approx- 
imation in (13.25). Box (Ref. 13.5) obtained a formula for estimating the bias of the esti- 
mated regression coefficients. A small bias supports the appropriateness of the large-sample 
inference procedures. Hougaard (Ref. 13.6) developed an estimate of the skewness of the 
sampling distributions of the estimated regression coefficients. An indication of little skew- 
ness supports the approximate normality of the sampling distributions and consequently the 
applicability of the large-sample inference procedures. 

3. Bootstrap sampling described in Chapter 11 provides a direct means of examining 
whether the sampling distributions of the nonlinear regression parameter estimates are
approximately normal, whether the variances of the sampling distributions are near the 
variances for the linear approximation model, and whether the bias in each of the parameter 
estimates is fairly small. If so, the sampling behavior of the nonlinear regression estimates is
said to be close-to-linear and the large-sample inference procedures may appropriately be 
used. Nonlinear regression estimates whose sampling distributions are not close to normal, 
whose variances are much larger than the variances for the linear approximation model, 
and for which there is substantial bias are said to behave in a far-from-linear fashion and 
the large-sample inference procedures are then not appropriate. 

Once many bootstrap samples have been obtained and the nonlinear regression parameter 
estimates calculated for each sample, the bootstrap sampling distribution for each param- 
eter estimate can be examined to see if it is near normal. The variances of the bootstrap 
distributions of the estimated regression coefficients can be obtained next to see if they are 
close to the large-sample variance estimates obtained by (13.32b). Similarly, the bootstrap 
confidence intervals for the regression coefficients can be obtained and compared with the 
large-sample confidence intervals. Good agreement between these intervals again provides 
support for the appropriateness of the large-sample inference procedures. In addition, the 
difference between each final regression parameter estimate and the mean of its bootstrap 
sampling distribution is an estimate of the bias of the regression estimate. Small or negligible 
biases of the nonlinear regression estimates support the appropriateness of the large-sample 
inference procedures. 


Remedial Measures. When the diagnostics suggest that large-sample inference proce- 
dures are not appropriate in a particular instance, remedial measures should be explored. 
One possibility is to reparameterize the nonlinear regression model. For example, studies 
have shown that for the nonlinear model:

$$Y_i = \frac{\gamma_0 X_i}{\gamma_1 + X_i} + \varepsilon_i$$

the use of large-sample inference procedures is often not appropriate. However, the following
reparameterization:

$$Y_i = \frac{X_i}{\theta_1 X_i + \theta_2} + \varepsilon_i$$

where $\theta_1 = 1/\gamma_0$ and $\theta_2 = \gamma_1/\gamma_0$, yields identical fits and generally involves no problems
in using large-sample inference procedures for moderate sample sizes (see Ref. 13.7 for
details).

Another remedial measure is to use the bootstrap estimates of precision and confidence
intervals instead of the large-sample inferences. However, when the linear approximation
in (13.25) is not a close approximation to the nonlinear regression model, convergence may
be very slow and bootstrap estimates of precision and confidence intervals may be difficult to
obtain. Still another remedial measure that is sometimes available is to increase the sample
size.


Example

For the severely injured patients example, we know from Table 13.3a that
the final error sum of squares is SSE = 49.4593. Since $p = 2$ parameters are present in the
nonlinear response function (13.19), we obtain:

$$MSE = \frac{SSE}{n - p} = \frac{49.4593}{15 - 2} = 3.80456$$


Table 13.3b presents this mean square, and Table 13.3c contains the large-sample estimated
variance-covariance matrix of the estimated regression coefficients. The matrix $(\mathbf{D}'\mathbf{D})^{-1}$ is
based on the final regression coefficient estimates $\mathbf{g}$ and is shown without computational
details.

We see from Table 13.3c that $s^2\{g_0\} = 2.1672$ and $s^2\{g_1\} = .000002928$. The estimated
standard deviations of the regression coefficients are given in Table 13.3b.
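The computational details omitted for $(\mathbf{D}'\mathbf{D})^{-1}$ can be sketched as follows. The code is illustrative only: it assumes the X values of Table 13.1, takes the final estimates and SSE from Table 13.3 as given, and inverts the 2 x 2 matrix in closed form.

```python
import math

# Assumed X values from Table 13.1; verify against the table.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]

g0, g1 = 58.6065, -0.03959      # final estimates (Table 13.3a)
sse_final = 49.4593             # final error sum of squares (Table 13.3a)
n, p = len(X), 2

mse = sse_final / (n - p)       # estimate of the error term variance

# Columns of D: partial derivatives evaluated at the final estimates.
d0 = [math.exp(g1 * xi) for xi in X]
d1 = [g0 * xi * math.exp(g1 * xi) for xi in X]

# Elements of D'D and its closed-form 2 x 2 inverse.
s00 = sum(a * a for a in d0)
s01 = sum(a * c for a, c in zip(d0, d1))
s11 = sum(c * c for c in d1)
det = s00 * s11 - s01 * s01
dtd_inv = [[s11 / det, -s01 / det],
           [-s01 / det, s00 / det]]

# Estimated approximate variance-covariance matrix s^2{g} = MSE (D'D)^-1.
cov = [[mse * v for v in row] for row in dtd_inv]
se_g0 = math.sqrt(cov[0][0])
se_g1 = math.sqrt(cov[1][1])
print(mse, cov, se_g0, se_g1)
```

Because the final estimates are entered here with only a few decimal places, the results should be expected to match Table 13.3 up to rounding rather than digit for digit.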

To check on the appropriateness of the large-sample variances of the estimated regression
coefficients and on the applicability of large-sample inferences in general, we have generated
1,000 bootstrap samples of size 15. The fixed X sampling procedure was used since the
exponential model appears to fit the data well and the error term variance appears to be
fairly constant. Histograms of the resulting bootstrap sampling distributions of $g_0^*$ and $g_1^*$
are shown in Figure 13.4, together with some characteristics of these distributions. We see
that the $g_0^*$ distribution is close to normal. The $g_1^*$ distribution suggests that the sampling
distribution may be slightly skewed to the left, but the departure from normality does not
appear to be great. The means of the distributions, denoted by $\bar{g}_0^*$ and $\bar{g}_1^*$, are very close to
the final least squares estimates, indicating that the bias in the estimates is negligible:

$$\bar{g}_0^* = 58.67 \qquad \bar{g}_1^* = -.03936$$
$$g_0 = 58.61 \qquad g_1 = -.03959$$

Furthermore, the standard deviations of the bootstrap sampling distributions are very close
to the large-sample standard deviations in Table 13.3b:

$$s^*\{g_0^*\} = 1.423 \qquad s^*\{g_1^*\} = .00142$$
$$s\{g_0\} = 1.472 \qquad s\{g_1\} = .00171$$

These indications all point to the appropriateness of large-sample inferences here, even
though the sample size ($n = 15$) is not very large.
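A fixed X bootstrap of this kind can be sketched in Python as below. This is an assumption-laden illustration, not a reproduction of the study in the text: the data are assumed to be those of Table 13.1, the seed is arbitrary, the helper `fit` is hypothetical, and because the bootstrap samples are random the summary statistics will differ somewhat from those reported in Figure 13.4.

```python
import math
import random
import statistics

# Assumed data from Table 13.1; verify against the table.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def fit(x, y, g=(58.6065, -0.03959), tol=1e-10, max_iter=100):
    """Gauss-Newton fit of g0 * exp(g1 * x), started from g."""
    g0, g1 = g
    for _ in range(max_iter):
        e = [math.exp(g1 * xi) for xi in x]
        resid = [yi - g0 * ei for yi, ei in zip(y, e)]
        d1 = [g0 * xi * ei for xi, ei in zip(x, e)]
        s00 = sum(a * a for a in e)
        s01 = sum(a * c for a, c in zip(e, d1))
        s11 = sum(c * c for c in d1)
        t0 = sum(a * r for a, r in zip(e, resid))
        t1 = sum(c * r for c, r in zip(d1, resid))
        det = s00 * s11 - s01 * s01
        b0 = (s11 * t0 - s01 * t1) / det
        b1 = (s00 * t1 - s01 * t0) / det
        g0, g1 = g0 + b0, g1 + b1
        if abs(b0) <= tol * abs(g0) and abs(b1) <= tol * abs(g1):
            break
    return g0, g1

g_hat = fit(X, Y)
fitted = [g_hat[0] * math.exp(g_hat[1] * xi) for xi in X]
resid = [yi - fi for yi, fi in zip(Y, fitted)]

rng = random.Random(2005)       # arbitrary seed, for repeatability only
boot = []
for _ in range(1000):
    # Fixed X sampling: keep X, add resampled residuals to the fitted values.
    y_star = [fi + rng.choice(resid) for fi in fitted]
    boot.append(fit(X, y_star, g_hat))

g0_star = [b[0] for b in boot]
g1_star = [b[1] for b in boot]
mean_g0, sd_g0 = statistics.mean(g0_star), statistics.stdev(g0_star)
mean_g1, sd_g1 = statistics.mean(g1_star), statistics.stdev(g1_star)
print(mean_g0, sd_g0, mean_g1, sd_g1)
```

Comparing the bootstrap means with the final estimates gives the estimated bias, and comparing the bootstrap standard deviations with the values in Table 13.3b checks the large-sample variances, as described in the guidelines earlier.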


FIGURE 13.4 Bootstrap Sampling Distributions: Severely Injured Patients Example.

[(a) Histogram of Bootstrap Estimates $g_0^*$; (b) Histogram of Bootstrap Estimates $g_1^*$.]

(a) $\bar{g}_0^* = 58.67$, $s^*\{g_0^*\} = 1.423$, $g_0^*(.025) = 56.044$, $g_0^*(.975) = 61.436$
(b) $\bar{g}_1^* = -.03936$, $s^*\{g_1^*\} = .00142$, $g_1^*(.025) = -.04207$, $g_1^*(.975) = -.03681$


Interval Estimation of a Single $\gamma_k$

Based on large-sample theorem (13.32), the following approximate result holds when the
sample size is large and the error terms are normally distributed:

$$\frac{g_k - \gamma_k}{s\{g_k\}} \sim t(n - p) \qquad k = 0, 1, \ldots, p - 1 \qquad (13.33)$$

where $t(n - p)$ is a $t$ variable with $n - p$ degrees of freedom. Hence, approximate $1 - \alpha$
confidence limits for any single $\gamma_k$ are formed by means of (6.50):

$$g_k \pm t(1 - \alpha/2; n - p) s\{g_k\} \qquad (13.34)$$

where $t(1 - \alpha/2; n - p)$ is the $(1 - \alpha/2)100$ percentile of the $t$ distribution with $n - p$
degrees of freedom.


Example

For the severely injured patients example, it is desired to estimate $\gamma_1$ with a 95 percent
confidence interval. We require $t(.975; 13) = 2.160$, and find from Table 13.3b that $g_1 =
-.03959$ and $s\{g_1\} = .00171$. Hence, the confidence limits are $-.03959 \pm 2.160(.00171)$,
and the approximate 95 percent confidence interval for $\gamma_1$ is:

$$-.0433 \le \gamma_1 \le -.0359$$

Thus, we can conclude with approximate 95 percent confidence that $\gamma_1$ is between $-.0433$
and $-.0359$. To confirm the appropriateness of this large-sample confidence interval, we
shall obtain the 95 percent bootstrap confidence interval for $\gamma_1$. Using (11.58) and the results
in Figure 13.4b, we obtain:

$$d_1 = g_1 - g_1^*(.025) = -.03959 + .04207 = .00248$$
$$d_2 = g_1^*(.975) - g_1 = -.03681 + .03959 = .00278$$

The reflection method confidence limits by (11.59) then are:

$$g_1 - d_2 = -.03959 - .00278 = -.04237$$
$$g_1 + d_1 = -.03959 + .00248 = -.03711$$

Hence, the 95 percent bootstrap confidence interval is $-.0424 \le \gamma_1 \le -.0371$. This
confidence interval is very close to the large-sample confidence interval, again supporting the
appropriateness of large-sample inference procedures here.
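The two interval calculations for $\gamma_1$ are simple enough to check by a few lines of arithmetic. In the sketch below the percentile $t(.975; 13) = 2.160$ and the bootstrap percentiles are taken as given from the text and Figure 13.4b; the variable names are chosen for this illustration only.

```python
# Large-sample confidence limits (13.34) for gamma_1.
g1, s_g1 = -0.03959, 0.00171       # from Table 13.3b
t_crit = 2.160                     # t(.975; 13), as given in the text
lower = g1 - t_crit * s_g1
upper = g1 + t_crit * s_g1

# Reflection method bootstrap limits, using the percentiles in Figure 13.4b.
g1_star_025, g1_star_975 = -0.04207, -0.03681
d1 = g1 - g1_star_025              # distance down to the 2.5th percentile
d2 = g1_star_975 - g1              # distance up to the 97.5th percentile
refl_lower = g1 - d2
refl_upper = g1 + d1

print(round(lower, 4), round(upper, 4))
print(round(refl_lower, 5), round(refl_upper, 5))
```

Note how the reflection method swaps the two distances: the distance to the upper percentile is subtracted from the estimate, and the distance to the lower percentile is added.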


Simultaneous Interval Estimation of Several $\gamma_k$

Approximate joint confidence intervals for several regression parameters in nonlinear
regression can be developed by the Bonferroni procedure. If $m$ parameters are to be estimated
with approximate family confidence coefficient $1 - \alpha$, the joint Bonferroni confidence
limits are:

$$g_k \pm B s\{g_k\} \qquad (13.35)$$

where:

$$B = t(1 - \alpha/2m; n - p) \qquad (13.35a)$$


Example

In the severely injured patients example, it is desired to obtain simultaneous interval
estimates for $\gamma_0$ and $\gamma_1$ with an approximate 90 percent family confidence coefficient. With
the Bonferroni procedure we therefore require separate confidence intervals for the two
parameters, each with a 95 percent statement confidence coefficient. We have already
obtained a confidence interval for $\gamma_1$ with a 95 percent statement confidence coefficient. The
approximate 95 percent statement confidence limits for $\gamma_0$, using the results in Table 13.3b,
are $58.6065 \pm 2.160(1.472)$ and the confidence interval for $\gamma_0$ is:

$$55.43 \le \gamma_0 \le 61.79$$

Hence, the joint confidence intervals with approximate family confidence coefficient of
90 percent are:

$$55.43 \le \gamma_0 \le 61.79$$
$$-.0433 \le \gamma_1 \le -.0359$$
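The Bonferroni bookkeeping in this example can be sketched as follows. The multiple $B = t(.975; 13) = 2.160$ is entered as given in the text rather than computed, and the function name `bonferroni_limits` is hypothetical.

```python
def bonferroni_limits(estimates, std_errors, b_mult):
    """Joint limits g_k +/- B * s{g_k} for each parameter, as in (13.35)."""
    return [(g - b_mult * s, g + b_mult * s)
            for g, s in zip(estimates, std_errors)]

alpha, m = 0.10, 2                 # 90 percent family coefficient, 2 parameters
percentile = 1 - alpha / (2 * m)   # percentile needed for B in (13.35a)
B = 2.160                          # t(.975; 13), as given in the text

limits = bonferroni_limits([58.6065, -0.03959], [1.472, 0.00171], B)
print(percentile)
for lo, hi in limits:
    print(lo, hi)
```

With $m = 2$ and $\alpha = .10$ the required percentile is .975, which is why the single-parameter interval already obtained for $\gamma_1$ can be reused unchanged in the joint statement.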


Test Concerning a Single y, 
A large-sample test concerning a single y; is set up in the usual fashion. To test 


Ho: Ye = Ую (13.36a) 
Ha: үх F Yew 


Chapter 13 Introduction to Nonlinear Regression and Neural Networks 533 


where $\gamma_{k0}$ is the specified value of $\gamma_k$, we may use the $t^*$ test statistic based on (6.49) when n is reasonably large:

$$t^* = \frac{g_k - \gamma_{k0}}{s\{g_k\}} \qquad (13.36b)$$

The decision rule for controlling the risk of making a Type I error at approximately $\alpha$ then is:

$$\text{If } |t^*| \le t(1 - \alpha/2;\ n - p), \text{ conclude } H_0$$
$$\text{If } |t^*| > t(1 - \alpha/2;\ n - p), \text{ conclude } H_a \qquad (13.36c)$$
Example

In the severely injured patients example, we wish to test:

$$H_0\colon \gamma_0 = 54$$
$$H_a\colon \gamma_0 \ne 54$$

The test statistic (13.36b) here is:

$$t^* = \frac{58.6065 - 54}{1.472} = 3.13$$

For $\alpha = .01$, we require $t(.995; 13) = 3.012$. Since $|t^*| = 3.13 > 3.012$, we conclude $H_a$, that $\gamma_0 \ne 54$. The approximate two-sided P-value of the test is .008.
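The decision rule (13.36c) for this example might be traced as follows. This is a minimal sketch: `t_star` is a hypothetical helper, and the critical value is the one quoted in the text rather than looked up programmatically.

```python
def t_star(g, gamma_null, s):
    """Large-sample test statistic (13.36b)."""
    return (g - gamma_null) / s

t_stat = t_star(g=58.6065, gamma_null=54.0, s=1.472)
t_crit = 3.012  # t(.995; 13), as quoted in the text
conclusion = "Ha" if abs(t_stat) > t_crit else "H0"
print(round(t_stat, 2), conclusion)
```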


Test Concerning Several $\gamma_k$
When a large-sample test concerning several $\gamma_k$ simultaneously is desired, we use the same approach as for the general linear test, first fitting the full model and obtaining SSE(F), then fitting the reduced model and obtaining SSE(R), and finally calculating the same test statistic (2.70) as for linear regression:

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div MSE(F) \qquad (13.37)$$

For large n, this test statistic is distributed approximately as $F(df_R - df_F,\ df_F)$ when $H_0$ holds.


13.5 Learning Curve Example


We now present a second example, to provide an additional illustration of the nonlinear regression concepts developed in this chapter. An electronics products manufacturer undertook the production of a new product in two locations (location A: coded $X_1 = 1$, location B: coded $X_1 = 0$). Location B has more modern facilities and hence was expected to be more efficient than location A, even after the initial learning period. An industrial engineer calculated the expected unit production cost for a modern facility after learning has occurred. Weekly unit production costs for each location were then expressed as a fraction of this expected cost. The reciprocal of this fraction is a measure of relative efficiency, and this relative efficiency measure was utilized as the response variable (Y) in the study.

It is well known that efficiency increases over time when a new product is produced, 
and that the improvements eventually slow down and the process stabilizes. Hence, it was 
decided to employ an exponential model with an upper asymptote for expressing the relation 
between relative efficiency (Y) and time (X2), and to incorporate a constant effect for the 




TABLE 13.4 Data: Learning Curve Example.


FIGURE 13.5 Scatter Plot and Fitted Nonlinear Regression Functions: Learning Curve Example.


difference in the two production locations. The model decided on was:

$$Y_i = \gamma_0 + \gamma_1 X_{i1} + \gamma_3 \exp(\gamma_2 X_{i2}) + \varepsilon_i \qquad (13.38)$$


When $\gamma_2$ and $\gamma_3$ are negative, $\gamma_0$ is the upper asymptote for location B as $X_2$ gets large, and $\gamma_0 + \gamma_1$ is the upper asymptote for location A. The parameters $\gamma_2$ and $\gamma_3$ reflect the speed of learning, which was expected to be the same in the two locations.

While weekly data on relative production efficiency for each location were available, we shall only use observations for selected weeks during the first 90 weeks of production to simplify the presentation. A portion of the data on location, week, and relative efficiency is presented in Table 13.4; a plot of the data is shown in Figure 13.5. Note that learning was relatively rapid in both locations, and that the relative efficiency in location B toward the

Observation   Location   Week     Relative Efficiency
     i        X_{i1}     X_{i2}          Y_i
     1           1          1            .483
     2           1          2            .539
     3           1          3            .618
    ...         ...        ...           ...
    13           1         70            .960
    14           1         80            .967
    15           1         90            .975
    16           0          1            .517
    17           0          2            .598
    18           0          3            .635
    ...         ...        ...           ...
    28           0         70           1.028
    29           0         80           1.017
    30           0         90           1.023

[Figure 13.5: relative efficiency (Y) plotted against time in weeks (X), with the fitted function $\hat{Y} = 1.0156 - .5524\exp(-.1348X)$ for location B and $\hat{Y} = 0.9683 - .5524\exp(-.1348X)$ for location A.]




end of the 90-week period even exceeded 1.0; i.e., the actual unit costs at this stage were 
lower than the industrial engineer's expected unit cost. 

Regression model (13.38) is nonlinear in the parameters $\gamma_2$ and $\gamma_3$. Hence, a direct numerical search estimation procedure was to be employed, for which starting values for the parameters are needed. These were developed partly from past experience, partly from analysis of the data. Previous studies indicated that $\gamma_3$ should be in the neighborhood of $-.5$, so $g_3^{(0)} = -.5$ was used as the starting value. Since the difference in the relative efficiencies between locations A and B for a given week tended to average $-.0459$ during the 90-week period, a starting value $g_1^{(0)} = -.0459$ was specified. The largest observed relative efficiency for location B was 1.028, so that a starting value $g_0^{(0)} = 1.025$ was felt to be reasonable. Only a starting value for $\gamma_2$ remains to be found. This was chosen by selecting a typical relative efficiency observation in the middle of the time period, $Y_{24} = 1.012$, and equating it to the response function with $X_{24,1} = 0$, $X_{24,2} = 30$, and the starting values for the other regression coefficients (thus ignoring the error term):

$$1.012 = 1.025 - (.5)\exp(30\gamma_2)$$

Solving this equation for $\gamma_2$, the starting value $g_2^{(0)} = -.122$ was obtained. Tests for several other representative observations yielded similar starting values, and $g_2^{(0)} = -.122$ was therefore considered to be a reasonable initial value.
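The starting value for $\gamma_2$ follows from inverting the response function for a location B observation. A minimal sketch of that algebra is given below; the function name `start_value_gamma2` is hypothetical, and the inputs are the values quoted in the text.

```python
import math

def start_value_gamma2(y, g0, g3, x2):
    """Solve y = g0 + g3 * exp(gamma2 * x2) for gamma2 (location B, so X1 = 0)."""
    return math.log((y - g0) / g3) / x2

g2_start = start_value_gamma2(y=1.012, g0=1.025, g3=-0.5, x2=30)
print(round(g2_start, 3))
```

The ratio inside the logarithm must be positive, so this shortcut applies only to observations lying on the same side of the asymptote as the sign of the $\gamma_3$ starting value implies.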

With the four starting values $g_0^{(0)} = 1.025$, $g_1^{(0)} = -.0459$, $g_2^{(0)} = -.122$, and $g_3^{(0)} = -.5$, a computer package direct numerical search program was utilized to obtain the least squares estimates. The least squares regression coefficients stabilized after five iterations. The final estimates, together with the large-sample estimated standard deviations of their sampling distributions, are presented in Table 13.5, columns 1 and 2. The fitted regression function is:

$$\hat{Y} = 1.0156 - .04727X_1 - (.5524)\exp(-.1348X_2) \qquad (13.39)$$


The error sum of squares is SSE = .00329, with 30 — 4 = 26 degrees of freedom. Figure 13.5 
presents the fitted regression functions for the two locations, together with a plot of the data. 
The fit seems to be quite good, and residual plots (not shown) did not indicate any noticeable 
departures from the assumed model. 
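To see how fitted function (13.39) behaves for the two locations, it can be evaluated directly. The sketch below hard-codes the estimates reported in the text; the function name `fitted_efficiency` and the choice of weeks are illustrative assumptions.

```python
import math

def fitted_efficiency(x1, x2):
    """Fitted regression function (13.39); x1 = 1 for location A, 0 for location B."""
    return 1.0156 - 0.04727 * x1 - 0.5524 * math.exp(-0.1348 * x2)

for week in (1, 30, 90):
    print(week, round(fitted_efficiency(1, week), 3), round(fitted_efficiency(0, week), 3))

# Upper asymptotes as the week number grows large.
asymptote_a = 1.0156 - 0.04727   # location A: g0 + g1
asymptote_b = 1.0156             # location B: g0
```

The two curves differ by the constant .04727 at every week, which is the constant location effect built into model (13.38).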

In order to explore the applicability of large-sample inference procedures here, bootstrap fixed X sampling was employed. One thousand bootstrap samples of size 30 were generated.


TABLE 13.5 Nonlinear Least Squares Estimates and Standard Deviations and Bootstrap Results: Learning Curve Example.

          (1)         (2)          (3)           (4)
          Nonlinear Least Squares       Bootstrap
  k       g_k        s{g_k}      mean g_k*     s*{g_k*}
  0      1.0156      .003672     1.015605        ...
  1      -.04727     .004109     -.04724         ...
  2      -.1348      .004359     -.13495         ...
  3      -.5524      .008157     -.55283         ...




FIGURE 13.6 MINITAB Histograms of Bootstrap Sampling Distributions: Learning Curve Example.

[Four relative-frequency histograms, panels (a) through (d), one for each of the four bootstrap sampling distributions.]


The estimated bootstrap means and standard deviations for each of the sampling distributions are presented in Table 13.5, columns 3 and 4. Note first that each least squares estimate $g_k$ in column 1 of Table 13.5 is very close to the mean $\bar{g}_k^*$ of its respective bootstrap sampling distribution in column 3, indicating that the estimates have very little bias. Note also that each large-sample standard deviation $s\{g_k\}$ in column 2 of Table 13.5 is fairly close to the respective bootstrap standard deviation $s^*\{g_k^*\}$ in column 4, again supporting the applicability of large-sample inference procedures here. Finally, we present in Figure 13.6 MINITAB plots of the histograms of the four bootstrap sampling distributions. They appear to be consistent with approximately normal sampling distributions. These results all indicate that the sampling behavior of the nonlinear regression estimates is close to linear and therefore support the use of large-sample inferences here.

There was special interest in the parameter $\gamma_1$, which reflects the effect of location. An approximate 95 percent confidence interval is to be constructed. We require $t(.975; 26) = 2.056$. The estimated standard deviation from Table 13.5 is $s\{g_1\} = .004109$. Hence, the approximate 95 percent confidence limits for $\gamma_1$ are $-.04727 \pm 2.056(.004109)$, and the confidence interval for $\gamma_1$ is:

$$-.0557 \le \gamma_1 \le -.0388$$

An approximate 95 percent confidence interval for $\gamma_1$ by the bootstrap reflection method was also obtained for comparative purposes using (11.59). It is:

$$-.0547 \le \gamma_1 \le -.0400$$

This is very close to that obtained by large-sample inference procedures. Since $\gamma_1$ is seen to be negative, these confidence intervals confirm that location A with its less modern facilities tends to be less efficient.


Comments 

1. When learning curve models are fitted to data constituting repeated observations on the same unit, such as efficiency data for the same production unit at different points in time, the error terms may be correlated. Hence, in these situations it is important to ascertain whether or not a model assuming




uncorrelated error terms is reasonable. In the learning curve example, a plot of the residuals against 
time order did not suggest any serious correlations among the error terms. 

2. With learning curve models, it is not uncommon to find that the error variances are unequal. 
Again, therefore, it is important to check whether the assumption of constancy of error variance is 
reasonable. In the learning curve example, plots of the residuals against the fitted values and time did 
not suggest any serious heteroscedasticity problem.


13.6 Introduction to Neural Network Modeling


In recent years there has been an explosion in the amount of available data, made possible 
in part by the widespread availability of low-cost computer memory and automated data 
collection systems. The regression modeling techniques discussed to this point in this book 
typically were developed for use with data sets involving fewer than 1,000 observations and 
fewer than 50 predictors. Yet it is not uncommon now to be faced with data sets involving 
perhaps millions of observations and hundreds or thousands of predictors. Examples include 
point-of-sale data in marketing, credit card scoring data, on-line monitoring of production 
processes, optical character recognition, internet e-mail filtering data, microchip array data, 
and computerized medical record data. This exponential growth in available data has motivated researchers in the fields of statistics, artificial intelligence, and data mining to develop
simple, flexible, powerful procedures for data modeling that can be applied to very large 
data sets. In this section we discuss one such technique, neural network modeling. 


Neural Network Model 

The basic idea behind the neural network approach is to model the response as a nonlinear 
function of various linear combinations of the predictors. Recall that our standard multiple regression model (6.7) involves just one linear combination of the predictors, namely $E\{Y_i\} = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1}$. Thus, as we will demonstrate, the neural network
model is simply a nonlinear statistical model that contains many more parameters than the 
corresponding linear statistical model. One result of this is that the models will typically 
be overparameterized, resulting in parameters that are uninterpretable, which is a major 
shortcoming of neural network modeling. An advantage of the neural network approach 
is that the resulting model will often perform better in predicting future responses than a 
standard regression model. Such models require large data sets, and are evaluated solely on 
their ability to predict responses in hold-out (validation) data sets. 

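In this section we describe the simplest, but most widely used, neural network model, the single-hidden-layer, feedforward neural network. This network is sometimes referred to as a single-layer perceptron. In a neural network model the ith response $Y_i$ is modeled as a nonlinear function $g_Y$ of m derived predictor values, $H_{i0}, H_{i1}, \ldots, H_{i,m-1}$:

$$Y_i = g_Y(\beta_0 H_{i0} + \beta_1 H_{i1} + \cdots + \beta_{m-1} H_{i,m-1}) + \varepsilon_i = g_Y(\mathbf{H}_i'\boldsymbol{\beta}) + \varepsilon_i \qquad (13.40)$$

where:

$$\underset{m \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{m-1} \end{bmatrix} \qquad \underset{m \times 1}{\mathbf{H}_i} = \begin{bmatrix} H_{i0} \\ H_{i1} \\ \vdots \\ H_{i,m-1} \end{bmatrix} \qquad (13.40a)$$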




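We take $H_{i0}$ equal to 1 and for $j = 1, \ldots, m - 1$, the jth derived predictor value for the ith observation, $H_{ij}$, is a nonlinear function $g_j$ of a linear combination of the original predictors:

$$H_{ij} = g_j(\mathbf{X}_i'\boldsymbol{\alpha}_j) \qquad j = 1, \ldots, m - 1 \qquad (13.41)$$

where:

$$\underset{p \times 1}{\boldsymbol{\alpha}_j} = \begin{bmatrix} \alpha_{j0} \\ \alpha_{j1} \\ \vdots \\ \alpha_{j,p-1} \end{bmatrix} \qquad \underset{p \times 1}{\mathbf{X}_i} = \begin{bmatrix} X_{i0} \\ X_{i1} \\ \vdots \\ X_{i,p-1} \end{bmatrix} \qquad (13.41a)$$

and where $X_{i0} = 1$. Note that $\mathbf{X}_i'$ is the ith row of the X matrix. Equations (13.40) and (13.41) together form the neural network model:

$$Y_i = g_Y(\mathbf{H}_i'\boldsymbol{\beta}) + \varepsilon_i = g_Y\left[\beta_0 + \sum_{j=1}^{m-1} \beta_j\, g_j(\mathbf{X}_i'\boldsymbol{\alpha}_j)\right] + \varepsilon_i \qquad (13.42)$$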
The m functions $g_Y, g_1, \ldots, g_{m-1}$ are called activation functions in the neural networks literature. To completely specify the neural network model, it is necessary to identify the m activation functions. A common choice for each of these functions is the logistic function:

$$g(Z) = \frac{1}{1 + e^{-Z}} = [1 + e^{-Z}]^{-1} \qquad (13.43)$$

This function is flexible and can be adapted to a variety of circumstances.

As a simple example, consider the case of a single predictor, $X_1$. Then from (13.41), the jth derived predictor for the ith observation is:

$$g_j(\mathbf{X}_i'\boldsymbol{\alpha}_j) = [1 + \exp(-\alpha_{j0} - \alpha_{j1} X_{i1})]^{-1} \qquad (13.44)$$

(Note that (13.44) is a reparameterization of (13.11), with $\gamma_0 = 1$, $\gamma_1 = e^{-\alpha_{j0}}$, and $\gamma_2 = -\alpha_{j1}$.) This function is shown in Figure 13.7 for various choices of $\alpha_{j0}$ and $\alpha_{j1}$. In Figure 13.7a, the logistic function is plotted for fixed $\alpha_{j0} = 0$, and $\alpha_{j1} = .1$, 1, and 10. When $\alpha_{j1} = .1$, the logistic function is approximately linear over a wide range; when $\alpha_{j1} = 10$, the function is highly nonlinear in the center of the plot. Generally, relatively larger parameters (in absolute value) are required for highly nonlinear responses, and relatively smaller parameters result for approximately linear responses. Changing the sign of $\alpha_{j1}$ reverses the orientation of the logistic function, as shown in Figure 13.7b. Finally, for a given value of $\alpha_{j1}$, the position of the logistic function along the $X_1$-axis is controlled by $\alpha_{j0}$. In Figure 13.7c, the logistic function is plotted for fixed $\alpha_{j1} = 1$ and $\alpha_{j0} = -5$, 0, and 5. Note that all of the plots in Figure 13.7 reflect a characteristic S- or sigmoidal shape, and the fact that the logistic function has a maximum of 1 and a minimum of 0.
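The effect of the two coefficients on the single-predictor derived predictor (13.44) can be explored numerically. The following is a small sketch; the function names and the grid of $X_1$ values are chosen for illustration only.

```python
import math

def logistic(z):
    """Logistic activation g(Z) = 1 / (1 + exp(-Z)), as in (13.43)."""
    return 1.0 / (1.0 + math.exp(-z))

def derived_predictor(x1, a0, a1):
    """Single-predictor derived predictor (13.44)."""
    return logistic(a0 + a1 * x1)

grid = (-2, -1, 0, 1, 2)
for a1 in (0.1, 1.0, 10.0):   # slope: small values are nearly linear, large values nearly a step
    print(a1, [round(derived_predictor(x, 0.0, a1), 3) for x in grid])
for a0 in (-5.0, 0.0, 5.0):   # shift: the curve crosses 0.5 at X1 = -a0 / a1
    print(a0, [round(derived_predictor(x, a0, 1.0), 3) for x in grid])
```

Negating the slope mirrors the curve about $X_1 = 0$ when the intercept coefficient is 0, which corresponds to the reversal of orientation described above.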

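Substitution of g in (13.43) for each of $g_Y, g_1, \ldots, g_{m-1}$ in (13.42) yields the specific neural network model to be discussed in this section:

$$Y_i = [1 + \exp(-\mathbf{H}_i'\boldsymbol{\beta})]^{-1} + \varepsilon_i$$
$$= \left[1 + \exp\left(-\beta_0 - \sum_{j=1}^{m-1} \beta_j [1 + \exp(-\mathbf{X}_i'\boldsymbol{\alpha}_j)]^{-1}\right)\right]^{-1} + \varepsilon_i$$
$$= f(\mathbf{X}_i, \boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_{m-1}, \boldsymbol{\beta}) + \varepsilon_i \qquad (13.45)$$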




FIGURE 13.7 Various Logistic Activation Functions for Single Predictor. 


where:

$\boldsymbol{\beta}, \boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_{m-1}$ are unknown parameter vectors
$\mathbf{X}_i$ is a vector of known constants
$\varepsilon_i$ are residuals


Neural network model (13.45) is a special case of (13.12) and is therefore a nonlinear 
regression model. In principle, all of the methods discussed in this chapter for estimation, 
testing, and prediction with nonlinear models are applicable. Indeed, any nonlinear regression package can be used to estimate the unknown coefficients. Recall, however, that these
models are generally overparameterized, and use of standard estimation methods will result 
in fitted models that have poor predictive ability. This is analogous to leaving too many 
unimportant predictors in a linear regression model. Special procedures for fitting model 
(13.45) that lead to better prediction will be considered later in this section. 
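The response function in (13.45) is just two nested applications of the logistic function. The sketch below evaluates it for one observation; the function name `network_response` and all parameter values are hypothetical and serve only to show the bookkeeping for the constant nodes.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def network_response(x, alphas, beta):
    """Response function f(X_i, alpha_1, ..., alpha_{m-1}, beta) of (13.45).

    x      : predictor values X_i1, ..., X_i,p-1 (the constant X_i0 = 1 is added here)
    alphas : m - 1 coefficient lists, each of length p, with alpha_j0 first
    beta   : list of length m, with beta_0 first
    """
    xi = [1.0] + list(x)
    # Derived predictors; H_i0 = 1 is the constant hidden node.
    h = [1.0] + [logistic(sum(a * v for a, v in zip(alpha_j, xi))) for alpha_j in alphas]
    return logistic(sum(b * hj for b, hj in zip(beta, h)))

# Hypothetical values: p = 3 (two predictors plus constant), m = 3 (two hidden nodes plus constant).
alphas = [[0.0, 1.0, -1.0], [0.5, -2.0, 0.3]]
beta = [-1.0, 2.0, 1.5]
y_hat = network_response([0.4, -1.2], alphas, beta)
print(round(y_hat, 4))
```

Because the outer activation is logistic, the value returned always lies strictly between 0 and 1.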




Note that because the logistic activation function is bounded between 0 and 1, it is necessary to scale $Y_i$ so that the scaled value, $Y_i^{sc}$, also falls within these limits. This can be accomplished by using:

$$Y_i^{sc} = \frac{Y_i - Y_{min}}{Y_{max} - Y_{min}}$$

where $Y_{min}$ and $Y_{max}$ are the minimum and maximum responses. It is also common practice to center and scale each of the predictors to have mean 0 and standard deviation 1. These transformations are generally handled automatically by neural network software.
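Both transformations are simple to write out by hand. In the sketch below the function names are hypothetical, the sample standard deviation (divisor n − 1) is an assumption since the text does not say which divisor the software uses, and the six responses are a few of the location A values from Table 13.4, used only for illustration.

```python
from statistics import mean, stdev

def scale_response(y):
    """Map responses onto [0, 1] using the minimum and maximum response."""
    y_min, y_max = min(y), max(y)
    return [(v - y_min) / (y_max - y_min) for v in y]

def standardize(x):
    """Center and scale a predictor to mean 0 and (sample) standard deviation 1."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]

y = [0.483, 0.539, 0.618, 0.960, 0.967, 0.975]
weeks = [1, 2, 3, 70, 80, 90]
y_sc = scale_response(y)
x_std = standardize(weeks)
print([round(v, 3) for v in y_sc])
```

Predictions made on the scaled response have to be mapped back with the inverse transformation before they are compared with the original responses.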


Network Representation 


FIGURE 13.8 Network Representations of Linear Regression and Neural Network Models: (a) Linear Regression Model; (b) Neural Network Model.


Network diagrams are often used to depict a neural network model. Note that the standard linear regression function:

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1}$$

can be represented as a network as shown in Figure 13.8a. The link from each predictor $X_i$ to the response is labeled with the corresponding regression parameter, $\beta_i$.

The feedforward, single-hidden-layer neural network model (13.45) is shown in Figure 13.8b. The predictor nodes are labeled $X_0, X_1, \ldots, X_{p-1}$ and are located on the left side of the diagram. In the center of the diagram are m hidden nodes. These nodes are linked to the p predictor nodes by relation (13.41); thus the links are labeled by using the $\alpha$ parameters. Finally, the hidden nodes are linked to the response Y by the $\beta$ parameters.


Comments 


1. Neural networks were first used as models for the human brain. The nodes represented neurons and the links between neurons represented synapses. A synapse would "fire" if the signal surpassed




a threshold. This suggested the use of step functions for the activation function, which were later 
replaced by smooth functions such as the logistic function. 


2. The logistic activation function is sometimes replaced by a radial basis function, which is an n-dimensional normal probability density function. Details are provided in Reference 13.8.


Neural Network as Generalization of Linear Regression 
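It is easy to see that the standard multiple regression model is a special case of neural network model (13.45). If we choose for each of the activation functions $g_Y, g_1, \ldots, g_{m-1}$ the identity activation:

$$g(Z) = Z$$

we have:

$$E\{Y_i\} = \beta_0 + \beta_1 H_{i1} + \cdots + \beta_{m-1} H_{i,m-1} \qquad (13.46a)$$

and:

$$H_{ij} = \alpha_{j0} + \alpha_{j1} X_{i1} + \cdots + \alpha_{j,p-1} X_{i,p-1} \qquad (13.46b)$$

Substitution of (13.46b) into (13.46a) and rearranging yields:

$$E\{Y_i\} = \left[\beta_0 + \sum_{j=1}^{m-1} \beta_j \alpha_{j0}\right] + \left[\sum_{j=1}^{m-1} \beta_j \alpha_{j1}\right] X_{i1} + \cdots + \left[\sum_{j=1}^{m-1} \beta_j \alpha_{j,p-1}\right] X_{i,p-1}$$
$$= \beta_0^* + \beta_1^* X_{i1} + \cdots + \beta_{p-1}^* X_{i,p-1} \qquad (13.47)$$

where:

$$\beta_0^* = \beta_0 + \sum_{j=1}^{m-1} \beta_j \alpha_{j0}$$
$$\beta_k^* = \sum_{j=1}^{m-1} \beta_j \alpha_{jk} \qquad k = 1, \ldots, p - 1 \qquad (13.47a)$$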

The neural network with identity activation functions thus reduces to the standard linear 
regression model. 

There is a problem, however, with the interpretation of the neural network regression coefficients. If the regression function is given by $E\{Y_i\} = \beta_0^* + \beta_1^* X_{i1} + \cdots + \beta_{p-1}^* X_{i,p-1}$ as indicated in (13.47), then any set of neural network parameters satisfying the p equations in (13.47a) gives the correct model. Since there are many more neural network parameters than there are equations (or equivalently, $\beta^*$ parameters) there are infinitely many sets of neural network parameters that lead to the correct model. Thus, any particular set of neural network parameters will have no intrinsic meaning in this case.
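The non-uniqueness can be seen numerically by collapsing two different identity-activation parameter sets through (13.47a). The sketch below uses made-up parameter values; `beta_star` is a hypothetical helper, not a routine from any package.

```python
def beta_star(alphas, beta):
    """Collapse identity-activation network parameters to linear coefficients via (13.47a).

    alphas : m - 1 lists of length p (alpha_j0 first); beta : list of length m (beta_0 first).
    """
    p = len(alphas[0])
    star = [sum(b * a[k] for b, a in zip(beta[1:], alphas)) for k in range(p)]
    star[0] += beta[0]   # only the intercept picks up beta_0
    return star

# Two hypothetical parameter sets with m = 3 and p = 2.
set_a = ([[1.0, 2.0], [0.0, 1.0]], [0.5, 1.0, 1.0])
set_b = ([[0.5, 1.0], [2.0, 4.0]], [0.0, 1.0, 0.5])
print(beta_star(*set_a), beta_star(*set_b))
```

Both sets collapse to the same pair of linear coefficients, so the data cannot distinguish between them.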

This overparameterization problem is somewhat reduced with the use of the logistic
activation function in place of the identity function. Generally, however, if the number of 
hidden nodes is more than just a few, overparameterization will be present, and will lead to 
a fitted model with low predictive ability unless this issue is explicitly considered when the 
parameters are estimated. We now take up such estimation procedures. 




Parameter Estimation: Penalized Least Squares 


In Chapter 9 we considered model selection and validation. There, we observed that while $R^2$ never decreases with the addition of a new predictor, our ability to predict holdout responses in the validation stage can deteriorate if too many predictors are incorporated. Various model selection criteria, such as $R^2_{a,p}$, $SBC_p$, and $AIC_p$, have been adopted that contain penalties for the addition of predictors. We commented in Section 11.2 that ridge regression estimates can be obtained by the method of penalized least squares, which directly incorporates a penalty for the sum of squares of the regression coefficients. In order to control the level of overfitting, penalized least squares is frequently used for parameter estimation with neural networks.
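The penalized least squares criterion is given by:

$$Q = \sum_{i=1}^{n} [Y_i - f(\mathbf{X}_i, \boldsymbol{\beta}, \boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_{m-1})]^2 + p_\lambda(\boldsymbol{\beta}, \boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_{m-1}) \qquad (13.48)$$

where the overfit penalty is:

$$p_\lambda(\boldsymbol{\beta}, \boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_{m-1}) = \lambda\left[\sum_{i=0}^{m-1} \beta_i^2 + \sum_{i=1}^{m-1}\sum_{j=0}^{p-1} \alpha_{ij}^2\right] \qquad (13.48a)$$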


Thus, the penalty is a positive constant, $\lambda$, times the sum of squares of the nonlinear regression coefficients. Note that the penalty is imposed not on the number of parameters $m + p(m - 1)$, but on the total magnitude of the parameters. The penalty weight $\lambda$ assigned to the regression coefficients governs the trade-off between overfitting and underfitting. If $\lambda$ is large, the parameter estimates will be relatively small in absolute magnitude; if $\lambda$ is small, the estimates will be relatively large. A "best" value for $\lambda$ is generally between .001 and .1 and is chosen by cross-validation. For example, we may fit the model for a range of $\lambda$-values between .001 and .1, and choose the value that minimizes the total prediction error of the hold-out sample. The resulting parameter estimates are called shrinkage estimates because use of $\lambda > 0$ leads to reductions in their absolute magnitudes.
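Criterion (13.48) can be evaluated for any candidate parameter set as the error sum of squares plus the penalty (13.48a). The sketch below does this for a tiny network; the data, the parameter values and the function names are all hypothetical, and the responses are assumed to be already scaled to lie between 0 and 1.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, alphas, beta):
    """Response function of model (13.45) for one observation."""
    xi = [1.0] + list(x)
    h = [1.0] + [logistic(sum(a * v for a, v in zip(aj, xi))) for aj in alphas]
    return logistic(sum(b * hj for b, hj in zip(beta, h)))

def penalized_criterion(xs, ys, alphas, beta, lam):
    """Return (Q, SSE, penalty) for criterion (13.48) with penalty (13.48a)."""
    sse = sum((y - predict(x, alphas, beta)) ** 2 for x, y in zip(xs, ys))
    penalty = lam * (sum(b * b for b in beta) + sum(a * a for aj in alphas for a in aj))
    return sse + penalty, sse, penalty

xs = [[-1.0], [0.0], [1.0]]   # one predictor, three observations
ys = [0.1, 0.5, 0.9]
alphas = [[0.0, 2.0]]         # one non-constant hidden node
beta = [-2.0, 4.0]
q, sse, pen = penalized_criterion(xs, ys, alphas, beta, lam=0.05)
print(round(q, 4), round(sse, 4), round(pen, 4))
```

Increasing `lam` leaves the error sum of squares for a given parameter set unchanged but raises the penalty, which is why a search that minimizes Q is pulled toward smaller coefficients.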

In Section 13.3 we described various search procedures, such as the Gauss-Newton 
method for finding nonlinear least squares estimates. Such methods can also be used with 
neural networks and penalized least squares criterion (13.48). We observed in Comment 1 on 
page 524, that the choice of starting values is important. Poor choice of starting values may 
lead to convergence to a local minimum (rather than the global minimum) when multiple 
minima exist. The problem of multiple minima is especially prevalent when fitting neural 
networks, due to the typically large numbers of parameters and the functional form of model 
(13.48). For this reason, it is common practice to fit the model many times (typically between 
10 and 50 times) using different sets of randomly chosen starting values for each fit. The set of parameter estimates that leads to the lowest value of criterion function (13.48), the best of the best, is chosen for further study. In the neural networks literature, finding a set
of parameter values that minimize criterion (13.48) is referred to as training the network. 
The number of searches conducted before arriving at the final estimates is referred to as the 
number of tours. 
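The idea of running several tours and keeping the best can be illustrated on a toy problem. The sketch below does not fit a neural network: it minimizes a one-dimensional function with two minima by plain gradient descent from random starting values, and every name, step size, and setting in it is an assumption chosen for illustration.

```python
import random

def f(x):
    """Toy criterion with a global minimum near x = -1.30 and a local minimum near x = 1.13."""
    return x**4 - 3 * x**2 + x

def f_prime(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, step=0.01, iters=2000):
    """Plain gradient descent from a single starting value."""
    for _ in range(iters):
        x -= step * f_prime(x)
    return x

def best_of_tours(n_tours=20, seed=0):
    """Run n_tours searches from random starts and keep the one with the lowest criterion."""
    rng = random.Random(seed)
    fits = [descend(rng.uniform(-2.0, 2.0)) for _ in range(n_tours)]
    return min(fits, key=f), fits

best, fits = best_of_tours()
print(round(best, 3), round(f(best), 3))
```

Any single search may stop at the local minimum, depending on where it starts; taking the lowest criterion value over all tours is what guards against reporting that solution.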




Comment 


Neural networks are often trained by a procedure called back-propagation. Back-propagation is in fact the method of steepest descent, which can be very slow. Recommended methods include the conjugate gradient and variable metric methods. Reference 13.8 provides further details concerning back-propagation and other search procedures.


Example: Ischemic Heart Disease 


FIGURE 13.9 JMP Control Panel for Neural Network Fit: Ischemic Heart Disease Example.


We illustrate the use of neural network model (13.45) and the penalized least squares fitting procedure using the ischemic heart disease data set in Appendix C.9. These data were collected by a health insurance plan and provide information concerning 788 subscribers who made claims resulting from coronary heart disease. The response (Y) is the natural logarithm of the total cost of services provided and the predictors to be studied here are:


Predictor   Description
$X_1$:      Number of interventions, or procedures, carried out
$X_2$:      Number of tracked drugs used
$X_3$:      Number of comorbidities (other conditions present that complicate the treatment)
$X_4$:      Number of complications (other conditions that arose during treatment due to heart disease)


The first 400 observations were used to fit model (13.45) and the last $n^* = 388$ observations were held out for validation. (Note that the observations were originally sorted in a random order, so that the hold-out data set is a random sample.) We used JMP to fit and evaluate the neural network model.

Shown in Figure 13.9 is the JMP control panel, which allows the user to specify the various characteristics of the model and the fitting procedure. Here, we have chosen 5 hidden nodes, and we are using $\lambda = .05$ as the penalty weight. Also, we have chosen the default values for the number of tours (20), the maximum number of iterations for the search procedure

[JMP control panel settings]
Hidden Nodes: 5
Overfit Penalty: 0.05
Number of Tours: 20
Max Iterations: 50
Converge Criterion: 0.00001
Options: Log the tours (checked); Log the iterations (unchecked); Log the estimates; Save iterations in table




FIGURE 13.10 JMP Neural Network Diagram: Ischemic Heart Disease Example.

[Network diagram: input nodes Duration, Interventions, Comorbids, and Complications linked to the hidden nodes, which are linked to the output node.]

FIGURE 13.11 JMP Results for Neural Network Fit: Ischemic Heart Disease Example.

Objective
SSE        120.90315177
Penalty    4.4087731663
Total      125.31192493

Results
17   Converged At Best
 2   Converged Worse Than Best
 0   Stuck on Flat
 0   Failed to Improve
 1   Reached Max Iter

Y         SSE            SSE Scaled      SSE Excluded    RMSE         RSquare   RSquare Excluded
logCost   441.3037691    120.90315177    407.68215505    0.55465449   0.6962    0.7024


(50) and the convergence criterion (.00001). By checking the “log the tours” box, we will 
be keeping a record of the results of each of the 20 tours. A JMP network representation of 
model (13.45) is shown in Figure 13.10. Note that this representation excludes the constant 
nodes $X_0$ and $H_0$. In our notation, there are m = 6 hidden nodes and p = 5 predictor nodes,
and it is necessary to estimate m + p(m — 1) = 6 + 5(6 — 1) = 31 parameters. 
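The parameter count follows directly from the structure of model (13.45). A one-line helper (the name `n_parameters` is hypothetical) makes the bookkeeping explicit:

```python
def n_parameters(m, p):
    """m beta coefficients plus p alpha coefficients for each of the m - 1 non-constant hidden nodes."""
    return m + p * (m - 1)

count = n_parameters(m=6, p=5)
print(count)
```

Here m and p both include the constant nodes, which is why 5 hidden nodes in the JMP panel correspond to m = 6 and four predictors correspond to p = 5.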

The results of the best fit, after 20 attempts or tours, are shown in Figure 13.11. The penalized least squares criterion value is 125.31. SSE for the scaled response is 120.90. JMP indicates that the corresponding SSE for the unscaled (original) responses is 441.30. The total prediction error for the validation (excluded) data is given here by:

$$SSE_{VAL} = \sum_{i=401}^{788} (Y_i - \hat{Y}_i)^2 = 407.68$$

The mean squared prediction error (9.20) is obtained as $MSPR = SSE_{VAL}/n^* = 407.68/388 = 1.05$. JMP also gives $R^2$ for the training data (.6962), and for the validation data



[Figure 13.12: JMP parameter estimates]

Parameter            Estimate
H1:Intercept
H2:Intercept         1.2563122156
H3:Intercept         2.5829942469
H4:Intercept         -1.505357347
H5:Intercept         -1.832118976
H1:Duration          -0.410405493
H1:Interventions     2.7694118008
H1:Comorbids         1.3823080642
H1:Complications     0.4148583852
H2:Duration          0.1040924583
H2:Interventions     0.983043751
H2:Comorbids         2.3589628016
H2:Complications     -0.2013332821
H3:Duration          1.5025299752
H3:Interventions     1.0761596691
H3:Comorbids         -0.414620124
H3:Complications     0.0543940406
H4:Duration          1.2332218124
H4:Interventions     -4.887856867
H4:Comorbids         -1.576610999
H4:Complications     -1.068032684
H5:Duration          -0.1597882671
H5:Interventions     12562445449
H5:Comorbids         0.1951585624
H5:Complications     0.3717883109
logCost:Intercept    70443318204
logCost:H1           -2.165864717
logCost:H2           1.4877032149
logCost:H3           1.6396831425
logCost:H4           -2.285420806
logCost:H5           1682288417


(.7024). This latter diagnostic was obtained using:

$$R^2_{VAL} = 1 - \frac{SSE_{VAL}}{SST_{VAL}}$$

where $SST_{VAL}$ is the total sum of squares for the validation data. Because these $R^2$ values are approximately equal, we conclude that the use of weight penalty $\lambda = .05$ led to a good balance between underfitting and overfitting.
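The validation diagnostics can be computed from hold-out responses and their predictions in a few lines. In the sketch below the four-observation data set is made up purely to exercise the function, `validation_metrics` is a hypothetical name, and only the final line uses figures reported in the text.

```python
def validation_metrics(y, y_hat):
    """Return (SSE, MSPR, R-squared) for hold-out responses y and predictions y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - y_bar) ** 2 for a in y)
    return sse, sse / n, 1.0 - sse / sst

# Made-up hold-out data, for illustration only.
sse, mspr, r2 = validation_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(sse, mspr, r2)

# MSPR for the ischemic heart disease example, from the figures quoted in the text.
mspr_text = 407.68 / 388
print(round(mspr_text, 2))
```

Comparing the validation $R^2$ with the training $R^2$, as the text does, is a quick way to judge whether the chosen penalty weight has left the model overfitted.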

Figure 13.12 shows the 31 parameter estimates produced by JMP and the corresponding parameters. We display these values only for completeness; we make no attempt at interpretation. As noted earlier, our interest is centered on the prediction of future responses.

For comparison, two least squares regressions of Y on the four predictors $X_1$, $X_2$, $X_3$, and $X_4$ were also carried out. The first was based on a first-order model consisting of the four predictors and an intercept term; the second was based on a full second-order model consisting of an intercept plus the four linear terms, the four quadratic terms, and the six cross-products among the four predictors. The results for these two multiple regression models and the neural network model are summarized in Table 13.6.

From the results, we see that the neural network model's ability to predict holdout responses is superior to the first-order multiple regression and slightly better than the second-order multiple regression model. MSPR for the neural network is 1.05, whereas this statistic for the first and second-order multiple regression models is 1.28 and 1.09, respectively.




TABLE 13.6 Comparisons of Results for Neural Network Model with Multiple Linear Regression Model: Ischemic Heart Disease Example.

                                        Multiple Linear Regression
                       Neural Network   First-Order   Second-Order
Number of Parameters        31               5             15
MSE                        1.20            1.74           1.34
MSPR                       1.05            1.28           1.09

FIGURE 13.13 Conditional Effects Plot: Ischemic Heart Disease Example.

[Plot of predicted logCost (vertical axis, 4.5 to 10.5) against number of interventions (horizontal axis, 0 to 50).]


Model Interpretation and Prediction 


While individual parameters and derived predictors are usually not interpretable, some 
understanding of the effects of individual predictors can be realized through the use of 
conditional effects plots. For example, Figure 13.13 shows for the ischemic heart data example, plots of predicted response as a function of the number of interventions ($X_2$) for duration ($X_1$) equal to 0 and 160. The remaining predictors, comorbidities ($X_3 = 3.55$) and complications ($X_4 = 0.05$), are fixed at their averages for values in the training set.
The plot indicates that the natural logarithm of cost increases rapidly as the number of 
interventions increases from 0 to 25, and then reaches a plateau and is stable as the number 
of interventions increases from 25 to 50. The duration variable seems to have very little 
effect, except possibly when interventions are between 5 and 10. 

We have noted that neural network models can be very effective tools for prediction when 
large data sets are available. As always, it is important that the uncertainty in any prediction 
be quantified. Methods for producing approximate confidence intervals for estimation and 
prediction have been developed and some packages such as JMP now provide these intervals. 
Details are provided in Reference 13.9. 


5 


Cited 
References 


Chapter 13 Introduction to Nonlinear Regression and Neural Networks 547 


ome Final Comments on Neural Network Modeling 


In recent years, neural networks have found widespread application in many fields. Indeed, 
they have become one of the standard tools in the field of data mining, and their use continues 
to grow. This is due largely to the widespread availability of powerful computers that permit 
the fitting of complex models having dozens, hundreds, and even thousands, of parameters. 

A vocabulary has developed that is unique to the field of neural networks. The table below
(adapted from Ref. 13.10) lists a number of terms that are commonly used by statisticians
and their neural network equivalents:

Statistical Term        Neural Network Term
--------------------    --------------------
coefficient             weight
predictor               input
response                output
observation             exemplar
parameter estimation    training or learning
steepest descent        back-propagation
intercept               bias term
derived predictor       hidden node
penalty function        weight decay


There are a number of advantages to the neural network modeling approach. These 
include: 


1. Model (13.45) is extremely flexible, and can be used to represent a wide range of response 
surface shapes. For example, with sufficient data, curvatures, interactions, plateaus, and 
step functions can be effectively modeled. 

2. Standard regression assumptions, such as the requirements that the true residuals are 
mutually independent, normally distributed, and have constant variance, are not required 
for neural network modeling. 

3. Outliers in the response and predictors can still have a detrimental effect on the fit of the 
model, but the use of the bounded logistic activation function tends to limit the influence 
of individual cases in comparison with standard regression approaches. 


Of course, there are disadvantages associated with the use of neural networks. Model 
parameters are generally uninterpretable, and the method depends on the availability of 
large data sets. Diagnostics, such as lack of fit tests, identification of influential observations 
and outliers, and significance testing for the effects of the various predictors, are currently 
not generally available.


Cited References

13.1. Hartley, H. O. “The Modified Gauss-Newton Method for the Fitting of Non-linear Regression
Functions by Least Squares,” Technometrics 3 (1961), pp. 269-80.

13.2. Gallant, A. R. Nonlinear Statistical Models. New York: John Wiley & Sons, 1987.

13.3. Kennedy, W. J., Jr., and J. E. Gentle. Statistical Computing. New York: Marcel Dekker, 1980.

13.4. Bates, D. M., and D. G. Watts. Nonlinear Regression Analysis and Its Applications. New York:
John Wiley & Sons, 1988.

13.5. Box, M. J. “Bias in Nonlinear Estimation,” Journal of the Royal Statistical Society B 33 (1971),
pp. 171-201.

13.6. Hougaard, P. “The Appropriateness of the Asymptotic Distribution in a Nonlinear Regression
Model in Relation to Curvature,” Journal of the Royal Statistical Society B 47 (1985),
pp. 103-14.

13.7. Ratkowsky, D. A. Nonlinear Regression Modeling. New York: Marcel Dekker, 1983.

13.8. Hastie, T., Tibshirani, R., and J. Friedman. The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. New York: Springer, 2001.

13.9. DeVeaux, R. D., Schumi, J., Schweinsberg, J., and L. H. Ungar. “Prediction Intervals for
Neural Networks via Nonlinear Regression,” Technometrics 40 (1998), pp. 273-82.

13.10. DeVeaux, R. D., and L. H. Ungar. “A Brief Introduction to Neural Networks,”
www.williams.edu/mathematics/rdeveaux/pubs.html (1996).


Problems

*13.1. For each of the following response functions, indicate whether it is a linear response function,
an intrinsically linear response function, or a nonlinear response function. In the case of an
intrinsically linear response function, state how it can be linearized by a suitable transformation:

a. $f(\mathbf{X}, \boldsymbol{\gamma}) = \exp(\gamma_0 + \gamma_1 X)$
b. $f(\mathbf{X}, \boldsymbol{\gamma}) = \gamma_0 + \gamma_1 (\gamma_2)^{X_1} - \gamma_3 X_2$
c. $f(\mathbf{X}, \boldsymbol{\gamma}) = \gamma_0 + \dfrac{\gamma_1}{\gamma_0} X$

13.2. For each of the following response functions, indicate whether it is a linear response function,
an intrinsically linear response function, or a nonlinear response function. In the case of an
intrinsically linear response function, state how it can be linearized by a suitable transformation:

a. $f(\mathbf{X}, \boldsymbol{\gamma}) = \exp(\gamma_0 + \gamma_1 \log_e X)$
b. $f(\mathbf{X}, \boldsymbol{\gamma}) = \gamma_0 (X_1)^{\gamma_1} (X_2)^{\gamma_2}$
c. $f(\mathbf{X}, \boldsymbol{\gamma}) = \gamma_0 - \gamma_1 (\gamma_2)^{X}$


13.3. a. Plot the logistic response function:

$$f(X, \boldsymbol{\gamma}) = \frac{300}{1 + (30)\exp(-1.5X)} \qquad X \ge 0$$

b. What is the asymptote of this response function? For what value of X does the response
function reach 90 percent of its asymptote?

13.4. a. Plot the exponential response function:

$$f(X, \boldsymbol{\gamma}) = 49 - (30)\exp(-1.1X) \qquad X \ge 0$$

b. What is the asymptote of this response function? For what value of X does the response
function reach 95 percent of its asymptote?

*13.5. Home computers. A computer manufacturer hired a market research firm to investigate the
relationship between the likelihood a family will purchase a home computer and the price of
the home computer. The data that follow are based on replicate surveys done in two similar
cities. One thousand heads of households in each city were randomly selected and asked if
they would be likely to purchase a home computer at a given price. Eight prices (X, in dollars)
were studied, and 100 heads of households in each city were randomly assigned to a given
price. The proportion likely to purchase at a given price is denoted by Y.

City A
i:      1     2     3     4     5     6     7     8
X_i:    200   400   800   1200  1600  2000  3000  4000
Y_i:    .65   .46   .34   .26   .17   .15   .06   .04

City B
i:      9     10    11    12    13    14    15    16
X_i:    200   400   800   1200  1600  2000  3000  4000
Y_i:    .63   .50   .30   .24   .19   .12   .08   .05

No location effect is expected and the data are to be treated as independent replicates at each of
the 8 prices. The following exponential model with independent normal error terms is deemed
to be appropriate:

$$Y_i = \gamma_0 + \gamma_2 \exp(-\gamma_1 X_i) + \varepsilon_i$$

a. To obtain initial estimates of $\gamma_0$, $\gamma_1$, and $\gamma_2$, note that $f(X, \boldsymbol{\gamma})$ approaches a lower asymptote
$\gamma_0$ as X increases without bound. Hence, let $g_0^{(0)} = 0$ and observe that when we ignore the
error term, a logarithmic transformation then yields $Y_i' = \beta_0 + \beta_1 X_i$, where $Y_i' = \log_e Y_i$,
$\beta_0 = \log_e \gamma_2$, and $\beta_1 = -\gamma_1$. Therefore, fit a linear regression function based on the
transformed data and use as initial estimates $g_0^{(0)} = 0$, $g_1^{(0)} = -b_1$, and $g_2^{(0)} = \exp(b_0)$.

b. Using the starting values obtained in part (a), find the least squares estimates of the
parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$.
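The starting-value recipe in part (a) of Problem 13.5 can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the helper names `simple_linear_fit` and `starting_values` are assumptions introduced here, and the data are the sixteen observations tabulated in the problem. It regresses $\log_e Y$ on X and back-transforms the coefficients; it does not carry out the nonlinear least squares fit asked for in part (b).

```python
import math

# Home computers data from Problem 13.5 (City A followed by City B).
prices = [200, 400, 800, 1200, 1600, 2000, 3000, 4000] * 2
proportions = [.65, .46, .34, .26, .17, .15, .06, .04,
               .63, .50, .30, .24, .19, .12, .08, .05]


def simple_linear_fit(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def starting_values(x, y):
    """Starting values (g0, g1, g2) for Y = g0 + g2 * exp(-g1 * X)."""
    log_y = [math.log(yi) for yi in y]      # Y' = log_e Y
    b0, b1 = simple_linear_fit(x, log_y)    # fit Y' = b0 + b1 * X
    return 0.0, -b1, math.exp(b0)           # g0 = 0, g1 = -b1, g2 = exp(b0)


g0, g1, g2 = starting_values(prices, proportions)
print(f"g0 = {g0:.4f}, g1 = {g1:.6f}, g2 = {g2:.4f}")
```

Because the proportions fall as price rises, the fitted slope is negative and the starting value for $\gamma_1$ comes out positive, which is what the exponential decay model needs.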

*13.6. Refer to Home computers Problem 13.5.

a. Plot the estimated nonlinear regression function and the data. Does the fit appear to be
adequate?

b. Obtain the residuals and plot them against the fitted values and against X on separate
graphs. Also obtain a normal probability plot. Does the model appear to be adequate?

*13.7. Refer to Home computers Problem 13.5. Assume that large-sample inferences are appropriate
here. Conduct a formal approximate test for lack of fit of the nonlinear regression function;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

*13.8. Refer to Home computers Problem 13.5. Assume that the fitted model is appropriate and
that large-sample inferences can be employed. Obtain approximate joint confidence intervals
for the parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$, using the Bonferroni procedure and a 90 percent family
confidence coefficient.

*13.9. Refer to Home computers Problem 13.5. A question has been raised whether the two cities
are similar enough so that the data can be considered to be replicates. Adding a location
effect parameter analogous to (13.38) to the model proposed in Problem 13.5 yields the
four-parameter nonlinear regression model:

$$Y_i = \gamma_0 + \gamma_3 X_{i2} + \gamma_2 \exp(-\gamma_1 X_{i1}) + \varepsilon_i$$

where:

$$X_{i2} = \begin{cases} 0 & \text{if city A} \\ 1 & \text{if city B} \end{cases}$$

a. Using the same starting values as those obtained in Problem 13.5a and $g_3^{(0)} = 0$, find the
least squares estimates of the parameters $\gamma_0$, $\gamma_1$, $\gamma_2$, and $\gamma_3$.

b. Assume that large-sample inferences can be employed reasonably here. Obtain an
approximate 95 percent confidence interval for $\gamma_3$. What does this interval indicate about city
differences? Is this result consistent with your conclusion in Problem 13.7? Does it have
to be? Discuss.


13.10. Enzyme kinetics. In an enzyme kinetics study the velocity of a reaction (Y) is expected to be
related to the concentration (X) as follows:

$$Y_i = \frac{\gamma_0 X_i}{\gamma_1 + X_i} + \varepsilon_i$$

Eighteen concentrations have been studied and the results follow:

i:      1     2     3     ...   16    17    18
X_i:    1     1.5   2     ...   30    35    40
Y_i:    2.1   2.5   4.9   ...   19.7  21.3  21.6

a. To obtain starting values for $\gamma_0$ and $\gamma_1$, observe that when the error term is ignored we have
$Y_i' = \beta_0 + \beta_1 X_i'$, where $Y_i' = 1/Y_i$, $\beta_0 = 1/\gamma_0$, $\beta_1 = \gamma_1/\gamma_0$, and $X_i' = 1/X_i$. Therefore fit
a linear regression function to the transformed data to obtain initial estimates $g_0^{(0)} = 1/b_0$
and $g_1^{(0)} = b_1/b_0$.

b. Using the starting values obtained in part (a), find the least squares estimates of the
parameters $\gamma_0$ and $\gamma_1$.

13.11. Refer to Enzyme kinetics Problem 13.10.

a. Plot the estimated nonlinear regression function and the data. Does the fit appear to be
adequate?

b. Obtain the residuals and plot them against the fitted values and against X on separate
graphs. Also obtain a normal probability plot. What do your plots show?

c. Can you conduct an approximate formal lack of fit test here? Explain.

d. Given that only 18 trials can be made, what are some advantages and disadvantages of
considering fewer concentration levels but with some replications, as compared to considering
18 different concentration levels as was done here?

13.12. Refer to Enzyme kinetics Problem 13.10. Assume that the fitted model is appropriate and
that large-sample inferences can be employed here. (1) Obtain an approximate 95 percent
confidence interval for $\gamma_0$. (2) Test whether or not $\gamma_1 = 20$; use $\alpha = .05$. State the alternatives,
decision rule, and conclusion.

*13.13. Drug responsiveness. A pharmacologist modeled the responsiveness to a drug using the
following nonlinear regression model:

$$Y_i = \gamma_0 - \frac{\gamma_0}{1 + (X_i/\gamma_2)^{\gamma_1}} + \varepsilon_i$$

X denotes the dose level, in coded form, and Y the responsiveness expressed as a percent of
the maximum possible responsiveness. In the model, $\gamma_0$ is the expected response at saturation,
$\gamma_2$ is the concentration that produces a half-maximal response, and $\gamma_1$ is related to the slope.
The data for 19 cases at 13 dose levels follow:

i:      1     2     3     ...   17    18    19
X_i:    1     2     3     ...   7     8     9
Y_i:    .5    2.3   3.4   ...   94.8  96.2  96.4

Obtain the least squares estimates of the parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$, using starting values
$g_0^{(0)} = 100$, $g_1^{(0)} = 5$, and $g_2^{(0)} = 4.8$.

*13.14. Refer to Drug responsiveness Problem 13.13.

a. Plot the estimated nonlinear regression function and the data. Does the fit appear to be
adequate?

b. Obtain the residuals and plot them against the fitted values and against X on separate graphs.
Also obtain a normal probability plot. What do your plots show about the adequacy of the
regression model?

*13.15. Refer to Drug responsiveness Problem 13.13. Assume that large-sample inferences are
appropriate here. Conduct a formal approximate test for lack of fit of the nonlinear regression
function; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

*13.16. Refer to Drug responsiveness Problem 13.13. Assume that the fitted model is appropriate
and that large-sample inferences can be employed here. Obtain approximate joint confidence
intervals for the parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$ using the Bonferroni procedure with a 91 percent
family confidence coefficient. Interpret your results.

13.17. Process yield. The yield (Y) of a chemical process depends on the temperature ($X_1$) and
pressure ($X_2$). The following nonlinear regression model is expected to be applicable:

$$Y_i = \gamma_0 (X_{i1})^{\gamma_1} (X_{i2})^{\gamma_2} + \varepsilon_i$$

Prior to beginning full-scale production, 18 tests were undertaken to study the process yield
for various temperature and pressure combinations. The results follow.

i:       1     2     3     ...   16    17    18
X_i1:    1     10    100   ...   1     10    100
X_i2:    1     1     1     ...   100   100   100
Y_i:     12    32    103   ...   43    128   398

a. To obtain starting values for $\gamma_0$, $\gamma_1$, and $\gamma_2$, note that when we ignore the random error term,
a logarithmic transformation yields $Y_i' = \beta_0 + \beta_1 X_{i1}' + \beta_2 X_{i2}'$, where $Y_i' = \log_{10} Y_i$,
$\beta_0 = \log_{10} \gamma_0$, $\beta_1 = \gamma_1$, $X_{i1}' = \log_{10} X_{i1}$, $\beta_2 = \gamma_2$, and $X_{i2}' = \log_{10} X_{i2}$. Fit a first-order multiple
regression model to the transformed data, and use as starting values $g_0^{(0)} = \text{antilog}_{10}\, b_0$,
$g_1^{(0)} = b_1$, and $g_2^{(0)} = b_2$.

b. Using the starting values obtained in part (a), find the least squares estimates of the
parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$.

13.18. Refer to Process yield Problem 13.17.

a. Plot the estimated nonlinear regression function and the data. Does the fit appear to be
adequate?

b. Obtain the residuals and plot them against $\hat{Y}$, $X_1$, and $X_2$ on separate graphs. Also obtain
a normal probability plot. What do your plots show about the adequacy of the model?

13.19. Refer to Process yield Problem 13.17. Assume that large-sample inferences are appropriate
here. Conduct a formal approximate test for lack of fit of the nonlinear regression function;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion.

13.20. Refer to Process yield Problem 13.17. Assume that the fitted model is appropriate and that
large-sample inferences are applicable here.

a. Test the hypotheses $H_0$: $\gamma_1 = \gamma_2$ against $H_a$: $\gamma_1 \ne \gamma_2$ using $\alpha = .05$. State the alternatives,
decision rule, and conclusion.

b. Obtain approximate joint confidence intervals for the parameters $\gamma_1$ and $\gamma_2$, using the
Bonferroni procedure and a 95 percent family confidence coefficient.

c. What do you conclude about the parameters $\gamma_1$ and $\gamma_2$ based on the results in parts (a)
and (b)?


Exercises

13.21. (Calculus needed.) Refer to Home computers Problem 13.5.

a. Obtain the least squares normal equations and show that they are nonlinear in the estimated
regression coefficients $g_0$, $g_1$, and $g_2$.

b. State the likelihood function for the nonlinear regression model, assuming that the error
terms are independent $N(0, \sigma^2)$.

13.22. (Calculus needed.) Refer to Enzyme kinetics Problem 13.10.

a. Obtain the least squares normal equations and show that they are nonlinear in the estimated
regression coefficients $g_0$ and $g_1$.

b. State the likelihood function for the nonlinear regression model, assuming that the error
terms are independent $N(0, \sigma^2)$.

13.23. (Calculus needed.) Refer to Process yield Problem 13.17.

a. Obtain the least squares normal equations and show that they are nonlinear in the estimated
regression coefficients $g_0$, $g_1$, and $g_2$.

b. State the likelihood function for the nonlinear regression model, assuming that the error
terms are independent $N(0, \sigma^2)$.

13.24. Refer to Drug responsiveness Problem 13.13.

a. Assuming that $E\{\varepsilon_i\} = 0$, show that:

$$E\{Y\} = \gamma_0 \left( \frac{A}{1 + A} \right)$$

where:

$$A = \exp[\gamma_1 (\log_e X - \log_e \gamma_2)] = \exp(\beta_0 + \beta_1 X')$$

and $\beta_0 = -\gamma_1 \log_e \gamma_2$, $\beta_1 = \gamma_1$, and $X' = \log_e X$.

b. Assuming $\gamma_0$ is known, show that:

$$\frac{E\{Y'\}}{1 - E\{Y'\}} = \exp(\beta_0 + \beta_1 X')$$

where $Y' = Y/\gamma_0$.

c. What transformation do these results suggest for obtaining a simple linear regression
function in the transformed variables?

d. How can starting values for finding the least squares estimates of the nonlinear regression
parameters be obtained from the estimates of the linear regression coefficients?


Projects

13.25. Refer to Enzyme kinetics Problem 13.10. Starting values for finding the least squares estimates
of the nonlinear regression model parameters are to be obtained by a grid search. The following
bounds for the two parameters have been specified:

$$5 \le \gamma_0 \le 65$$
$$5 \le \gamma_1 \le 65$$

Obtain 49 grid points by using all possible combinations of the boundary values and five other
equally spaced points for each parameter range. Evaluate the least squares criterion (13.15)
for each grid point and identify the point providing the best fit. Does this point give reasonable
starting values here?
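The grid search described in Project 13.25 can be sketched as follows. This is a hedged illustration rather than a worked solution: it assumes the least squares criterion referenced as (13.15) is the sum of squared deviations between the observations and the response function, the helper names `sse` and `grid_axis` are introduced here, and only the six observations printed in Problem 13.10 are used because the remaining twelve are elided in the text.

```python
from itertools import product

# Only the six observations printed in Problem 13.10; the other twelve are
# elided in the text, so this is a partial data set used for illustration.
conc = [1, 1.5, 2, 30, 35, 40]
velocity = [2.1, 2.5, 4.9, 19.7, 21.3, 21.6]


def sse(g0, g1, x, y):
    """Assumed least squares criterion: sum of (Y - g0 * X / (g1 + X))^2."""
    return sum((yi - g0 * xi / (g1 + xi)) ** 2 for xi, yi in zip(x, y))


def grid_axis(lower, upper, n_points):
    """n_points equally spaced values from lower to upper, inclusive."""
    step = (upper - lower) / (n_points - 1)
    return [lower + k * step for k in range(n_points)]


# Two boundary values plus five interior points give 7 values per parameter.
axis0 = grid_axis(5, 65, 7)             # candidate values for gamma_0
axis1 = grid_axis(5, 65, 7)             # candidate values for gamma_1
grid = list(product(axis0, axis1))      # 7 x 7 = 49 grid points

# The grid point with the smallest criterion value serves as the starting point.
best = min(grid, key=lambda g: sse(g[0], g[1], conc, velocity))
print(len(grid), best, round(sse(best[0], best[1], conc, velocity), 3))
```

With the full set of 18 observations in place of the six shown, the same loop gives the grid point asked for in the project.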
13.26. Refer to Process yield Problem 13.17. Starting values for finding the least squares estimates of
the nonlinear regression model parameters are to be obtained by a grid search. The following
bounds for the parameters have been postulated:

$$1 \le \gamma_0 \le 21$$
$$.2 \le \gamma_1 \le .8$$
$$.1 \le \gamma_2 \le .7$$

Obtain 27 grid points by using all possible combinations of the boundary values and the
midpoint for each of the parameter ranges. Evaluate the least squares criterion (13.15) for
each grid point and identify the point providing the best fit. Does this point give reasonable
starting values here?

13.27. Refer to Home computers Problem 13.5.

a. To check on the appropriateness of large-sample inferences here, generate 1,000 bootstrap
samples of size 16 using the fixed X sampling procedure. For each bootstrap sample, obtain
the least squares estimates $g_0^*$, $g_1^*$, and $g_2^*$.

b. Plot histograms of the bootstrap sampling distributions of $g_0^*$, $g_1^*$, and $g_2^*$. Do these
distributions appear to be approximately normal?

c. Compute the means and standard deviations of the bootstrap sampling distributions for $g_0^*$,
$g_1^*$, and $g_2^*$. Are the bootstrap means and standard deviations close to the final least squares
estimates?

d. Obtain a confidence interval for $\gamma_1$ using the reflection method in (11.59) and confidence
coefficient .9667. How does this interval compare with the one obtained in Problem 13.8
by the large-sample inference method?

e. What are the implications of your findings in parts (b), (c), and (d) about the appropriateness
of large-sample inferences here? Discuss.
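Part (d) relies on the reflection method, whose formula (11.59) is not reproduced in this chapter. The sketch below assumes the usual form of that interval, $g - d_2 \le \gamma \le g + d_1$ with $d_1 = g - g^*(\alpha/2)$ and $d_2 = g^*(1 - \alpha/2) - g$, where $g^*(\cdot)$ denotes a percentile of the bootstrap estimates. The function names and the linear-interpolation percentile rule are assumptions made for illustration.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile of an ascending list, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def reflection_interval(estimate, boot_estimates, confidence):
    """Assumed reflection interval: estimate - d2 <= parameter <= estimate + d1."""
    alpha = 1.0 - confidence
    ordered = sorted(boot_estimates)
    low = percentile(ordered, alpha / 2)        # (alpha/2) bootstrap percentile
    high = percentile(ordered, 1 - alpha / 2)   # (1 - alpha/2) bootstrap percentile
    d1 = estimate - low
    d2 = high - estimate
    return estimate - d2, estimate + d1


# Bonferroni link: a 90 percent family coefficient spread over 3 parameters
# corresponds to 1 - .10/3 for each individual interval.
individual_confidence = 1 - 0.10 / 3
print(round(individual_confidence, 4))
```

The individual coefficient of about .9667 is what makes the bootstrap interval comparable with the 90 percent Bonferroni family of three intervals in Problem 13.8.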


13.28. Refer to Enzyme kinetics Problem 13.10.

a. To check on the appropriateness of large-sample inferences here, generate 1,000 bootstrap
samples of size 18 using the fixed X sampling procedure. For each bootstrap sample, obtain
the least squares estimates $g_0^*$ and $g_1^*$.

b. Plot histograms of the bootstrap sampling distributions of $g_0^*$ and $g_1^*$. Do these distributions
appear to be approximately normal?

c. Compute the means and standard deviations of the bootstrap sampling distributions for $g_0^*$
and $g_1^*$. Are the bootstrap means and standard deviations close to the final least squares
estimates?

d. Obtain a confidence interval for $\gamma_0$ using the reflection method in (11.59) and confidence
coefficient .95. How does this interval compare with the one obtained in Problem 13.12 by
the large-sample inference method?

e. What are the implications of your findings in parts (b), (c), and (d) about the appropriateness
of large-sample inferences here? Discuss.


13.29. Refer to Drug responsiveness Problem 13.13.

a. To check on the appropriateness of large-sample inferences here, generate 1,000 bootstrap
samples of size 19 using the fixed X sampling procedure. For each bootstrap sample, obtain
the least squares estimates $g_0^*$, $g_1^*$, and $g_2^*$.

b. Plot histograms of the bootstrap sampling distributions of $g_0^*$, $g_1^*$, and $g_2^*$. Do these
distributions appear to be approximately normal?

c. Compute the means and standard deviations of the bootstrap sampling distributions for $g_0^*$,
$g_1^*$, and $g_2^*$. Are the bootstrap means and standard deviations close to the final least squares
estimates?

d. Obtain a confidence interval for $\gamma$ using the reflection method in (11.59) and confidence
coefficient .97. How does this interval compare with the one obtained in Problem 13.16 by
the large-sample inference method?

e. What are the implications of your findings in parts (b), (c), and (d) about the appropriateness
of large-sample inferences here? Discuss.


13.30. Refer to Process yield Problem 13.17.

a. To check on the appropriateness of large-sample inferences here, generate 1,000 bootstrap
samples of size 18 using the fixed X sampling procedure. For each bootstrap sample, obtain
the least squares estimates $g_0^*$, $g_1^*$, and $g_2^*$.

b. Plot histograms of the bootstrap sampling distributions of $g_0^*$, $g_1^*$, and $g_2^*$. Do these
distributions appear to be approximately normal?

c. Compute the means and standard deviations of the bootstrap sampling distributions for $g_0^*$,
$g_1^*$, and $g_2^*$. Are the bootstrap means and standard deviations close to the final least squares
estimates?

d. Obtain a confidence interval for $\gamma_1$ using the reflection method in (11.59) and confidence
coefficient .975. How does this interval compare with the one obtained in Problem 13.20b
by the large-sample inference method?

e. What are the implications of your findings in parts (b), (c), and (d) about the appropriateness
of large-sample inferences here? Discuss.


Case Studies

13.31. Refer to the Prostate cancer data set in Appendix C.5 and Case Study 9.30. Select a random
sample of 65 observations to use as the model-building data set.

a. Develop a neural network model for predicting PSA. Justify your choice of number of
hidden nodes and penalty function weight and interpret your model.

b. Assess your model's ability to predict and discuss its usefulness to the oncologists.

c. Compare the performance of your neural network model with that of the best regression
model obtained in Case Study 9.30. Which model is more easily interpreted and why?

13.32. Refer to the Real estate sales data set in Appendix C.7 and Case Study 9.31. Select a random
sample of 300 observations to use as the model-building data set.

a. Develop a neural network model for predicting sales price. Justify your choice of number
of hidden nodes and penalty function weight and interpret your model.

b. Assess your model's ability to predict and discuss its usefulness as a tool for predicting
sales prices.

c. Compare the performance of your neural network model with that of the best regression
model obtained in Case Study 9.31. Which model is more easily interpreted and why?


Chapter 14

Logistic Regression, Poisson Regression, and Generalized Linear Models


In Chapter 13 we considered nonlinear regression models where the error terms are normally
distributed. In this chapter, we take up nonlinear regression models for two important cases
where the response outcomes are discrete and the error terms are not normally distributed.
First, we consider the logistic nonlinear regression model for use when the response variable
is qualitative with two possible outcomes, such as financial status of firm (sound status,
headed toward insolvency) or blood pressure status (high blood pressure, not high blood
pressure). We then extend this model so that it can be applied when the response variable is
a qualitative variable having more than two possible outcomes; for instance, blood pressure
status might be classified as high, normal, or low.

Next we take up the Poisson regression model for use when the response variable is
a count where large counts are rare events, such as the number of tornadoes in an upper
Midwest locality during a year. Finally, we explain that nearly all of the nonlinear regression
models discussed in Chapter 13 and in this chapter, as well as the normal error linear models
discussed earlier, belong to a family of regression models called generalized linear models.

The nonlinear regression models presented in this chapter are appropriate for analyzing
data arising from either observational studies or experimental studies.


14.1 Regression Models with Binary Response Variable


In a variety of regression applications, the response variable of interest has only two possible
qualitative outcomes, and therefore can be represented by a binary indicator variable taking
on values 0 and 1.

1. In an analysis of whether or not business firms have an industrial relations department,
according to size of firm, the response variable was defined to have the two possible
outcomes: firm has industrial relations department, firm does not have industrial relations
department. These outcomes may be coded 1 and 0, respectively (or vice versa).

2. In a study of labor force participation of married women, as a function of age, number
of children, and husband's income, the response variable Y was defined to have the two
possible outcomes: married woman in labor force, married woman not in labor force. Again,
these outcomes may be coded 1 and 0, respectively.

3. In a study of liability insurance possession, according to age of head of household,
amount of liquid assets, and type of occupation of head of household, the response variable Y
was defined to have the two possible outcomes: household has liability insurance, household
does not have liability insurance. These outcomes again may be coded 1 and 0, respectively.

4. In a longitudinal study of coronary heart disease as a function of age, gender, smoking
history, cholesterol level, percent of ideal body weight, and blood pressure, the response
variable Y was defined to have the two possible outcomes: person developed heart disease
during the study, person did not develop heart disease during the study. These outcomes
again may be coded 1 and 0, respectively.

These examples show the wide range of applications in which the response variable is
binary and hence may be represented by an indicator variable. A binary response variable,
taking on the values 0 and 1, is said to involve binary responses or dichotomous responses.
We consider first the meaning of the response function when the outcome variable is binary,
and then we take up some special problems that arise with this type of response variable.

Meaning of Response Function when Outcome Variable Is Binary

Consider the simple linear regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad Y_i = 0, 1 \tag{14.1}$$

where the outcome $Y_i$ is binary, taking on the value of either 0 or 1. The expected response
$E\{Y_i\}$ has a special meaning in this case. Since $E\{\varepsilon_i\} = 0$ we have:

$$E\{Y_i\} = \beta_0 + \beta_1 X_i \tag{14.2}$$

Consider $Y_i$ to be a Bernoulli random variable for which we can state the probability
distribution as follows:

$Y_i$    Probability
1        $P(Y_i = 1) = \pi_i$
0        $P(Y_i = 0) = 1 - \pi_i$

Thus, $\pi_i$ is the probability that $Y_i = 1$, and $1 - \pi_i$ is the probability that $Y_i = 0$. By the
definition of expected value of a random variable in (A.12), we obtain:

$$E\{Y_i\} = 1(\pi_i) + 0(1 - \pi_i) = \pi_i = P(Y_i = 1) \tag{14.3}$$

Equating (14.2) and (14.3), we thus find:

$$E\{Y_i\} = \beta_0 + \beta_1 X_i = \pi_i \tag{14.4}$$

FIGURE 14.1 Illustration of Response Function when Response Variable Is Binary: Industrial
Relations Department Example.

[Figure: the response function $E\{Y\} = \beta_0 + \beta_1 X$, the probability that a firm has an industrial
relations department (vertical axis, 0 to 1), plotted against size of firm X (horizontal axis).]

The mean response $E\{Y_i\} = \beta_0 + \beta_1 X_i$ as given by the response function is therefore simply
the probability that $Y_i = 1$ when the level of the predictor variable is $X_i$. This interpretation
of the mean response applies whether the response function is a simple linear one, as here,
or a complex multiple regression one. The mean response, when the outcome variable is
a 0, 1 indicator variable, always represents the probability that Y = 1 for the given levels
of the predictor variables. Figure 14.1 illustrates a simple linear response function for an
indicator outcome variable. Here, the indicator variable Y refers to whether or not a firm has
an industrial relations department, and the predictor variable X is size of firm. The response
function in Figure 14.1 shows the probability that firms of given size have an industrial
relations department.


Special Problems when Response Variable Is Binary

Special problems arise, unfortunately, when the response variable is an indicator variable.
We consider three of these now, using a simple linear regression model as an illustration.

1. Nonnormal Error Terms. For a binary 0, 1 response variable, each error term
$\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$ can take on only two values:

$$\text{When } Y_i = 1: \quad \varepsilon_i = 1 - \beta_0 - \beta_1 X_i \tag{14.5a}$$
$$\text{When } Y_i = 0: \quad \varepsilon_i = -\beta_0 - \beta_1 X_i \tag{14.5b}$$

Clearly, normal error regression model (2.1), which assumes that the $\varepsilon_i$ are normally
distributed, is not appropriate.

2. Nonconstant Error Variance. Another problem with the error terms $\varepsilon_i$ is that they do
not have equal variances when the response variable is an indicator variable. To see this,
we shall obtain $\sigma^2\{Y_i\}$ for the simple linear regression model (14.1), utilizing (A.15):

$$\sigma^2\{Y_i\} = E\{(Y_i - E\{Y_i\})^2\} = (1 - \pi_i)^2 \pi_i + (0 - \pi_i)^2 (1 - \pi_i)$$

or:

$$\sigma^2\{Y_i\} = \pi_i (1 - \pi_i) = (E\{Y_i\})(1 - E\{Y_i\}) \tag{14.6}$$

The variance of $\varepsilon_i$ is the same as that of $Y_i$ because $\varepsilon_i = Y_i - \pi_i$ and $\pi_i$ is a constant:

$$\sigma^2\{\varepsilon_i\} = \pi_i (1 - \pi_i) = (E\{Y_i\})(1 - E\{Y_i\}) \tag{14.7}$$

or:

$$\sigma^2\{\varepsilon_i\} = (\beta_0 + \beta_1 X_i)(1 - \beta_0 - \beta_1 X_i) \tag{14.7a}$$

Note from (14.7a) that $\sigma^2\{\varepsilon_i\}$ depends on $X_i$. Hence, the error variances will differ at
different levels of X, and ordinary least squares will no longer be optimal.

3. Constraints on Response Function. Since the response function represents probabilities
when the outcome variable is a 0, 1 indicator variable, the mean responses should be
constrained as follows:

$$0 \le E\{Y\} = \pi \le 1 \tag{14.8}$$

Many response functions do not automatically possess this constraint. A linear response
function, for instance, may fall outside the constraint limits within the range of the predictor
variable in the scope of the model.
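Problems 2 and 3 above can be seen numerically with a short sketch. The coefficients and firm sizes below are hypothetical values chosen only for illustration, and the function names are introduced here. The code evaluates the linear response function and the error variance (14.7a) at several levels of X.

```python
def mean_response(x, b0, b1):
    """Linear response function E{Y} = b0 + b1 * X, read as P(Y = 1)."""
    return b0 + b1 * x


def error_variance(x, b0, b1):
    """Error variance (b0 + b1*X) * (1 - b0 - b1*X), as in (14.7a)."""
    pi = mean_response(x, b0, b1)
    return pi * (1.0 - pi)


# Hypothetical coefficients and firm sizes, for illustration only.
b0, b1 = 0.1, 0.002
for size in (50, 200, 350, 500):
    pi = mean_response(size, b0, b1)
    inside = 0.0 <= pi <= 1.0          # constraint (14.8)
    print(size, round(pi, 3), round(error_variance(size, b0, b1), 4), inside)
```

Under these assumed values the variance changes with X, and at the largest size the linear function returns a value above 1, so constraint (14.8) fails and the variance formula no longer gives a meaningful (nonnegative) result.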


FIGURE 14.2 Examples of Probit and Logistic Mean Response Functions.

[Figure: plots of probability (vertical axis, 0.0 to 1.0). Panels include (a) Probit, with
$\beta_0^* = 0$ and (b) Probit, with $\beta_1^* = -1$.]

The difficulties created by the need for the restriction in (14.8) on the response function
are the most serious. One could use weighted least squares to handle the problem of unequal
error variances. In addition, with large sample sizes the method of least squares provides
estimators that are asymptotically normal under quite general conditions, even if the
distribution of the error terms is far from normal. However, the constraint on the mean responses
to fall between 0 and 1 frequently will rule out a linear response function. In the industrial
relations department example, for instance, use of a linear response function subject to the
constraints on the mean response might require a probability of 0 for the mean response for
all small firms and a probability of 1 for the mean response for all large firms, as illustrated
in Figure 14.1. Such a model would often be considered unreasonable. Instead, a model
where the probabilities 0 and 1 are reached asymptotically, as illustrated by each of the
S-shaped curves in Figure 14.2, would usually be more appropriate.


14.2 Sigmoidal Response Functions for Binary Responses


In this section, we introduce three response functions for modeling binary responses. These
functions are bounded between 0 and 1, have a characteristic sigmoidal or S-shape, and
approach 0 and 1 asymptotically. These functions arise naturally when the binary response
variable results from a zero-one recoding (or dichotomization) of an underlying continuous
response variable, and they are often appropriate for discrete binary responses as well.


Probit Mean Response Function

Consider a health researcher studying the effect of a mother's use of alcohol (X, an index
of degree of alcohol use during pregnancy) on the duration of her pregnancy ($Y^c$). Here
we use the superscript c to emphasize that the response variable, pregnancy duration, is a
continuous response. This can be represented by a simple linear regression model:

$$Y_i^c = \beta_0^c + \beta_1^c X_i + \varepsilon_i^c \tag{14.9}$$

and we will assume that $\varepsilon_i^c$ is normally distributed with mean zero and variance $\sigma_c^2$.

If the continuous response variable, pregnancy duration, were available, we might
proceed with the usual simple linear regression analysis. However, in this instance, researchers
coded each pregnancy duration as preterm or full term using the following rule:

$$Y_i = \begin{cases} 1 & \text{if } Y_i^c \le 38 \text{ weeks (preterm)} \\ 0 & \text{if } Y_i^c > 38 \text{ weeks (full term)} \end{cases}$$

It follows from (14.3) and (14.9) that:

$$P(Y_i = 1) = \pi_i = P(Y_i^c \le 38) \tag{14.10a}$$
$$= P(\beta_0^c + \beta_1^c X_i + \varepsilon_i^c \le 38) \tag{14.10b}$$
$$= P(\varepsilon_i^c \le 38 - \beta_0^c - \beta_1^c X_i) \tag{14.10c}$$
$$= P\left(\frac{\varepsilon_i^c}{\sigma_c} \le \frac{38 - \beta_0^c}{\sigma_c} - \frac{\beta_1^c}{\sigma_c} X_i\right) \tag{14.10d}$$
$$= P(Z \le \beta_0^* + \beta_1^* X_i) \tag{14.10e}$$

where $\beta_0^* = (38 - \beta_0^c)/\sigma_c$, $\beta_1^* = -\beta_1^c/\sigma_c$, and $Z = \varepsilon_i^c/\sigma_c$ follows a standard normal
distribution. If we let $P(Z \le z) = \Phi(z)$, we have, from (14.10a-e):

$$P(Y_i = 1) = \Phi(\beta_0^* + \beta_1^* X_i) \tag{14.11}$$


Equations (14.3) and (14.11) together yield the nonlinear regression function known as
the probit mean response function:

$$E\{Y_i\} = \pi_i = \Phi(\beta_0^* + \beta_1^* X_i) \tag{14.12}$$

The inverse function, $\Phi^{-1}$, of the standard normal cumulative distribution function $\Phi$
is sometimes called the probit transformation. We solve for the linear predictor, $\beta_0^* + \beta_1^* X_i$,
in (14.12) by applying the probit transformation to both sides of the expression, obtaining:

$$\Phi^{-1}(\pi_i) = \pi_i' = \beta_0^* + \beta_1^* X_i \tag{14.13}$$

The resulting expression, $\pi_i' = \beta_0^* + \beta_1^* X_i$, is called the probit response function, or more
generally, the linear predictor.

Plots of the probit mean response function (14.12) for various values of $\beta_0^*$ and $\beta_1^*$ are
shown in Figures 14.2a and 14.2b. Some characteristics of this response function are:


1. The probit mean response function is bounded between 0 and 1, and it approaches these 
limits asymptotically. 

2. As $\beta_1^*$ increases (for $\beta_1^* > 0$), the mean function becomes more S-shaped, changing
more rapidly in the center. Figure 14.2a shows two probit mean response functions,
where both intercept coefficients are 0, and the slope coefficients are 1 and 5. Notice that
the curve has a more pronounced S-shape with $\beta_1^* = 5$.

3. Changing the sign of $\beta_1^*$ from positive to negative changes the mean response function
from a monotone increasing function to a monotone decreasing function. The probit
mean response functions plotted in Figure 14.2a have positive slope coefficients while
those in Figure 14.2b have negative slope coefficients.

4. Increasing or decreasing the intercept $\beta_0^*$ shifts the mean response function horizontally.
(The direction of the shift depends on the signs of both $\beta_0^*$ and $\beta_1^*$.) Figure 14.2b shows
two probit mean response functions, where both slope coefficients are $-1$, and the
intercept coefficients are 0 and 5. Notice that the curve has shifted to the right as $\beta_0^*$
changes from 0 to 5.

5. Finally, we note the following symmetry property of the probit response function. If the
response variable is recoded using $Y_i' = 1 - Y_i$ (that is, by changing the 1s to 0s and
the 0s to 1s), the signs of all of the coefficients are reversed. This follows easily from
the symmetry of the standard normal distribution: since $\Phi(Z) = 1 - \Phi(-Z)$, it follows
that $P(Y_i' = 1) = P(Y_i = 0) = 1 - \Phi(\beta_0^* + \beta_1^* X_i) = \Phi(-\beta_0^* - \beta_1^* X_i)$.
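
The properties listed above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names are hypothetical, and the standard normal cumulative distribution function is computed from the error function in the standard library. The parameter values for properties 2 and 4 are the ones quoted for Figures 14.2a and 14.2b; those used for property 5 are arbitrary illustrative numbers.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_mean_response(x: float, b0: float, b1: float) -> float:
    """Probit mean response (14.12): pi = Phi(b0 + b1 * x)."""
    return std_normal_cdf(b0 + b1 * x)


if __name__ == "__main__":
    # Property 2: with intercept 0, slope 5 changes faster near the center
    # than slope 1.
    for x in (-1.0, -0.25, 0.0, 0.25, 1.0):
        print(x,
              round(probit_mean_response(x, 0.0, 1.0), 4),
              round(probit_mean_response(x, 0.0, 5.0), 4))

    # Property 4: with slope -1, moving the intercept from 0 to 5 moves the
    # point where pi = 0.5 from x = 0 to x = 5 (a shift to the right).
    print(probit_mean_response(0.0, 0.0, -1.0),
          probit_mean_response(5.0, 5.0, -1.0))

    # Property 5: recoding Y' = 1 - Y reverses the sign of every coefficient.
    b0, b1, x = 0.4, -0.7, 1.3  # arbitrary illustrative values
    print(1.0 - probit_mean_response(x, b0, b1),
          probit_mean_response(x, -b0, -b1))
```

The last two printed numbers agree up to rounding error, which is the symmetry property in item 5.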


Logistic Mean Response Function 
We have seen that the assumption of normally distributed errors for the underlying continuous
response variable in (14.9) led to the use of the standard normal cumulative distribution
function, $\Phi$, to model $\pi_i$. An alternative error distribution that is very similar to the normal
distribution is the logistic distribution. Figure 14.3 presents plots of the standard normal
density function and the logistic density function, each with mean zero and variance one.
The plots are nearly indistinguishable, although the logistic distribution has slightly heavier
tails.

FIGURE 14.3 Plots of Normal Density (dashed line) and Logistic Density (solid line), Each
Having Mean 0 and Variance 1. [Vertical axis: Density, f(x).]

The density of a logistic random variable $\varepsilon_L$ having mean zero and standard deviation
$\sigma = \pi/\sqrt{3}$ (here $\pi$ is the constant 3.14159..., not the probability $\pi_i$) has a simple form:

$$f_L(\varepsilon_L) = \frac{\exp(\varepsilon_L)}{[1 + \exp(\varepsilon_L)]^2} \tag{14.14a}$$

Its cumulative distribution function is:

$$F_L(\varepsilon_L) = \frac{\exp(\varepsilon_L)}{1 + \exp(\varepsilon_L)} \tag{14.14b}$$

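Suppose now that $\varepsilon_i^c$ in (14.9) has a logistic distribution with mean zero and standard
deviation $\sigma_c$. Then, from (14.10d) we have:

$$P(Y_i = 1) = P\left(\frac{\varepsilon_i^c}{\sigma_c} \le \beta_0^* + \beta_1^* X_i\right)$$

where $\varepsilon_i^c/\sigma_c$ follows a logistic distribution with mean zero and standard deviation one.
Multiplying both sides of the inequality inside the probability statement on the right by
$\pi/\sqrt{3}$ does not change the probability; therefore:

$$\begin{aligned}
P(Y_i = 1) = \pi_i &= P\left(\frac{\pi}{\sqrt{3}}\,\frac{\varepsilon_i^c}{\sigma_c} \le \frac{\pi}{\sqrt{3}}\beta_0^* + \frac{\pi}{\sqrt{3}}\beta_1^* X_i\right) && \text{(14.15a)} \\
&= P(\varepsilon_L \le \beta_0 + \beta_1 X_i) && \text{(14.15b)} \\
&= F_L(\beta_0 + \beta_1 X_i) && \text{(14.15c)} \\
&= \frac{\exp(\beta_0 + \beta_1 X_i)}{1 + \exp(\beta_0 + \beta_1 X_i)} && \text{(14.15d)}
\end{aligned}$$

where $\beta_0 = (\pi/\sqrt{3})\beta_0^*$ and $\beta_1 = (\pi/\sqrt{3})\beta_1^*$ denote the logistic regression parameters. To
summarize, the logistic mean response function is:

$$E\{Y_i\} = \pi_i = F_L(\beta_0 + \beta_1 X_i) = \frac{\exp(\beta_0 + \beta_1 X_i)}{1 + \exp(\beta_0 + \beta_1 X_i)} \tag{14.16}$$

Straightforward algebra shows that an equivalent form of (14.16) is given by:

$$E\{Y_i\} = \pi_i = [1 + \exp(-\beta_0 - \beta_1 X_i)]^{-1} \tag{14.17}$$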




Applying the inverse of the cumulative distribution function $F_L$ to the two middle terms in
(14.16) yields:

$$F_L^{-1}(\pi_i) = \beta_0 + \beta_1 X_i = \pi_i' \tag{14.18}$$

The transformation $F_L^{-1}(\pi_i)$ is called the logit transformation of the probability $\pi_i$, and is
given by:

$$F_L^{-1}(\pi_i) = \log_e\left(\frac{\pi_i}{1 - \pi_i}\right) \tag{14.18a}$$

where the ratio $\pi_i/(1 - \pi_i)$ in (14.18a) is called the odds. The linear predictor in (14.18) is
referred to as the logit response function.

Figures 14.2c and 14.2d each show two logistic mean response functions, where the
parameters correspond to those in Figures 14.2a and 14.2b for the probit mean response
function. It is clear from the plots that these logistic mean response functions are qualitatively
similar to the corresponding probit mean response functions. The five properties of the probit
mean response function, listed earlier, are also true for the logistic mean response function.
The observed differences in logistic and probit mean response functions are largely due
to the differences in the scaling of the parameters mentioned previously. Note that the
symmetry property for the probit mean response function also holds for the logistic mean
response function.
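
As a hedged illustration of (14.16), (14.17), (14.18a), and the parameter rescaling by $\pi/\sqrt{3}$, the Python sketch below evaluates both forms of the logistic mean response, inverts it with the logit, and measures how far a probit curve lies from the logistic curve whose parameters are the rescaled probit parameters. All function names and the grid of X values in the test are assumptions made for illustration; they do not come from the text.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_mean_response(x, b0, b1):
    """Form (14.16): exp(b0 + b1 x) / [1 + exp(b0 + b1 x)]."""
    e = math.exp(b0 + b1 * x)
    return e / (1.0 + e)


def logistic_mean_response_alt(x, b0, b1):
    """Equivalent form (14.17): [1 + exp(-b0 - b1 x)]^(-1)."""
    return 1.0 / (1.0 + math.exp(-b0 - b1 * x))


def logit(p):
    """Logit transformation (14.18a): natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Factor that converts probit parameters (starred) to logistic parameters.
SCALE = math.pi / math.sqrt(3.0)


def max_gap_probit_vs_logistic(b0_star, b1_star, xs):
    """Largest |Phi(b0* + b1* x) - logistic(b0 + b1 x)| over xs,
    where (b0, b1) = SCALE * (b0*, b1*)."""
    b0, b1 = SCALE * b0_star, SCALE * b1_star
    return max(
        abs(std_normal_cdf(b0_star + b1_star * x)
            - logistic_mean_response(x, b0, b1))
        for x in xs
    )
```

For example, `max_gap_probit_vs_logistic(0.0, 1.0, [k / 10 for k in range(-40, 41)])` returns the largest vertical distance between the two curves on that grid, which gives a numerical sense of how similar the two mean response functions are.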


Complementary Log-Log Response Function 


FIGURE 14.4 Plots of Gumbel (dashed line), Normal (black line), and Logistic (gray line)
Density Functions, Each Having Mean 0 and Variance 1.


A third mean response function is sometimes used when the error distribution of $\varepsilon^c$ is
not symmetric. The density function $f_G(\varepsilon)$ of the extreme value or Gumbel probability
distribution having mean zero and variance one is shown in Figure 14.4, along with the
comparable standard normal and logistic densities discussed earlier. Notice that this density
is skewed to the right and clearly distinct from the standard normal and logistic densities.
It can be shown that use of the Gumbel error distribution for $\varepsilon_i^c$ in (14.9) leads to the mean
response function:

$$\pi_i = 1 - \exp(-\exp(\beta_0^G + \beta_1^G X_i)) \tag{14.19}$$




Solving for the linear predictor $\beta_0^G + \beta_1^G X_i$, we obtain the complementary log-log response
model:

$$\pi_i' = \log[-\log(1 - \pi(X_i))] = \beta_0^G + \beta_1^G X_i \tag{14.19a}$$

The symmetry property discussed on page 560 for the logit and probit models does not hold
for (14.19).
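
A short Python sketch of (14.19) and (14.19a) follows; the function names are hypothetical and the parameter values in the demonstration are arbitrary. It is meant only to show the failure of the symmetry property numerically: reversing the signs of the coefficients does not give one minus the original probability.

```python
import math


def cloglog_mean_response(x, b0, b1):
    """Complementary log-log mean response (14.19)."""
    return 1.0 - math.exp(-math.exp(b0 + b1 * x))


def cloglog(p):
    """Complementary log-log transformation (14.19a): log[-log(1 - p)]."""
    return math.log(-math.log(1.0 - p))


if __name__ == "__main__":
    b0, b1, x = 0.0, 1.0, 1.0  # arbitrary illustrative values
    p = cloglog_mean_response(x, b0, b1)
    # The transformation recovers the linear predictor b0 + b1 * x.
    print(cloglog(p), b0 + b1 * x)
    # Recoding Y' = 1 - Y is NOT equivalent to reversing the signs.
    print(1.0 - p, cloglog_mean_response(x, -b0, -b1))
    # Unlike the probit and logistic curves, the value at a linear predictor
    # of zero is 1 - exp(-1), not one half.
    print(cloglog_mean_response(0.0, 0.0, 1.0))
```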

For the remainder of this chapter, we focus on the use of the logistic mean response 
function. This is currently the most widely used model for two reasons: (1) we shall 
see that the regression parameters have relatively simple and useful interpretations, and 
(2) statistical software is widely available for analysis of logistic regression models. In the 
next two sections we consider in detail the fitting of simple and multiple logistic regression 
models to binary data.


Comment

Our development of the logistic and probit mean response functions assumed that the binary response
$Y_i$ was obtained from an explicit dichotomization of an observed continuous response $Y_i^c$, but this is
not required. These response functions often work well for binary responses that do not arise from
such a dichotomization. In addition, binary responses frequently can be interpreted as having arisen
from a dichotomization of an unobserved, or latent, continuous response.


14.3 Simple Logistic Regression 


We shall use the method of maximum likelihood to estimate the parameters of the logistic 
response function. This method is well suited to deal with the problems associated with the 
responses $Y_i$ being binary. As explained in Section 1.8, we first need to develop the joint
probability function of the sample observations. Instead of using the normal distribution
for the Y observations as was done earlier in (1.26), we now need to utilize the Bernoulli
distribution for a binary random variable.


Simple Logistic Regression Model 
First, we require a formal statement of the simple logistic regression model. Recall that
when the response variable is binary, taking on the values 1 and 0 with probabilities $\pi$ and
$1 - \pi$, respectively, $Y$ is a Bernoulli random variable with parameter $E\{Y\} = \pi$. We could
state the simple logistic regression model in the usual form:

$$Y_i = E\{Y_i\} + \varepsilon_i$$

Since the distribution of the error term $\varepsilon_i$ depends on the Bernoulli distribution of the
response $Y_i$, it is preferable to state the simple logistic regression model in the following
fashion:

$Y_i$ are independent Bernoulli random variables with expected
values $E\{Y_i\} = \pi_i$, where:

$$E\{Y_i\} = \pi_i = \frac{\exp(\beta_0 + \beta_1 X_i)}{1 + \exp(\beta_0 + \beta_1 X_i)} \tag{14.20}$$

The X observations are assumed to be known constants. Alternatively, if the X observations
are random, $E\{Y_i\}$ is viewed as a conditional mean, given the value of $X_i$.




Likelihood Function 
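Since each $Y_i$ observation is an ordinary Bernoulli random variable, where:

$$P(Y_i = 1) = \pi_i \qquad P(Y_i = 0) = 1 - \pi_i$$

we can represent its probability distribution as follows:

$$f_i(Y_i) = \pi_i^{Y_i}(1 - \pi_i)^{1 - Y_i} \qquad Y_i = 0, 1; \quad i = 1, \ldots, n \tag{14.21}$$

Note that $f_i(1) = \pi_i$ and $f_i(0) = 1 - \pi_i$. Hence, $f_i(Y_i)$ simply represents the probability
that $Y_i = 1$ or 0.

Since the $Y_i$ observations are independent, their joint probability function is:

$$g(Y_1, \ldots, Y_n) = \prod_{i=1}^{n} f_i(Y_i) = \prod_{i=1}^{n} \pi_i^{Y_i}(1 - \pi_i)^{1 - Y_i} \tag{14.22}$$

Again, it will be easier to find the maximum likelihood estimates by working with the
logarithm of the joint probability function:

$$\begin{aligned}
\log_e g(Y_1, \ldots, Y_n) &= \log_e \prod_{i=1}^{n} \pi_i^{Y_i}(1 - \pi_i)^{1 - Y_i} \\
&= \sum_{i=1}^{n} \left[ Y_i \log_e \pi_i + (1 - Y_i)\log_e(1 - \pi_i) \right] \\
&= \sum_{i=1}^{n} \left[ Y_i \log_e\left(\frac{\pi_i}{1 - \pi_i}\right) \right] + \sum_{i=1}^{n} \log_e(1 - \pi_i)
\end{aligned} \tag{14.23}$$

Since $E\{Y_i\} = \pi_i$ for a binary variable, it follows from (14.16) that:

$$1 - \pi_i = [1 + \exp(\beta_0 + \beta_1 X_i)]^{-1} \tag{14.24}$$

Furthermore, from (14.18a), we obtain:

$$\log_e\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i \tag{14.25}$$

Hence, (14.23) can be expressed as follows:

$$\log_e L(\beta_0, \beta_1) = \sum_{i=1}^{n} Y_i(\beta_0 + \beta_1 X_i) - \sum_{i=1}^{n} \log_e[1 + \exp(\beta_0 + \beta_1 X_i)] \tag{14.26}$$

where $L(\beta_0, \beta_1)$ replaces $g(Y_1, \ldots, Y_n)$ to show explicitly that we now view this function
as the likelihood function of the parameters to be estimated, given the sample observations.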
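
To make the step from (14.23) to (14.26) concrete, the following Python sketch evaluates the log-likelihood both ways and shows that the two agree. The data in `XS` and `YS` are hypothetical values invented for this illustration (they are not from any data set in the text), and the function names are likewise assumptions.

```python
import math

# Hypothetical toy data, invented purely for illustration.
XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [0, 0, 1, 0, 1, 1, 0, 1]


def pi_i(x, b0, b1):
    """Logistic mean response, form (14.17)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def loglik_bernoulli(xs, ys, b0, b1):
    """Second line of (14.23): sum of Y log(pi) + (1 - Y) log(1 - pi)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = pi_i(x, b0, b1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def loglik_compact(xs, ys, b0, b1):
    """Form (14.26): sum of Y (b0 + b1 X) minus sum of log[1 + exp(b0 + b1 X)]."""
    return sum(
        y * (b0 + b1 * x) - math.log(1.0 + math.exp(b0 + b1 * x))
        for x, y in zip(xs, ys)
    )


if __name__ == "__main__":
    for b0, b1 in [(0.0, 0.0), (-1.5, 0.4), (2.0, -0.3)]:
        print(b0, b1,
              loglik_bernoulli(XS, YS, b0, b1),
              loglik_compact(XS, YS, b0, b1))
```

The compact form (14.26) avoids computing each probability explicitly, which is one reason it is the form usually maximized.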


Maximum Likelihood Estimation 
The maximum likelihood estimates of $\beta_0$ and $\beta_1$ in the simple logistic regression model
are those values of $\beta_0$ and $\beta_1$ that maximize the log-likelihood function in (14.26). No
closed-form solution exists for the values of $\beta_0$ and $\beta_1$ in (14.26) that maximize the
log-likelihood function. Computer-intensive numerical search procedures are therefore required
to find the maximum likelihood estimates $b_0$ and $b_1$. There are several widely used numerical
search procedures; one of these employs iteratively reweighted least squares, which we shall
explain in Section 14.4. Reference 14.1 provides a discussion of several numerical search
procedures for finding maximum likelihood estimates. We shall rely on standard statistical
software programs specifically designed for logistic regression to obtain the maximum
likelihood estimates $b_0$ and $b_1$.
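
For readers who want to see what such a numerical search can look like, here is a minimal Python sketch using Newton-Raphson iterations on the log-likelihood (14.26). This is only one possible search procedure, chosen here for illustration; it is not claimed to be the algorithm used by any particular software package. The function names are hypothetical, and the data are toy values invented for the example. The sketch assumes the data are not perfectly separated, since otherwise no finite maximum exists.

```python
import math


def log_likelihood(xs, ys, b0, b1):
    """Log-likelihood (14.26) for simple logistic regression."""
    return sum(
        y * (b0 + b1 * x) - math.log(1.0 + math.exp(b0 + b1 * x))
        for x, y in zip(xs, ys)
    )


def score(xs, ys, b0, b1):
    """First derivatives of (14.26) with respect to b0 and b1."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += y - p
        g1 += x * (y - p)
    return g0, g1


def fit_simple_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson search for (b0, b1), starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0, g1 = score(xs, ys, b0, b1)
        # Negative second derivatives (a 2 x 2 symmetric matrix).
        i00 = i01 = i11 = 0.0
        for x in xs:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 linear system by Cramer's rule.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical toy data, invented purely for illustration.
XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [0, 0, 1, 0, 1, 1, 0, 1]

if __name__ == "__main__":
    b0, b1 = fit_simple_logistic(XS, YS)
    print(b0, b1, log_likelihood(XS, YS, b0, b1))
```

At the returned values both first derivatives are numerically zero, and moving either coefficient away from them lowers the log-likelihood.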

Once the maximum likelihood estimates $b_0$ and $b_1$ are found, we substitute these values
into the response function in (14.20) to obtain the fitted response function. We shall use $\hat{\pi}_i$
to denote the fitted value for the ith case:

$$\hat{\pi}_i = \frac{\exp(b_0 + b_1 X_i)}{1 + \exp(b_0 + b_1 X_i)} \tag{14.27}$$

The fitted logistic response function is as follows:

$$\hat{\pi} = \frac{\exp(b_0 + b_1 X)}{1 + \exp(b_0 + b_1 X)} \tag{14.28}$$

If we utilize the logit transformation in (14.18), we can express the fitted response
function in (14.28) as follows:

$$\hat{\pi}' = b_0 + b_1 X \tag{14.29}$$

where:

$$\hat{\pi}' = \log_e\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) \tag{14.29a}$$

We call (14.29) the fitted logit response function.

Once the fitted logistic response function has been obtained, the usual next steps are to
examine the appropriateness of the fitted response function and, if the fit is good, to make a 
variety of inferences and predictions. We shall postpone a discussion of how to examine the 
goodness of fit of a logistic response function and how to make inferences and predictions 
until we have considered the multiple logistic regression model with a number of predictor 
variables. 


A systems analyst studied the effect of computer programming experience on ability to
complete within a specified time a complex programming task, including debugging.
Twenty-five persons were selected for the study. They had varying amounts of programming
experience (measured in months of experience), as shown in Table 14.1a, column 1. All
persons were given the same programming task, and the results of their success in the task
are shown in column 2. The results are coded in binary fashion: Y = 1 if the task was completed
successfully in the allotted time, and Y = 0 if the task was not completed successfully.
Figure 14.5 contains a scatter plot of the data. This plot is not too informative because of the
nature of the response variable, other than to indicate that ability to complete the task
successfully appears to increase with amount of experience. A lowess nonparametric response
curve was fitted to the data and is also shown in Figure 14.5. A sigmoidal S-shaped response
function is clearly suggested by the nonparametric lowess fit. It was therefore decided to fit
the logistic regression model (14.20).

A standard logistic regression package was run on the data. The results are contained
in Table 14.1b. Since $b_0 = -3.0597$ and $b_1 = .1615$, the estimated logistic regression
function (14.28) is:

$$\hat{\pi} = \frac{\exp(-3.0597 + .1615X)}{1 + \exp(-3.0597 + .1615X)} \tag{14.30}$$

The fitted values are given in Table 14.1a, column 3. For instance, the estimated mean
response for i = 1, where $X_1 = 14$, is:

$$\hat{\pi}_1 = \frac{\exp[-3.0597 + .1615(14)]}{1 + \exp[-3.0597 + .1615(14)]} = .310$$

This fitted value is the estimated probability that a person with 14 months experience will
successfully complete the programming task. In addition to the lowess fit, Figure 14.5 also
contains a plot of the fitted logistic response function, $\hat{\pi}(X)$.

TABLE 14.1 Data and Maximum Likelihood Estimates: Programming Task Example.

(a) Data

             (1)           (2)        (3)
          Months of       Task      Fitted
Person   Experience     Success      Value
   i        X_i           Y_i       pi-hat_i
   1         14            0         .310
   2         29            0         .835
   3          6            0         .110
  ...       ...           ...         ...
  23         28            1         .812
  24         22            1         .621
  25          8            1         .146

(b) Maximum Likelihood Estimates

                 Estimated       Estimated
Regression       Regression      Standard
Coefficient      Coefficient     Deviation
  beta_0          -3.0597          1.259
  beta_1            .1615          .0650

FIGURE 14.5 Scatter Plot, Lowess Curve (dashed line), and Estimated Logistic Mean Response
Function (solid line): Programming Task Example. [Horizontal axis: Months of Experience (X).]
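
The fitted values in Table 14.1a can be reproduced directly from the estimates in Table 14.1b. The short Python sketch below does this for the six persons whose rows appear in the table excerpt; the function and variable names are hypothetical, chosen only for this illustration.

```python
import math

# Maximum likelihood estimates reported in Table 14.1b.
B0, B1 = -3.0597, 0.1615


def fitted_probability(x, b0=B0, b1=B1):
    """Fitted logistic response (14.28) at x months of experience."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))


# (months of experience, fitted value as printed) for the persons
# listed in the Table 14.1a excerpt: persons 1, 2, 3, 23, 24, 25.
TABLE_14_1A = [(14, 0.310), (29, 0.835), (6, 0.110),
               (28, 0.812), (22, 0.621), (8, 0.146)]

if __name__ == "__main__":
    for months, reported in TABLE_14_1A:
        print(months, round(fitted_probability(months), 3), reported)
```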


Interpretation of $b_1$


The interpretation of the estimated regression coefficient $b_1$ in the fitted logistic response
function (14.30) is not the straightforward interpretation of the slope in a linear regression
model. The reason is that the effect of a unit increase in X varies for the logistic regression
model according to the location of the starting point on the X scale. An interpretation of $b_1$
is found in the property of the fitted logistic function that the estimated odds $\hat{\pi}/(1 - \hat{\pi})$ are
multiplied by $\exp(b_1)$ for any unit increase in X.

To see this, we consider the value of the fitted logit response function (14.29) at $X = X_j$:

$$\hat{\pi}'(X_j) = b_0 + b_1 X_j$$

The notation $\hat{\pi}'(X_j)$ indicates specifically the X level associated with the fitted value. We
also consider the value of the fitted logit response function at $X = X_j + 1$:

$$\hat{\pi}'(X_j + 1) = b_0 + b_1(X_j + 1)$$

The difference between the two fitted values is simply:

$$\hat{\pi}'(X_j + 1) - \hat{\pi}'(X_j) = b_1$$

Now according to (14.29a), $\hat{\pi}'(X_j)$ is the logarithm of the estimated odds when $X = X_j$;
we shall denote it by $\log_e(\text{odds}_1)$. Similarly, $\hat{\pi}'(X_j + 1)$ is the logarithm of the estimated
odds when $X = X_j + 1$; we shall denote it by $\log_e(\text{odds}_2)$. Hence, the difference between
the two fitted logit response values can be expressed as follows:

$$\log_e(\text{odds}_2) - \log_e(\text{odds}_1) = \log_e\left(\frac{\text{odds}_2}{\text{odds}_1}\right) = b_1$$

Taking antilogs of each side, we see that the estimated ratio of the odds, called the odds
ratio and denoted by $\widehat{OR}$, equals $\exp(b_1)$:

$$\widehat{OR} = \frac{\text{odds}_2}{\text{odds}_1} = \exp(b_1) \tag{14.31}$$
о 


For the programming task example, we see from Figure 14.5 that the probability of success 
increases sharply with experience. Specifically, Table 14.1b shows that the odds ratio is 
OR = exp(b,) = exp(.1615) = 1.175, so that the odds of completing the task increase by 
17.5 percent with each additional month of experience. 

Since a unit increase of one month is quite small, the estimated odds ratio of 1.175 may not
adequately show the change in odds for a longer difference in time. In general, the estimated
odds ratio when there is a difference of c units of X is $\exp(cb_1)$. For example, should we wish
to compare individuals with relatively little experience to those with extensive experience,
say 10 months versus 25 months so that c = 15, then the odds ratio would be estimated
to be $\exp[15(.1615)] = 11.3$. This indicates that the odds of completing the task increase
over 11-fold for experienced persons compared to relatively inexperienced persons.
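
The odds ratio arithmetic above can be verified with a few lines of Python. In this sketch (function names are hypothetical), the estimated odds are computed from the fitted probabilities, so the check that the ratio of odds equals $\exp(cb_1)$ at any starting level of X is a numerical confirmation of (14.31) rather than a restatement of it.

```python
import math

# Maximum likelihood estimates reported in Table 14.1b.
B0, B1 = -3.0597, 0.1615


def estimated_odds(x, b0=B0, b1=B1):
    """Estimated odds pi-hat / (1 - pi-hat) at X = x."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1.0 - p)


def odds_ratio(b1, c=1.0):
    """Estimated odds ratio for a difference of c units in X: exp(c * b1)."""
    return math.exp(c * b1)


if __name__ == "__main__":
    # A one-month increase multiplies the odds by the same factor
    # regardless of the starting level of experience.
    print(estimated_odds(11) / estimated_odds(10),
          estimated_odds(25) / estimated_odds(24),
          odds_ratio(B1))
    # Ten months versus twenty-five months (c = 15).
    print(estimated_odds(25) / estimated_odds(10), odds_ratio(B1, 15))
```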


FIGURE 14.6 Logistic (solid line), Probit (dashed line), and Complementary Log-Log (gray
line) Fits: Programming Task Example. [Horizontal axis: Experience (X).]

Comment 
The odds ratio interpretation of the estimated regression coefficient $b_1$ makes the logistic regression
model especially attractive for modeling and interpreting epidemiologic studies.


Use of Probit and Complementary Log-Log Response Functions 


As we discussed earlier in Section 14.2, alternative sigmoidal shaped response functions, 
such as the probit or complementary log-log functions, can be utilized as well. For example, 
it is interesting to fit the programming task data in Table 14.1 to these alternative response 
functions. Figure 14.6 shows the scatter plot of the data and the fitted logistic, probit, 
and complementary log-log mean response functions. The logistic and probit fits are very 
similar, whereas the complementary log-log fit differs slightly, having a less pronounced
S-shape. 


Repeat Observations—Binomial Outcomes 


In some cases, particularly for designed experiments, a number of repeat observations are 
obtained at several levels of the predictor variable X. For instance, a pricing experiment
involved showing a new product to 1,000 consumers, providing information about it, and
then asking each consumer whether he or she would buy the product at a given price.
Five prices were studied, and 200 persons were randomly selected for each price level.
The response variable here is binary (would purchase, would not purchase); the predictor 
variable is price and has five levels. 

When repeat observations are present, the log-likelihood function in (14.26) can be
simplified. We shall adopt the notation used for replicate observations in our discussion of
the F test for lack of fit in Section 3.7. We denote the X levels at which repeat observations are
obtained by $X_1, \ldots, X_c$ and we assume that there are $n_j$ binary responses at level $X_j$. Then
the observed value of the ith binary response at $X_j$ is denoted by $Y_{ij}$, where $i = 1, \ldots, n_j$
and $j = 1, \ldots, c$. The number of 1s at level $X_j$ is denoted by $Y_{.j}$:

$$Y_{.j} = \sum_{i=1}^{n_j} Y_{ij} \tag{14.32a}$$

and the proportion of 1s at level $X_j$ is denoted by $p_j$:

$$p_j = \frac{Y_{.j}}{n_j} \tag{14.32b}$$

The random variable $Y_{.j}$ has a binomial distribution given by:

$$f(Y_{.j}) = \binom{n_j}{Y_{.j}} \pi_j^{Y_{.j}} (1 - \pi_j)^{n_j - Y_{.j}} \tag{14.33}$$

where:

$$\binom{n_j}{Y_{.j}} = \frac{n_j!}{(Y_{.j})!\,(n_j - Y_{.j})!}$$

and the factorial notation $a!$ represents $a(a - 1)(a - 2) \cdots 1$. The binomial random variable
$Y_{.j}$ has mean $n_j\pi_j$ and variance $n_j\pi_j(1 - \pi_j)$. The log-likelihood function then can be
stated as follows:

$$\log_e L(\beta_0, \beta_1) = \sum_{j=1}^{c} \left\{ \log_e \binom{n_j}{Y_{.j}} + Y_{.j}(\beta_0 + \beta_1 X_j) - n_j \log_e[1 + \exp(\beta_0 + \beta_1 X_j)] \right\} \tag{14.34}$$


In a study of the effectiveness of coupons offering a price reduction on a given product,
1,000 homes were selected at random. A packet containing advertising material and a
coupon for the product was mailed to each home. The coupons offered different price
reductions (5, 10, 15, 20, and 30 dollars), and 200 homes were assigned at random to each
of the price reduction categories. The predictor variable X in this study is the amount of
price reduction, and the response variable Y is a binary variable indicating whether or not
the coupon was redeemed within a six-month period.

Table 14.2 contains the data for this study. $X_j$ denotes the price reduction offered by
a coupon, $n_j$ the number of households that received a coupon with price reduction $X_j$,
$Y_{.j}$ the number of these households that redeemed the coupon, and $p_j$ the proportion of
households receiving a coupon with price reduction $X_j$ that redeemed the coupon. The
logistic regression model (14.20) was fitted by a logistic regression package and the fitted
response function was found to be:

$$\hat{\pi} = \frac{\exp(-2.04435 + .096834X)}{1 + \exp(-2.04435 + .096834X)} \tag{14.35}$$

TABLE 14.2 (data for the coupon effectiveness example)

            (1)          (2)           (3)            (4)           (5)
                                    Number of     Proportion of    Model-
           Price      Number of      Coupons        Coupons        Based
Level    Reduction   Households     Redeemed       Redeemed       Estimate
  j         X_j         n_j           Y_.j           p_j          pi-hat_j
  1           5         200            30            .150          .1736
  2          10         200            55            .275          .2543
  3          15         200            70            .350          .3562
  4          20         200           100            .500          .4731
  5          30         200           137            .685          .7028

FIGURE 14.7 Plot of Proportions of Coupons Redeemed and Fitted Logistic Response
Function: Coupon Effectiveness Example. [Horizontal axis: Price Reduction ($).]


Fitted values are given in column 5 of Table 14.2. Figure 14.7 shows the fitted response 
function, as well as the proportions of coupons redeemed at each of the $X_j$ levels. The
logistic response function appears to provide a very good fit. The odds ratio here is:

$$\widehat{OR} = \exp(b_1) = \exp(.096834) = 1.102$$


Hence, the odds of a coupon being redeemed are estimated to increase by 10.2 percent with 
each one dollar increase in the coupon value, that is, with each one dollar reduction in price. 
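
The coupon data are fully listed in Table 14.2, so the reported fit can be examined numerically. The Python sketch below (function names are hypothetical) computes the model-based estimates from the reported coefficients, evaluates the grouped-data log-likelihood (14.34), and compares its value at the reported estimates with its value at a few nearby parameter values. It is a rough plausibility check, not a substitute for a fitting routine.

```python
import math

# Data from Table 14.2.
X = [5, 10, 15, 20, 30]          # price reduction X_j (dollars)
N = [200, 200, 200, 200, 200]    # households n_j
Y = [30, 55, 70, 100, 137]       # coupons redeemed Y_.j

# Estimates reported in the fitted response function (14.35).
B0, B1 = -2.04435, 0.096834


def fitted(x, b0=B0, b1=B1):
    """Fitted logistic response at price reduction x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def binomial_loglik(b0, b1):
    """Log-likelihood (14.34) for the grouped coupon data."""
    total = 0.0
    for x, n, y in zip(X, N, Y):
        eta = b0 + b1 * x
        total += (math.log(math.comb(n, y))
                  + y * eta
                  - n * math.log(1.0 + math.exp(eta)))
    return total


if __name__ == "__main__":
    for x, n, y in zip(X, N, Y):
        # observed proportion p_j next to the model-based estimate
        print(x, y / n, round(fitted(x), 4))
    print("odds ratio:", round(math.exp(B1), 3))
    print("log-likelihood at reported estimates:", binomial_loglik(B0, B1))
    print("log-likelihood nearby:", binomial_loglik(B0 + 0.05, B1),
          binomial_loglik(B0, B1 - 0.005))
```

The nearby parameter values give a smaller log-likelihood, as expected if the reported coefficients are the maximum likelihood estimates.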


14.4 Multiple Logistic Regression


Multiple Logistic Regression Model 


The simple logistic regression model (14.20) is easily extended to more than one predictor 
variable. In fact, several predictor variables are usually required with logistic regression to 
obtain adequate description and useful predictions. 

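In extending the simple logistic regression model, we simply replace $\beta_0 + \beta_1 X$ in (14.16)
by $\beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1}X_{p-1}$. To simplify the formulas, we shall use matrix notation
and the following three vectors:

$$\underset{p \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix} \qquad
\underset{p \times 1}{\mathbf{X}} = \begin{bmatrix} 1 \\ X_1 \\ \vdots \\ X_{p-1} \end{bmatrix} \qquad
\underset{p \times 1}{\mathbf{X}_i} = \begin{bmatrix} 1 \\ X_{i1} \\ \vdots \\ X_{i,p-1} \end{bmatrix} \tag{14.36}$$

We then have:

$$\mathbf{X}'\boldsymbol{\beta} = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1}X_{p-1} \tag{14.37a}$$

$$\mathbf{X}_i'\boldsymbol{\beta} = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1}X_{i,p-1} \tag{14.37b}$$

With this notation, the simple logistic response function (14.20) extends to the multiple
logistic response function as follows:

$$E\{Y\} = \frac{\exp(\mathbf{X}'\boldsymbol{\beta})}{1 + \exp(\mathbf{X}'\boldsymbol{\beta})} \tag{14.38}$$

and the equivalent simple logistic response form (14.17) extends to:

$$E\{Y\} = [1 + \exp(-\mathbf{X}'\boldsymbol{\beta})]^{-1} \tag{14.38a}$$

Similarly, the logit transformation (14.18a):

$$\pi' = \log_e\left(\frac{\pi}{1 - \pi}\right) \tag{14.39}$$

now leads to the logit response function, or linear predictor:

$$\pi' = \mathbf{X}'\boldsymbol{\beta} \tag{14.40}$$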
The multiple logistic regression model can therefore be stated as follows:

$Y_i$ are independent Bernoulli random variables with expected
values $E\{Y_i\} = \pi_i$, where:

$$E\{Y_i\} = \pi_i = \frac{\exp(\mathbf{X}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{X}_i'\boldsymbol{\beta})} \tag{14.41}$$

Again, the X observations are considered to be known constants. Alternatively, if the X
variables are random, $E\{Y_i\}$ is viewed as a conditional mean, given the values of $X_{i1}, \ldots, X_{i,p-1}$.

Like the simple logistic response function (14.16), the multiple logistic response
function (14.41) is monotonic and sigmoidal in shape with respect to $\mathbf{X}'\boldsymbol{\beta}$ and is almost linear
when $\pi$ is between .2 and .8. The X variables may be different predictor variables, or
some may represent curvature and/or interaction effects. Also, the predictor variables may
be quantitative, or they may be qualitative and represented by indicator variables. This
flexibility makes the multiple logistic regression model very attractive.


Comment

When the logistic regression model contains only qualitative variables, it is often referred to as
a log-linear model. See Reference 14.2 for an in-depth discussion of the analysis of log-linear
models.

Fitting of Model
Again, we shall utilize the method of maximum likelihood to estimate the parameters of the
multiple logistic response function (14.41). The log-likelihood function for simple logistic
regression in (14.26) extends directly for multiple logistic regression:

$$\log_e L(\boldsymbol{\beta}) = \sum_{i=1}^{n} Y_i(\mathbf{X}_i'\boldsymbol{\beta}) - \sum_{i=1}^{n} \log_e[1 + \exp(\mathbf{X}_i'\boldsymbol{\beta})] \tag{14.42}$$
FIGURE 14.8 Three-Dimensional Fitted Logistic Response Surface: Coronary Heart Disease
Example.


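Numerical search procedures are used to find the values of $\beta_0, \beta_1, \ldots, \beta_{p-1}$ that maximize
$\log_e L(\boldsymbol{\beta})$. These maximum likelihood estimates will be denoted by $b_0, b_1, \ldots, b_{p-1}$. Let $\mathbf{b}$
denote the vector of the maximum likelihood estimates:

$$\underset{p \times 1}{\mathbf{b}} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{bmatrix} \tag{14.43}$$

The fitted logistic response function and fitted values can then be expressed as follows:

$$\hat{\pi} = \frac{\exp(\mathbf{X}'\mathbf{b})}{1 + \exp(\mathbf{X}'\mathbf{b})} = [1 + \exp(-\mathbf{X}'\mathbf{b})]^{-1} \tag{14.44a}$$

$$\hat{\pi}_i = \frac{\exp(\mathbf{X}_i'\mathbf{b})}{1 + \exp(\mathbf{X}_i'\mathbf{b})} = [1 + \exp(-\mathbf{X}_i'\mathbf{b})]^{-1} \tag{14.44b}$$

where:

$$\mathbf{X}'\mathbf{b} = b_0 + b_1 X_1 + \cdots + b_{p-1}X_{p-1} \tag{14.44c}$$

$$\mathbf{X}_i'\mathbf{b} = b_0 + b_1 X_{i1} + \cdots + b_{p-1}X_{i,p-1} \tag{14.44d}$$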


Geometric interpretation. Recall that when fitting a standard multiple regression model
with two predictors, the estimated regression surface is a plane in three-dimensional space,
as shown in Figure 6.7 on page 240 for the Dwaine Studios example. A multiple logistic
regression fit based on two continuous predictors can also be represented by a surface in
three-dimensional space, but the surface follows the characteristic S-shape that we saw
for simple logistic models. For example, Figure 14.8 displays a three-dimensional plot of a
logistic response function that depicts the relationship between the development of coronary
disease ($Y$, the binary outcome) and two continuous predictors, cholesterol level ($X_1$) and
age ($X_2$). This surface increases in an approximately linear fashion for larger values of
cholesterol level and age, but levels off and is nearly horizontal for small values of these
predictors.

We shall rely on standard statistical packages for logistic regression to conduct the 
numerical search procedures for obtaining the maximum likelihood estimates. We therefore 
proceed directly to an example to illustrate the fitting and interpretation of a multiple logistic 
regression model. 


In a health study to investigate an epidemic outbreak of a disease that is spread by mosquitoes,
individuals were randomly sampled within two sectors in a city to determine if the person
had recently contracted the disease under study. This was ascertained by the interviewer,
who asked pertinent questions to assess whether certain specific symptoms associated with
the disease were present during the specified period. The response variable Y was coded 1 
if this disease was determined to have been present, and 0 if not. 

Three predictor variables were included in the study, representing known or potential 
risk factors. They are age, socioeconomic status of household, and sector within city. Age 
($X_1$) is a quantitative variable. Socioeconomic status is a categorical variable with three
levels. It is represented by two indicator variables ($X_2$ and $X_3$), as follows:

Class     X_2   X_3
Upper      0     0
Middle     1     0
Lower      0     1

City sector is also a categorical variable. Since there were only two sectors in the study,
one indicator variable ($X_4$) was used, defined so that $X_4 = 0$ for sector 1 and $X_4 = 1$ for
sector 2.

The reason why the upper socioeconomic class was chosen as the reference class 
(i.e., the class for which the indicator variables $X_2$ and $X_3$ are coded 0) is that it was expected
that this class would have the lowest disease rate among the socioeconomic classes. By making
this class the reference class, the odds ratios associated with regression coefficients $\beta_2$
and $\beta_3$ would then be expected to be greater than 1, facilitating their interpretation. For the
same reason, sector 1, where the epidemic was less severe, was chosen as the reference 
class for the sector indicator variable X4. 

The data for 196 individuals in the sample are given in the disease outbreak data set in
Appendix C.10. The first 98 cases were selected for fitting the model. The remaining 98
cases were saved to serve as a validation data set. Table 14.3, in columns 1-5, contains the
data for a portion of the 98 cases used for fitting the model. Note the use of the indicator
variables as just explained for the two categorical variables. The primary purpose of the
study was to assess the strength of the association between each of the predictor variables
and the probability of a person having contracted the disease.

A first-order multiple logistic regression model with the three predictor variables was
considered a priori to be reasonable:

$$E\{Y\} = [1 + \exp(-\mathbf{X}'\boldsymbol{\beta})]^{-1} \tag{14.45}$$

where:

$$\mathbf{X}'\boldsymbol{\beta} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 \tag{14.45a}$$

TABLE 14.3 Portion of Model-Building Data Set: Disease Outbreak Example.

          (1)      (2)     (3)       (4)       (5)        (6)
                  Socioeconomic      City     Disease    Fitted
Case      Age        Status         Sector    Status      Value
  i       X_i1    X_i2    X_i3      X_i4       Y_i      pi-hat_i
  1        33       0       0         0         0         .209
  2        35       0       0         0         0         .219
  3         6       0       0         0         0         .106
  4        60       0       0         0         0         .371
  5        18       0       1         0         1         .111
  6        26       0       1         0         0         .136
 ...      ...     ...     ...       ...       ...         ...
 98        35       0       1         0         0         .171

TABLE 14.4 Maximum Likelihood Estimates of Logistic Regression Function (14.45): Disease
Outbreak Example.

(a) Estimated Coefficients, Standard Deviations, and Odds Ratios

                 Estimated       Estimated
Regression       Regression      Standard       Estimated
Coefficient      Coefficient     Deviation      Odds Ratio
  beta_0          -2.3129          .6426           ...
  beta_1            .02975         .01350          1.030
  beta_2            .4088          .5990           1.505
  beta_3           -.30525         .6041            .737
  beta_4           1.5747          .5016           4.829

(b) Estimated Approximate Variance-Covariance Matrix

$$\mathbf{s}^2\{\mathbf{b}\} = \begin{bmatrix}
.4129 & -.0057 & -.1836 & -.2010 & -.1632 \\
-.0057 & .00018 & .00115 & .00073 & .00034 \\
-.1836 & .00115 & .3588 & .1482 & .0129 \\
-.2010 & .00073 & .1482 & .3650 & .0623 \\
-.1632 & .00034 & .0129 & .0623 & .2516
\end{bmatrix}$$

(rows and columns ordered $b_0, b_1, b_2, b_3, b_4$)


This model was fitted by the method of maximum likelihood to the data for the 98 cases.
The results are summarized in Table 14.4a. The estimated logistic response function is:

$$\hat{\pi} = [1 + \exp(2.3129 - .02975X_1 - .4088X_2 + .30525X_3 - 1.5747X_4)]^{-1} \tag{14.46}$$


The interpretation of the estimated regression coefficients in the fitted first-order multiple
logistic response function parallels that for the simple logistic response function: $\exp(b_k)$
is the estimated odds ratio for predictor variable $X_k$. The only difference in interpretation
for multiple logistic regression is that the estimated odds ratio for predictor variable $X_k$
assumes that all other predictor variables are held constant. The levels at which they are
held constant do not matter in a first-order model. We see from Table 14.4a, for instance,
that the odds of a person having contracted the disease increase by about 3.0 percent with
each additional year of age ($X_1$), for given socioeconomic status and city sector location.
Also, the odds of a person in sector 2 ($X_4$) having contracted the disease are almost five
times as great as for a person in sector 1, for given age and socioeconomic status. These are
point estimates, to be sure, and we shall need to consider how precise these estimates are.
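
The odds ratio interpretation above can be checked numerically. The following is a minimal sketch that exponentiates the estimated coefficients reported in Table 14.4a; the dictionary keys and the helper name `odds_ratio` are illustrative labels and are not part of the original analysis.

```python
import math

# Estimated coefficients b1..b4 copied from Table 14.4a.
# The key names are illustrative labels chosen for this sketch.
coefficients = {
    "age": 0.02975,      # X1
    "ses_x2": 0.4088,    # X2 (socioeconomic status indicator)
    "ses_x3": -0.30525,  # X3 (socioeconomic status indicator)
    "sector": 1.5747,    # X4
}


def odds_ratio(b, units=1.0):
    """Estimated odds ratio for a change of `units` in one predictor,
    with all other predictors held constant (first-order model)."""
    return math.exp(units * b)


odds_ratios = {name: odds_ratio(b) for name, b in coefficients.items()}
for name, value in odds_ratios.items():
    print(f"{name}: {value:.3f}")
```

The printed values should agree with the odds ratio column of Table 14.4a to three decimals. The `units` argument covers changes of more than one unit in a predictor, for example ten years of age.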

Table 14.3, column 6, contains the fitted values $\hat{\pi}_i$. These are calculated as usual. For
instance, the estimated mean response for case $i = 1$, where $X_{11} = 33$, $X_{12} = 0$, $X_{13} = 0$,
$X_{14} = 0$, is:

$$\hat{\pi}_1 = \{1 + \exp[2.3129 - .02975(33) - .4088(0) + .30525(0) - 1.5747(0)]\}^{-1} = .209$$


Polynomial Logistic Regression 
Occasionally, the first-order logistic model may not provide an adequate fit to the data and 
a more complicated model may be needed. One such model is the kth-order polynomial 
logistic regression model, with logit response function: 


$$\pi'(x) = \beta_0 + \beta_{11}x + \beta_{22}x^2 + \cdots + \beta_{kk}x^k \tag{14.47}$$

where $x$ denotes the centered predictor, $X - \bar{X}$. This model for the logit is still linear in the
$\beta$ parameters. For simplicity, we will use a second-order polynomial:

$$\pi'(x) = \beta_0 + \beta_{11}x + \beta_{22}x^2$$

to demonstrate the procedure.


A study of 482 initial public offering companies (IPOs) was conducted to determine the 
characteristics of companies that attract venture capital. Here, the response of interest 
is whether or not the company was financed by venture capital funds. Several potential 
predictors are: the face value of the company; the number of shares offered; and whether 
or not the company was a leveraged buyout. The IPO data set is listed in Appendix C.11. 
In this example we consider just one predictor, the face value of the company. 

Figure 14.9a contains a plot of venture capital involvement (Y) versus the natural
logarithm of the face value of the company (X) with a lowess smooth and the fitted
first-order logistic regression fit superimposed. (Here we chose to analyze the natural
logarithm of face value because face value ranges over several orders of magnitude, with a
highly skewed distribution.) The lowess smooth clearly suggests a mound-shaped relationship:
for small and large companies, the likelihood of venture capital involvement is near zero, but
for midsized companies it is over .5. The first-order logistic regression fit is unable to capture
the characteristic mound shape of the mean response function and is clearly inadequate.

FIGURE 14.9 (a) First-Order Fit (b) Second-Order Fit. [Both panels plot Probability against Logarithm of Face Value.]

TABLE 14.5 Logistic Regression Output for Second-Order Model: IPO Example.

                 Estimated        Estimated
  Predictor      Coefficient      Standard Error      z*       P-value

  Constant       b0  =  0.3005       0.1240           2.42      0.015
  x              b11 =  0.5516       0.1385           3.98      0.000
  x^2            b22 = -0.8615       0.1404          -6.14      0.000


Table 14.5 shows the fitted second-order response function:

$$\hat{\pi}' = .3005 + .5516x - .8615x^2$$

where $x = X - \bar{X}$. Also shown in Table 14.5 are three quantities to be discussed in
Section 14.5, namely, the estimated standard error of each coefficient, a statistic, $z^*$, for testing
the hypothesis that the coefficient is zero, and the resulting P-value. We simply note for now
that the P-value for $b_{22}$ is .000, confirming the need for a second-order term. Figure 14.9b
plots the data, the lowess smooth, and the second-order polynomial logistic regression fit.
Note that the second-order polynomial fit tracks the lowess smooth closely.

The above example demonstrated the use of polynomial regression for a single predictor. 
For multiple logistic regression, higher order polynomial terms and cross-products may be
added to improve the fit of a model, as discussed in Section 8.1 in the context of multiple 
linear regression models. 
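
To see how a second-order logit produces a mound-shaped probability curve, the sketch below evaluates the fitted function from Table 14.5 at a few centered values. The function name `fitted_probability` and the evaluation points are assumptions made for illustration; only the three coefficients come from the text.

```python
import math

# Coefficients of the fitted second-order logit, copied from Table 14.5.
B0, B11, B22 = 0.3005, 0.5516, -0.8615


def fitted_probability(x):
    """Estimated probability of venture capital involvement at centered
    log face value x = X - mean(X)."""
    logit = B0 + B11 * x + B22 * x ** 2
    return 1.0 / (1.0 + math.exp(-logit))


# Because B22 < 0 the logit is a downward-opening parabola in x,
# so it (and the fitted probability) peaks at x = -B11 / (2 * B22).
x_peak = -B11 / (2.0 * B22)

for x in (-2.0, 0.0, x_peak, 2.0):
    print(f"x = {x:6.3f}  fitted probability = {fitted_probability(x):.3f}")
```

The fitted probability is highest near the center of the data and falls toward zero in both tails, which is the shape a first-order logit cannot reproduce.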


Comments 


1. The maximum likelihood estimates of the parameters $\boldsymbol{\beta}$ for the logistic regression model can
be obtained by iteratively reweighted least squares. The procedure is straightforward, although it
involves intensive use of a computer.

a. Obtain starting values for the regression parameters, to be denoted by $\mathbf{b}(0)$. Often, reasonable
starting values can be obtained by ordinary least squares regression of $Y$ on the predictor variables
$X_1, \ldots, X_{p-1}$, using a first-order linear model.

b. Using these starting values, obtain:

$$\hat{\pi}_i'(0) = \mathbf{X}_i'[\mathbf{b}(0)] \tag{14.48a}$$

$$\hat{\pi}_i(0) = \frac{\exp[\hat{\pi}_i'(0)]}{1 + \exp[\hat{\pi}_i'(0)]} \tag{14.48b}$$

c. Calculate the new response variable:

$$Y_i'(0) = \hat{\pi}_i'(0) + \frac{Y_i - \hat{\pi}_i(0)}{\hat{\pi}_i(0)[1 - \hat{\pi}_i(0)]} \tag{14.49a}$$

and the weights:

$$w_i(0) = \hat{\pi}_i(0)[1 - \hat{\pi}_i(0)] \tag{14.49b}$$

d. Regress $Y'(0)$ in (14.49a) on the predictor variables $X_1, \ldots, X_{p-1}$, using a first-order linear
model with weights in (14.49b) to obtain revised estimated regression coefficients, denoted by $\mathbf{b}(1)$.

e. Repeat steps b through d, making revisions in (14.48) and (14.49) by using the latest revised
estimated regression coefficients until there is little if any change in the estimated coefficients. Often
three or four iterations are sufficient to obtain convergence.


2. When the multiple logistic regression model is not a first-order model and contains quadratic
or higher-power terms for the predictor variables and/or cross-product terms for interaction effects,
the estimated regression coefficients $b_k$ no longer have a simple interpretation.

3. When the assumptions of a monotonic sigmoidal relation between $\pi$ and $\mathbf{X}'\boldsymbol{\beta}$, required for
the multiple logistic regression model, are not appropriate, an alternative is to convert all predictor
variables to categorical variables and employ a log-linear model. In the disease outbreak example,
for instance, age could be converted into a categorical variable with three classes: 0-18, 19-50, and
51-75. Reference 14.2 describes the use of log-linear models for binary response variables when the
predictor variables are categorical.

4. Convergence difficulties in the numerical search procedures for finding the maximum likelihood
estimates of the multiple logistic regression function may be encountered when the predictor variables
are highly correlated or when there is a large number of predictor variables. Another instance that
causes convergence problems occurs when a collection of the predictors either completely or nearly
perfectly separates the outcome groups. Indication of this problem often can be detected by noting large
estimated parameters and large estimated standard errors, similar to what occurs with multicollinearity
problems. When convergence problems occur, it may be necessary to reduce the number of predictor
variables in order to obtain convergence.
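
The iteratively reweighted least squares steps a through e in Comment 1 can be sketched in a few lines. To stay self-contained, the sketch below is restricted to a single predictor so that each weighted regression has a closed-form solution; the function names, the stopping tolerance, and the small demonstration data set are all assumptions for illustration and do not come from the text.

```python
import math


def _weighted_line_fit(x, z, w):
    """Weighted least squares fit of z = b0 + b1 * x with weights w."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swz = sum(wi * zi for wi, zi in zip(w, z))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    b1 = (sw * swxz - swx * swz) / (sw * swxx - swx ** 2)
    b0 = (swz - b1 * swx) / sw
    return b0, b1


def irls_simple_logistic(x, y, tol=1e-10, max_iter=50):
    """Fit a simple logistic regression by iteratively reweighted least squares.

    Assumes the data are not separated, so no fitted probability reaches
    0 or 1 (otherwise a weight becomes zero and the update fails)."""
    # Step a: ordinary least squares starting values (all weights equal).
    b0, b1 = _weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        # Step b: linear predictor and fitted probabilities.
        eta = [b0 + b1 * xi for xi in x]
        pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        # Step c: new response variable and weights.
        w = [p * (1.0 - p) for p in pi]
        z = [e + (yi - p) / wi for e, yi, p, wi in zip(eta, y, pi, w)]
        # Step d: weighted regression of the new response on x.
        new_b0, new_b1 = _weighted_line_fit(x, z, w)
        # Step e: stop when the coefficients no longer change.
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Hypothetical demonstration data (not from the text).
demo_x = [1, 2, 3, 4, 5, 6, 7, 8]
demo_y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = irls_simple_logistic(demo_x, demo_y)
print(f"b0 = {b0_hat:.4f}, b1 = {b1_hat:.4f}")
```

At convergence the estimates satisfy the likelihood equations, so the residuals $Y_i - \hat{\pi}_i$ sum to zero and are uncorrelated with the predictor. With several predictors the same loop applies, with the closed-form line fit replaced by a general weighted least squares solve.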


14.5 Inferences about Regression Parameters


The same types of inferences are of interest in logistic regression as for linear regression 
models—inferences about the regression coefficients, estimation of mean responses, and 
predictions of new observations. 

The inference procedures that we shall present rely on large sample sizes. For large sam- 
ples, under generally applicable conditions, maximum likelihood estimators for logistic 
regression are approximately normally distributed, with little or no bias, and with approxi- 
mate variances and covariances that are functions of the second-order partial derivatives of 
the logarithm of the likelihood function. 

Specifically, let $\mathbf{G}$ denote the matrix of second-order partial derivatives of the
log-likelihood function in (14.42), the derivatives being taken with regard to the parameters
$\beta_0, \beta_1, \ldots, \beta_{p-1}$:

$$\underset{p \times p}{\mathbf{G}} = [g_{ij}] \qquad i = 0, 1, \ldots, p-1; \; j = 0, 1, \ldots, p-1 \tag{14.50}$$

where:

$$g_{00} = \frac{\partial^2 \log_e L(\boldsymbol{\beta})}{\partial \beta_0^2} \qquad g_{01} = \frac{\partial^2 \log_e L(\boldsymbol{\beta})}{\partial \beta_0 \, \partial \beta_1}$$

etc.

This matrix is called the Hessian matrix. When the second-order partial derivatives in the
Hessian matrix are evaluated at $\boldsymbol{\beta} = \mathbf{b}$, that is, at the maximum likelihood estimates, the
estimated approximate variance-covariance matrix of the estimated regression coefficients
for logistic regression can be obtained as follows:

$$\mathbf{s}^2\{\mathbf{b}\} = \left([-g_{ij}]_{\boldsymbol{\beta} = \mathbf{b}}\right)^{-1} \tag{14.51}$$


The estimated approximate variances and covariances in (14.51) are routinely provided by 
most logistic regression computer packages. 
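
For readers who want to see what (14.51) amounts to in practice, the following worked form may help. It assumes the log-likelihood in (14.42) has the usual Bernoulli form with logit $\mathbf{X}_i'\boldsymbol{\beta}$; the symbols $X_{ij}$ (value of predictor $j$ for case $i$, with $X_{i0} \equiv 1$) and $\hat{\mathbf{W}}$ are notation introduced here for illustration.

```latex
% Assumed form of the log-likelihood (14.42):
\log_e L(\boldsymbol{\beta})
  = \sum_{i=1}^{n} Y_i(\mathbf{X}_i'\boldsymbol{\beta})
  - \sum_{i=1}^{n} \log_e\!\left[1 + \exp(\mathbf{X}_i'\boldsymbol{\beta})\right]

% Differentiating twice gives the elements of the Hessian:
g_{jk} = \frac{\partial^2 \log_e L(\boldsymbol{\beta})}{\partial\beta_j\,\partial\beta_k}
       = -\sum_{i=1}^{n} X_{ij} X_{ik}\, \pi_i (1 - \pi_i)

% Hence, evaluating at beta = b and writing
% W-hat = diag{ pi-hat_i (1 - pi-hat_i) }:
\mathbf{s}^2\{\mathbf{b}\} = (\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}
```

Under this assumption the Hessian does not involve the responses $Y_i$, and the weights are the same quantities $\hat{\pi}_i(1 - \hat{\pi}_i)$ that appear in iteratively reweighted least squares.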

Inferences about the regression coefficients for the simple logistic regression model
(14.20) or the multiple logistic regression model (14.41) are based on the following
approximate result when the sample size is large:

$$\frac{b_k - \beta_k}{s\{b_k\}} \sim z \qquad k = 0, 1, \ldots, p-1 \tag{14.52}$$

where $z$ is a standard normal random variable and $s\{b_k\}$ is the estimated approximate
standard deviation of $b_k$ obtained from (14.51).


Test Concerning a Single $\beta_k$: Wald Test

A large-sample test of a single regression parameter can be constructed based on (14.52).
For the alternatives:

$$H_0\colon \beta_k = 0 \qquad H_a\colon \beta_k \neq 0 \tag{14.53a}$$

an appropriate test statistic is:

$$z^* = \frac{b_k}{s\{b_k\}} \tag{14.53b}$$

and the decision rule is:

$$\begin{aligned}
&\text{If } |z^*| \leq z(1 - \alpha/2), \text{ conclude } H_0 \\
&\text{If } |z^*| > z(1 - \alpha/2), \text{ conclude } H_a
\end{aligned} \tag{14.53c}$$

One-sided alternatives will involve a one-sided decision rule. The testing procedure in
(14.53) is commonly referred to as the Wald test. On occasion, the square of $z^*$ is used
instead, and the test is then based on a chi-square distribution with 1 degree of freedom.
This is also referred to as the Wald test.

Example

In the programming task example, $\beta_1$ was expected to be positive. The alternatives of interest
therefore are:

$$H_0\colon \beta_1 \leq 0 \qquad H_a\colon \beta_1 > 0$$

Test statistic (14.53b), using the results in Table 14.1b, is:

$$z^* = \frac{.1615}{.0650} = 2.485$$

For $\alpha = .05$, we require $z(.95) = 1.645$. The decision rule therefore is:

$$\begin{aligned}
&\text{If } z^* \leq 1.645, \text{ conclude } H_0 \\
&\text{If } z^* > 1.645, \text{ conclude } H_a
\end{aligned}$$

Since $z^* = 2.485 > 1.645$, we conclude $H_a$, that $\beta_1$ is positive, as expected. The one-sided
P-value of this test is .0065.
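
The Wald test calculation in this example can be reproduced with the Python standard library. This is a minimal sketch; the function name `wald_test` and its return layout are choices made here, and the inputs are the estimate and standard deviation quoted from Table 14.1b.

```python
from statistics import NormalDist


def wald_test(b_k, se_k, beta_null=0.0):
    """Return z* = (b_k - beta_null) / s{b_k} together with the
    upper-tail (one-sided) and two-sided large-sample P-values."""
    z_star = (b_k - beta_null) / se_k
    standard_normal = NormalDist()
    upper_tail = 1.0 - standard_normal.cdf(z_star)
    two_sided = 2.0 * (1.0 - standard_normal.cdf(abs(z_star)))
    return z_star, upper_tail, two_sided


# Programming task example: b1 = .1615, s{b1} = .0650.
z_star, p_upper, p_two_sided = wald_test(0.1615, 0.0650)
critical_value = NormalDist().inv_cdf(0.95)  # z(.95) for a one-sided test at alpha = .05

print(f"z* = {z_star:.3f}, z(.95) = {critical_value:.3f}")
print(f"one-sided P-value = {p_upper:.4f}, two-sided P-value = {p_two_sided:.4f}")
```

The two-sided P-value returned here is the same value obtained by referring $(z^*)^2$ to a chi-square distribution with 1 degree of freedom.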


Interval Estimation of a Single $\beta_k$

From (14.52), we obtain directly the approximate $1 - \alpha$ confidence limits for $\beta_k$:

$$b_k \pm z(1 - \alpha/2)s\{b_k\} \tag{14.54}$$

where $z(1 - \alpha/2)$ is the $(1 - \alpha/2)100$ percentile of the standard normal distribution.
The corresponding confidence limits for the odds ratio $\exp(\beta_k)$ are:

$$\exp[b_k \pm z(1 - \alpha/2)s\{b_k\}] \tag{14.55}$$

Example

For the programming task example, it is desired to estimate $\beta_1$ with an approximate
95 percent confidence interval. We require $z(.975) = 1.960$, as well as the estimates
$b_1 = .1615$ and $s\{b_1\} = .0650$ which are given in Table 14.1b. Hence, the confidence limits
are $.1615 \pm 1.960(.0650)$, and the approximate 95 percent confidence interval for $\beta_1$ is:

$$.0341 \leq \beta_1 \leq .2889$$

Thus, we can conclude with approximately 95 percent confidence that $\beta_1$ is between
.0341 and .2889. The corresponding 95 percent confidence limits for the odds ratio are
$\exp(.0341) = 1.03$ and $\exp(.2889) = 1.33$.

To examine whether the large-sample inference procedures are applicable here when
$n = 25$, bootstrap sampling can be employed, as described in Chapter 13. Alternatively,
estimation procedures have been developed for logistic regression that do not depend on any
large-sample approximations. LogXact (Reference 14.3) was run on the data and produced
95 percent confidence limits for $\beta_1$ of .041 and .296. The large-sample limits of .034 and
.289 are reasonably close to the LogXact limits, confirming the applicability of large-sample
theory here.

If we wish to consider the odds ratio for persons whose experience differs by, say, five
months, the point estimate of this odds ratio would be $\exp(5b_1) = \exp[5(.1615)] = 2.242$,
and the 95 percent confidence limits would be obtained from the confidence limits for $\beta_1$
as follows: $\exp[5(.0341)] = 1.186$ and $\exp[5(.2889)] = 4.240$. Thus, with 95 percent
confidence we estimate that the odds of success increase by between 19 percent and 324 percent
with an additional five months of experience.
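
The interval calculations in this example follow directly from (14.54) and (14.55) and can be sketched as below. The function name `wald_interval` and the `units` argument (the size of the change in the predictor for which an odds ratio is wanted) are illustrative choices, not notation from the text.

```python
import math
from statistics import NormalDist


def wald_interval(b_k, se_k, confidence=0.95, units=1.0):
    """Large-sample confidence limits for beta_k, and the implied limits
    for the odds ratio associated with a change of `units` in X_k."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    lower, upper = b_k - z * se_k, b_k + z * se_k
    odds_limits = (math.exp(units * lower), math.exp(units * upper))
    return (lower, upper), odds_limits


# Programming task example: b1 = .1615, s{b1} = .0650.
beta_limits, odds_1_month = wald_interval(0.1615, 0.0650)
_, odds_5_months = wald_interval(0.1615, 0.0650, units=5.0)

print("beta_1 limits:", tuple(round(v, 4) for v in beta_limits))
print("odds ratio limits, 1 month:", tuple(round(v, 2) for v in odds_1_month))
print("odds ratio limits, 5 months:", tuple(round(v, 3) for v in odds_5_months))
```

Because the exponential function is monotone, the odds ratio limits are obtained simply by transforming the limits for the coefficient; no separate standard error for the odds ratio is needed.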


Comments 


1. If the large-sample conditions for inferences are not met, the bootstrap procedure can be
employed to obtain confidence limits for the regression coefficients. The bootstrap here requires
generating Bernoulli random variables as discussed in Section 14.8 for the construction of simulated
envelopes.

2. We are using the $z$ approximation here for large-sample inferences rather than the $t$
approximation used in Chapter 13 for nonlinear regression. This choice is conventional for logistic
regression. For large sample sizes, there is little difference between the $t$ distribution and the
standard normal distribution.

3. Approximate joint confidence intervals for several logistic regression parameters can be
developed by the Bonferroni procedure. If $g$ parameters are to be estimated with family confidence
coefficient of approximately $1 - \alpha$, the joint Bonferroni confidence limits are:

$$b_k \pm B s\{b_k\} \tag{14.56}$$

where:

$$B = z(1 - \alpha/2g) \tag{14.56a}$$

4. For power and sample size considerations in logistic regression modeling, see Reference 14.4.
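
The Bonferroni limits in Comment 3 can be sketched as follows. The choice of which coefficients to estimate jointly is an assumption made for illustration: here the age and city sector coefficients of the disease outbreak example, with estimates and standard deviations taken from Table 14.4a. The function name `bonferroni_limits` is likewise hypothetical.

```python
from statistics import NormalDist


def bonferroni_limits(estimates, std_devs, family_alpha=0.05):
    """Joint Bonferroni confidence limits b_k +/- B * s{b_k} for g
    coefficients, with B = z(1 - alpha / (2g))."""
    g = len(estimates)
    B = NormalDist().inv_cdf(1.0 - family_alpha / (2.0 * g))
    limits = [(b - B * s, b + B * s) for b, s in zip(estimates, std_devs)]
    return B, limits


# Illustrative family of g = 2 coefficients: age (b1) and city sector (b4).
B, joint_limits = bonferroni_limits([0.02975, 1.5747], [0.01350, 0.5016])

print(f"B = {B:.4f}")
for label, (lower, upper) in zip(("age", "sector"), joint_limits):
    print(f"{label}: {lower:.4f} to {upper:.4f}")
```

With $g = 2$ the multiple $B$ is larger than the single-interval value 1.960, so each joint interval is wider than the corresponding individual 95 percent interval.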


Test Whether Several $\beta_k = 0$: Likelihood Ratio Test

Frequently there is interest in determining whether a subset of the $X$ variables in a multiple
logistic regression model can be dropped, that is, in testing whether the associated regression
coefficients $\beta_k$ equal zero. The test procedure we shall employ is a general one for use with
maximum likelihood estimation, and is analogous to the general linear test procedure for
linear models. The test is called the likelihood ratio test, and, like the general linear test, is
based on a comparison of full and reduced models. The test is valid for large sample sizes.
We begin with the full logistic model with response function:

$$\pi = [1 + \exp(-\mathbf{X}'\boldsymbol{\beta}_F)]^{-1} \qquad \text{Full model} \tag{14.57}$$

where:

$$\mathbf{X}'\boldsymbol{\beta}_F = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1}$$

We then find the maximum likelihood estimates for the full model, now denoted by $\mathbf{b}_F$,
and evaluate the likelihood function $L(\boldsymbol{\beta})$ when $\boldsymbol{\beta}_F = \mathbf{b}_F$. We shall denote this value of the
likelihood function for the full model by $L(F)$.

The hypothesis we wish to test is:

$$\begin{aligned}
&H_0\colon \beta_q = \beta_{q+1} = \cdots = \beta_{p-1} = 0 \\
&H_a\colon \text{not all of the } \beta_k \text{ in } H_0 \text{ equal zero}
\end{aligned} \tag{14.58}$$

where, for convenience, we arrange the model so that the last $p - q$ coefficients are those
tested. The reduced logistic model therefore has the response function:

$$\pi = [1 + \exp(-\mathbf{X}'\boldsymbol{\beta}_R)]^{-1} \qquad \text{Reduced model} \tag{14.59}$$

where:

$$\mathbf{X}'\boldsymbol{\beta}_R = \beta_0 + \beta_1 X_1 + \cdots + \beta_{q-1} X_{q-1}$$

Now we obtain the maximum likelihood estimates $\mathbf{b}_R$ for the reduced model and evaluate
the likelihood function for the reduced model containing $q$ parameters when $\boldsymbol{\beta}_R = \mathbf{b}_R$.
We shall denote this value of the likelihood function for the reduced model by $L(R)$. It can
be shown that $L(R)$ cannot exceed $L(F)$ since one cannot obtain a larger maximum for the
likelihood function using a subset of the parameters.


The actual test statistic for the likelihood ratio test, denoted by $G^2$, is:

$$G^2 = -2\log_e\left[\frac{L(R)}{L(F)}\right] = -2[\log_e L(R) - \log_e L(F)] \tag{14.60}$$

Note that if the ratio $L(R)/L(F)$ is small, indicating $H_a$ is the appropriate conclusion, then
$G^2$ is large. Thus, large values of $G^2$ lead to conclusion $H_a$.

Large-sample theory states that when $n$ is large, $G^2$ is distributed approximately as
$\chi^2(p - q)$ when $H_0$ in (14.58) holds. The degrees of freedom correspond to $df_R - df_F =
(n - q) - (n - p) = p - q$. The appropriate decision rule therefore is:

$$\begin{aligned}
&\text{If } G^2 \leq \chi^2(1 - \alpha; p - q), \text{ conclude } H_0 \\
&\text{If } G^2 > \chi^2(1 - \alpha; p - q), \text{ conclude } H_a
\end{aligned} \tag{14.61}$$

Example

In the disease outbreak example, the model building began with the three predictor variables
that were considered a priori to be key explanatory variables: age, socioeconomic status,
and city sector. A logistic regression model was fitted containing these three predictor
variables and the log-likelihood for this model was obtained. Then tests were conducted to
see whether a variable could be dropped from the model. First, age ($X_1$) was dropped from
the logistic model and the log-likelihood for this reduced model was obtained. The results
were:

$$\log_e L(F) = \log_e L(b_0, b_1, b_2, b_3, b_4) = -50.527 \qquad \log_e L(R) = \log_e L(b_0, b_2, b_3, b_4) = -53.102$$

Hence the required test statistic is:

$$G^2 = -2[\log_e L(R) - \log_e L(F)] = -2[-53.102 - (-50.527)] = 5.150$$

For $\alpha = .05$, we require $\chi^2(.95; 1) = 3.84$. Hence to test $H_0\colon \beta_1 = 0$, $H_a\colon \beta_1 \neq 0$, the
appropriate decision rule is:

$$\begin{aligned}
&\text{If } G^2 \leq 3.84, \text{ conclude } H_0 \\
&\text{If } G^2 > 3.84, \text{ conclude } H_a
\end{aligned}$$

Since $G^2 = 5.15 > 3.84$, we conclude $H_a$, that $X_1$ should not be dropped from the model.
The P-value of this test is .023.

Similar tests for socioeconomic status (X2, X3) and city sector (X4) led to P-values of 
.55 and .001. The P-value for socioeconomic status suggests that it can be dropped from the 
model containing the other two predictor variables. However, since this variable was con- 
sidered a priori to be important, additional analyses were conducted. When socioeconomic 
status is the only predictor in the logistic regression model, the P-value for the test whether
this predictor variable is helpful is .16, suggesting marginal importance for this variable.
In addition, the estimated regression coefficients for age and city sector and their estimated
standard deviations are not appreciably affected by whether or not socioeconomic status is
in the regression model. Hence, it was decided to keep socioeconomic status in the logistic 
regression model in view of its a priori importance. 

The next question of concern was whether any two-factor interaction terms are required
in the model. The full model now includes all possible two-factor interactions, in addition
to the main effects, so that $\mathbf{X}'\boldsymbol{\beta}_F$ for this model is as follows:

$$\begin{aligned}
\mathbf{X}'\boldsymbol{\beta}_F = {} & \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_1 X_2 + \beta_6 X_1 X_3 \\
& + \beta_7 X_1 X_4 + \beta_8 X_2 X_4 + \beta_9 X_3 X_4 \qquad \text{Full model}
\end{aligned}$$

We wish to test:

$$\begin{aligned}
&H_0\colon \beta_5 = \beta_6 = \beta_7 = \beta_8 = \beta_9 = 0 \\
&H_a\colon \text{not all } \beta_k \text{ in } H_0 \text{ equal zero}
\end{aligned}$$

so that $\mathbf{X}'\boldsymbol{\beta}_R$ for the reduced model is:

$$\mathbf{X}'\boldsymbol{\beta}_R = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 \qquad \text{Reduced model}$$

A computer run of a multiple logistic regression package yielded:

$$\begin{aligned}
\log_e L(F) &= -46.998 \\
\log_e L(R) &= -50.527 \\
G^2 &= -2[\log_e L(R) - \log_e L(F)] = 7.058
\end{aligned}$$

If $H_0$ holds, $G^2$ follows approximately the chi-square distribution with 5 degrees of freedom.
For $\alpha = .05$, we require $\chi^2(.95; 5) = 11.07$. Since $G^2 = 7.058 < 11.07$, we conclude $H_0$,
that the two-factor interactions are not needed in the logistic regression model. The P-value
of this test is .22. We note again that a logistic regression model without interaction terms
is desirable, because otherwise $\exp(\beta_k)$ no longer can be interpreted as the odds ratio.

Thus, the fitted logistic regression model (14.46) was accepted as the model to be checked
diagnostically and, finally, to be validated.
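
Both likelihood ratio tests in this example can be reproduced from the reported log-likelihood values. The sketch below is self-contained: since the Python standard library has no chi-square distribution, it uses the closed-form upper-tail expressions that hold for integer degrees of freedom. The function names `chi2_sf` and `likelihood_ratio_test` are assumptions made here for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for a positive integer df,
    using the finite closed-form sums for even and odd degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return G^2 = -2[log L(R) - log L(F)] and its large-sample P-value."""
    g2 = -2.0 * (loglik_reduced - loglik_full)
    return g2, chi2_sf(g2, df)


# Dropping age from the main-effects model (1 degree of freedom).
g2_age, p_age = likelihood_ratio_test(-53.102, -50.527, df=1)
# Dropping the five two-factor interactions (5 degrees of freedom).
g2_inter, p_inter = likelihood_ratio_test(-50.527, -46.998, df=5)

print(f"age:          G2 = {g2_age:.3f}, P-value = {p_age:.3f}")
print(f"interactions: G2 = {g2_inter:.3f}, P-value = {p_inter:.2f}")
```

The degrees of freedom passed to the test equal the number of coefficients set to zero under $H_0$, that is, $p - q$.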


Comment

The Wald test for a single regression parameter in (14.53) is more versatile than the likelihood ratio
test in (14.60). The latter can only be used to test $H_0\colon \beta_k = 0$, whereas the former can be used also for
one-sided tests and for testing whether $\beta_k$ equals some specified value other than zero. When testing
$H_0\colon \beta_k = 0$, the two tests are not identical and may occasionally lead to different conclusions. For
example, the Wald test P-value for dropping age when socioeconomic status and sector are in the
model for the disease data set example is .0275; the P-value for the likelihood ratio test is .023.


14.6 Automatic Model Selection Methods 


Several automatic model selection methods are available for building logistic regression
models. These include all-possible-regressions and stepwise procedures. We begin with a
discussion of criteria for model selection.


Model Selection Criteria

In the context of multiple linear regression models, we discussed the use of the following
model selection criteria in Chapter 9: $R^2_p$, $R^2_{a,p}$, $C_p$, $AIC_p$, $SBC_p$, and $PRESS_p$. For logistic
regression modeling, the $AIC_p$ and $SBC_p$ criteria are easily adapted and are generally
available in commercial software. For these reasons we will focus on the use of these two
criteria. The modifications are as follows:

$$AIC_p = -2\log_e L(\mathbf{b}) + 2p \tag{14.62}$$

$$SBC_p = -2\log_e L(\mathbf{b}) + p\log_e(n) \tag{14.63}$$

where $\log_e L(\mathbf{b})$ is the log-likelihood expression in (14.42). Promising models will yield
relatively small values for these criteria. A third criterion that is frequently provided by
software packages is $-2$ times the log-likelihood, or $-2\log_e L(\mathbf{b})$. For this criterion, we
also seek models giving small values. A drawback of this third criterion is that $-2\log_e L(\mathbf{b})$
will never increase as terms are added to the model, because there is no penalty for adding
predictors. This is analogous to the use of $SSE_p$ or $R^2_p$ in multiple linear regression. It is
easily seen from (14.62) and (14.63) that $AIC_p$ and $SBC_p$ also involve $-2\log_e L(\mathbf{b})$, but
penalties are added based on the number of terms $p$. This penalty is $2p$ for $AIC_p$ and
$p\log_e(n)$ for $SBC_p$.
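
Criteria (14.62) and (14.63) are simple to compute once $-2\log_e L(\mathbf{b})$ is available. The sketch below applies them to one model from the disease outbreak example (age and city sector, $p = 3$, $n = 98$, $-2\log_e L(\mathbf{b}) = 102.259$); the function names `aic` and `sbc` are chosen here for illustration.

```python
import math


def aic(minus2_loglik, p):
    """AIC_p = -2 log L(b) + 2p, as in (14.62)."""
    return minus2_loglik + 2 * p


def sbc(minus2_loglik, p, n):
    """SBC_p = -2 log L(b) + p log(n), as in (14.63)."""
    return minus2_loglik + p * math.log(n)


# Disease outbreak example, model with age and city sector.
aic_value = aic(102.259, p=3)
sbc_value = sbc(102.259, p=3, n=98)
print(f"AIC = {aic_value:.3f}, SBC = {sbc_value:.3f}")
```

For $n = 98$ the per-parameter penalty $\log_e(n)$ is about 4.58, more than twice the AIC penalty of 2, which is why SBC leans toward smaller models.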


Best Subsets Procedures

"Best" subsets procedures were discussed in Section 9.4 in the context of multiple linear
regression. Recall that these procedures identify a group of subset models that give the
best values of a specified criterion. As long as the number of parameters is not too large
(typically less than 30 or 40) these procedures can be useful. As we noted in Section 9.4,
time-saving algorithms have been developed that can identify the most promising models,
without having to evaluate all $2^{p-1}$ candidates. These procedures are similarly applicable in
the context of logistic regression. We now illustrate the use of the best subsets procedure
based on the $AIC_p$ and $SBC_p$ criteria.

Example

For the disease outbreak example, there are four predictors: age ($X_1$), socioeconomic
status ($X_2$ and $X_3$), and city sector ($X_4$). Normally, it is advantageous to tie the two
indicators for the qualitative predictor socioeconomic status together; that is, a model should
either have both predictors, or neither. Since very few statistical software packages
follow this convention, we will allow them to be independently included. This leads to the
$2^4 = 16$ possible regression models listed in columns 2-5 of Table 14.6a. The $AIC_p$, $SBC_p$,
and $-2\log_e L(\mathbf{b})$ criterion values for each of the 16 models are listed in columns 6-8 of
Table 14.6a and are plotted against $p$ in Figures 14.10a-c, respectively.

As shown in Figures 14.10a and 14.10b, both $AIC_p$ and $SBC_p$ are minimized for $p = 3$.
Inspection of Table 14.6b reveals that the best two-predictor model for both criteria is based
on $X_1$ (age) and $X_4$ (city sector). Other models that appear promising on the basis of the
$AIC_p$ criterion are the three-predictor subsets based on $X_1$, $X_2$, and $X_4$ and on $X_1$, $X_3$, and $X_4$,
and the full model based on all four predictors. $SBC_p$ also identifies the two three-predictor
subset models just noted, as well as the one-predictor model based on $X_4$. The tendency of
$SBC_p$ to favor smaller models is evident in this example.

The plot of $-2\log_e L(\mathbf{b})$ in Figure 14.10c also points to a two- or three-predictor subset.
The additional reduction in $-2\log_e L(\mathbf{b})$ from moving from the best two-predictor model
to the best three-predictor model is small, and the returns continue to diminish as we move
from three predictors to the full, four-predictor model.
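
The best subsets ranking described above can be reproduced from the $-2\log_e L(\mathbf{b})$ values alone. The sketch below hard-codes those values for the 16 candidate models as reported in Table 14.6a (it does not refit any model) and ranks the candidates by each criterion; the variable and function names are illustrative.

```python
import math

N_CASES = 98  # number of cases in the model-building data set

# -2 log L(b) for each subset of predictors, copied from Table 14.6a.
minus2_loglik = {
    (): 122.318,
    ("X1",): 114.913,
    ("X2",): 120.882,
    ("X3",): 118.229,
    ("X4",): 107.534,
    ("X1", "X2"): 113.109,
    ("X1", "X3"): 111.968,
    ("X1", "X4"): 102.259,
    ("X2", "X3"): 118.085,
    ("X2", "X4"): 106.881,
    ("X3", "X4"): 106.371,
    ("X1", "X2", "X3"): 111.502,
    ("X1", "X2", "X4"): 101.310,
    ("X1", "X3", "X4"): 101.521,
    ("X2", "X3", "X4"): 106.204,
    ("X1", "X2", "X3", "X4"): 101.054,
}


def rank_models(penalty_per_parameter, top=4):
    """Rank subsets by -2 log L(b) + penalty * p, where p counts the
    intercept plus the predictors in the subset."""
    scored = [
        (subset, value + penalty_per_parameter * (len(subset) + 1))
        for subset, value in minus2_loglik.items()
    ]
    return sorted(scored, key=lambda item: item[1])[:top]


best_aic = rank_models(2.0)                    # AIC penalty: 2 per parameter
best_sbc = rank_models(math.log(N_CASES))      # SBC penalty: log(n) per parameter

for name, ranking in (("AIC", best_aic), ("SBC", best_sbc)):
    print(name)
    for subset, score in ranking:
        print(f"  {', '.join(subset) or '(intercept only)':<18} {score:.3f}")
```

Both rankings put the age and city sector model first, and the SBC ranking places the one-predictor city sector model second, in line with Table 14.6b.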


TABLE 14.6 Best Subsets Results: Disease Outbreak Example.

(a) Results for All Possible Models (X_ij = 1 if X_j in model i; X_ij = 0 otherwise)

            (1)        (2)     (3)     (4)     (5)      (6)        (7)         (8)
                               Socioeconomic   City
  Model  Parameters    Age        Status       Sector
    i        p         X_i1    X_i2    X_i3    X_i4    AIC_p      SBC_p    -2 log_e L(b)

    1        1          0       0       0       0     124.318    126.903     122.318
    2        2          1       0       0       0     118.913    124.083     114.913
    3        2          0       1       0       0     124.882    130.052     120.882
    4        2          0       0       1       0     122.229    127.399     118.229
    5        2          0       0       0       1     111.534    116.704     107.534
    6        3          1       1       0       0     119.109    126.864     113.109
    7        3          1       0       1       0     117.968    125.723     111.968
    8        3          1       0       0       1     108.259    116.014     102.259
    9        3          0       1       1       0     124.085    131.840     118.085
   10        3          0       1       0       1     112.881    120.636     106.881
   11        3          0       0       1       1     112.371    120.126     106.371
   12        4          1       1       1       0     119.502    129.842     111.502
   13        4          1       1       0       1     109.310    119.650     101.310
   14        4          1       0       1       1     109.521    119.861     101.521
   15        4          0       1       1       1     114.204    124.543     106.204
   16        5          1       1       1       1     111.054    123.979     101.054

(b) Best Four Models for Each Criterion

              AIC_p Criterion                SBC_p Criterion
  Rank    Predictors         AIC_p       Predictors       SBC_p

   1      X1, X4             108.259     X1, X4           116.014
   2      X1, X2, X4         109.310     X4               116.704
   3      X1, X3, X4         109.521     X1, X2, X4       119.650
   4      X1, X2, X3, X4     111.054     X1, X3, X4       119.861


Stepwise Model Selection

As we noted in Chapter 9 in the context of model selection for multiple linear regression,
when the number of predictors is large (i.e., 40 or more) the use of all-possible-regression
procedures for model selection may not be feasible. In such cases, stepwise selection
procedures are generally employed. The stepwise procedures discussed in Section 9.4 for multiple
linear regression are easily adapted for use in logistic regression. The only change required
concerns the decision rule for adding or deleting a predictor. For multiple linear regression,
this decision is based on $t^*_k$, the $t$-value associated with $b_k$, and its P-value. For logistic
regression, we obtain an analogous procedure by basing the decision on the Wald statistic
$z^*$ in (14.53b) for the $k$th estimated regression parameter, and its P-value. With this change,
implementation of the various stepwise variants, such as the forward stepwise, forward
selection, and backward elimination algorithms, is straightforward. We illustrate the use of
forward stepwise selection for the disease outbreak data.

Example


Figure 14.11 provides partial output from the SPSS forward stepwise selection procedure
for the disease outbreak example. This routine will add a predictor only if the P-value
associated with its Wald test statistic is less than 0.05. In step one, city sector ($X_4$) is
entered; its P-value is .000. In Step 2, age ($X_1$) is entered, with a P-value of 0.026. At this
point the procedure terminates, because no further predictors can be added with resulting
P-values less than 0.05. Thus, the forward stepwise selection procedure has identified the
same model favored by $AIC_p$ and $SBC_p$. Notice that SPSS also prints the square of the Wald
test statistic $z^*$ from (14.53b) in the column labeled "Wald." As noted earlier, when $(z^*)^2$
is used, P-values are obtained from a chi-square distribution with 1 degree of freedom.

FIGURE 14.10 Plots of AIC_p, SBC_p, and -2 log_e L(b): Disease Outbreak Example. (a) AIC_p versus p (b) SBC_p versus p (c) -2 Log Likelihood versus p

FIGURE 14.11 Partial SPSS forward stepwise selection output, disease outbreak example.

Logistic Regression
Block 1: Method = Forward Stepwise (Wald)

Variables in the Equation

                           B        S.E.      Wald      df     Sig.     Exp(B)

  Step 1(a)   SECTOR      1.743     .473     13.593      1     .000     5.716
              Constant   -3.332     .765     18.990      1     .000      .036
  Step 2(b)   AGE          .029     .013      4.946      1     .026     1.030
              SECTOR      1.673     .487     11.791      1     .001     5.331
              Constant   -4.009     .873     21.060      1     .000      .018

  a. Variable(s) entered on step 1: SECTOR.
  b. Variable(s) entered on step 2: AGE.


14.7 Tests for Goodness of Fit 


The appropriateness of the fitted logistic regression model needs to be examined before it is
accepted for use, as is the case for all regression models. In particular, we need to examine
whether the estimated response function for the data is monotonic and sigmoidal in shape,
key properties of the logistic response function. Goodness of fit tests provide an overall
measure of the fit of the model, and are usually not sensitive when the fit is poor for just a
few cases. Logistic regression diagnostics, which focus on individual cases, will be taken
up in the next section.

Before discussing several goodness of fit tests, it is necessary to again distinguish between
replicated and unreplicated binary data. In Sections 3.7 and 6.8, we discussed the F test for
lack-of-fit for the simple and multiple linear regression models. For simple linear regression,
the lack-of-fit test requires repeat observations at one or more levels of the single predictor
X, and, for multiple regression, there must be multiple or repeat observations that have the
same values for all of the predictors. This requirement also holds true for two of the goodness
of fit tests that we will present for logistic regression, namely, the Pearson chi-square and
the deviance goodness of fit tests. Then we present the Hosmer-Lemeshow test that is useful
for unreplicated data sets or for data sets containing just a few replicated observations.


Pearson Chi-Square Goodness of Fit Test 
The Pearson chi-square goodness of fit test assumes only that the $Y_{ij}$ observations are
independent and that replicated data of reasonable sample size are available. The test can
detect major departures from a logistic response function, but is not sensitive to small
departures from a logistic response function. The alternatives of interest are:

$$\begin{aligned}
&H_0\colon E\{Y\} = [1 + \exp(-\mathbf{X}'\boldsymbol{\beta})]^{-1} \\
&H_a\colon E\{Y\} \neq [1 + \exp(-\mathbf{X}'\boldsymbol{\beta})]^{-1}
\end{aligned} \tag{14.64}$$

As was the case with tests for lack-of-fit in simple and multiple linear regression, we
shall denote the number of distinct combinations of the predictor variables by $c$, the $i$th
binary response at predictor combination $\mathbf{X}_j$ by $Y_{ij}$, and the number of cases in the $j$th class
($j = 1, \ldots, c$) will be denoted by $n_j$. Recall from (14.32a) that:

$$Y_{.j} = \sum_{i=1}^{n_j} Y_{ij} \tag{14.65}$$

The number of cases in the $j$th class with outcome 1 will be denoted by $O_{j1}$ and the number
of cases in the $j$th class with outcome 0 will be denoted by $O_{j0}$. Because the response
variable $Y_{ij}$ is a Bernoulli variable whose outcomes are 1 and 0, the number of cases $O_{j1}$
and $O_{j0}$ are given as follows:

$$O_{j1} = \sum_{i=1}^{n_j} Y_{ij} = Y_{.j} \tag{14.66a}$$

$$O_{j0} = \sum_{i=1}^{n_j} (1 - Y_{ij}) = n_j - Y_{.j} = n_j - O_{j1} \tag{14.66b}$$

for $j = 1, \ldots, c$.


If the logistic response function is appropriate, the expected value of $Y_{ij}$ is given by:

$$E\{Y_{ij}\} = \pi_j = [1 + \exp(-\mathbf{X}_j'\boldsymbol{\beta})]^{-1} \tag{14.67}$$

and is estimated by the fitted value $\hat{\pi}_j$:

$$\hat{\pi}_j = [1 + \exp(-\mathbf{X}_j'\mathbf{b})]^{-1} \tag{14.68}$$

Consequently, if the logistic response function is appropriate, the expected numbers of cases
with $Y_{ij} = 1$ and $Y_{ij} = 0$ for the $j$th class are estimated to be:

$$E_{j1} = n_j\hat{\pi}_j \tag{14.69a}$$

$$E_{j0} = n_j(1 - \hat{\pi}_j) = n_j - E_{j1} \tag{14.69b}$$

where $E_{j1}$ denotes the estimated expected number of 1s in the $j$th class, and $E_{j0}$ denotes
the estimated expected number of 0s in the $j$th class.
The test statistic is the usual chi-square goodness of fit test statistic:

$$X^2 = \sum_{j=1}^{c} \sum_{k=0}^{1} \frac{(O_{jk} - E_{jk})^2}{E_{jk}} \tag{14.70}$$

If the logistic response function is appropriate, $X^2$ follows approximately a $\chi^2$ distribution
with $c - p$ degrees of freedom when $n_j$ is large and $p < c$. As with other chi-square
goodness of fit tests, it is advisable that most expected frequencies $E_{jk}$ be moderately large,
say 5 or greater, and none smaller than 1.

Large values of the test statistic $X^2$ indicate that the logistic response function is not
appropriate. The decision rule for testing the alternatives in (14.64), when controlling the
level of significance at $\alpha$, therefore is:

$$\begin{aligned}
&\text{If } X^2 \leq \chi^2(1 - \alpha; c - p), \text{ conclude } H_0 \\
&\text{If } X^2 > \chi^2(1 - \alpha; c - p), \text{ conclude } H_a
\end{aligned} \tag{14.71}$$

For the coupon effectiveness example, we have five classes. Table 14.7 provides for each
class $j$: $n_j$, the number of binary outcomes; $\hat{\pi}_j$, the model-based estimate of $\pi_j$; $p_j$, the
observed proportion of 1s; $O_{j0}$ and $O_{j1}$, the number of cases with $Y_{ij} = 0$ and $Y_{ij} = 1$
for each class; and finally, the estimated expected frequencies $E_{j0}$ and $E_{j1}$ if the logistic
regression model (14.35) is appropriate (calculations not shown).


                                  Number of Coupons        Number of Coupons
                                    Not Redeemed                Redeemed
                                 Observed   Expected       Observed   Expected
 $n_j$   $\hat{\pi}_j$   $p_j$    $O_{j0}$   $E_{j0}$       $O_{j1}$   $E_{j1}$

  200      .1736         .150       170       165.3            30        34.7
  200      .2543         .275       145       149.1            55        50.9
  200      .3562         .350       130       128.8            70        71.2
  200      .4731         .500       100       105.4           100        94.6
  200      .7028         .685        63        59.4           137       140.6




Test statistic (14.70) is calculated as follows:

$$X^2 = \frac{(170 - 165.3)^2}{165.3} + \frac{(30 - 34.7)^2}{34.7} + \cdots + \frac{(137 - 140.6)^2}{140.6} = 2.15$$

For $\alpha = .05$ and $c - p = 5 - 2 = 3$, we require $\chi^2(.95; 3) = 7.81$. Since $X^2 = 2.15 \le 7.81$,
we conclude $H_0$, that the logistic response function is appropriate. The P-value of the test
is .54.
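
The calculation above can be reproduced with a short script. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the closed-form upper-tail probability is an added convenience that holds for 3 degrees of freedom only. Expected frequencies are computed from the fitted probabilities in Table 14.7 rather than from the rounded table entries.

```python
import math

# Class sizes, fitted probabilities, and observed redemptions from Table 14.7.
n = [200, 200, 200, 200, 200]
pi_hat = [0.1736, 0.2543, 0.3562, 0.4731, 0.7028]
redeemed = [30, 55, 70, 100, 137]  # O_j1


def pearson_chi_square(n, pi_hat, ones):
    """Pearson statistic (14.70), summed over both outcomes in every class."""
    total = 0.0
    for n_j, fit, o1 in zip(n, pi_hat, ones):
        e1 = n_j * fit   # expected number of 1s, (14.69a)
        e0 = n_j - e1    # expected number of 0s, (14.69b)
        o0 = n_j - o1    # observed number of 0s, (14.66b)
        total += (o0 - e0) ** 2 / e0 + (o1 - e1) ** 2 / e1
    return total


def chi_square_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df.

    Closed form for 3 df only (an assumption of this sketch, not from the text).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


x2 = pearson_chi_square(n, pi_hat, redeemed)
p_value = chi_square_sf_3df(x2)
print(round(x2, 2), round(p_value, 2))
```

Comparing `x2` with the tabled percentile 7.81 gives the same conclusion as the decision rule.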


Deviance Goodness of Fit Test 

The deviance goodness of fit test for logistic regression models is completely analogous
to the F test for lack of fit for simple and multiple linear regression models. As with the
F test for lack of fit and the Pearson chi-square goodness of fit test, we assume there
are $c$ unique combinations of the predictors denoted $\mathbf{X}_1, \ldots, \mathbf{X}_c$, the number of repeat
binary observations at $\mathbf{X}_j$ is $n_j$, and the $i$th binary response at predictor combination $\mathbf{X}_j$ is
denoted $Y_{ij}$.

The lack of fit test for standard regression was based on the general linear test of the
reduced model $E\{Y_{ij}\} = \mathbf{X}_j'\boldsymbol{\beta}$ against the full model $E\{Y_{ij}\} = \mu_j$. In similar fashion, the
deviance goodness of fit test is based on a likelihood ratio test of the reduced model:

$$E\{Y_{ij}\} = [1 + \exp(-\mathbf{X}_j'\boldsymbol{\beta})]^{-1} \qquad \text{Reduced model} \tag{14.72}$$

against the full model:

$$E\{Y_{ij}\} = \pi_j \qquad j = 1, \ldots, c \qquad \text{Full model} \tag{14.73}$$

where the $\pi_j$ are parameters, $j = 1, \ldots, c$. In the lack of fit test for standard regression, the
full model allowed for a unique mean for each unique combination of the predictors, $\mathbf{X}_j$.
Similarly, the full model for the deviance goodness of fit test allows for a unique probability
$\pi_j$ for each predictor combination. This full model in the logistic regression case is usually
referred to as the saturated model.

To carry out the likelihood ratio test in (14.60), we must obtain the values of the maximized
likelihoods for the full and reduced models, namely $L(F)$ and $L(R)$. $L(R)$ is obtained
by fitting the reduced model, and the maximum likelihood estimates of the $c$ parameters in
the full model are given by the sample proportions in (14.32b):

$$p_j = \frac{Y_{.j}}{n_j} \qquad j = 1, 2, \ldots, c \tag{14.74}$$

Letting $\hat{\pi}_j$ denote the reduced model estimate of $\pi_j$ at $\mathbf{X}_j$, $j = 1, \ldots, c$, it can be shown
that likelihood ratio test statistic (14.60) is given by:

$$\begin{aligned}
G^2 &= -2[\log_e L(R) - \log_e L(F)] \\
&= -2\sum_{j=1}^{c}\left[Y_{.j}\log_e\!\left(\frac{\hat{\pi}_j}{p_j}\right) + (n_j - Y_{.j})\log_e\!\left(\frac{1 - \hat{\pi}_j}{1 - p_j}\right)\right] \\
&= DEV(X_0, X_1, \ldots, X_{p-1})
\end{aligned} \tag{14.75}$$




The likelihood ratio test statistic in (14.75) is called the deviance, and we use
$DEV(X_0, X_1, \ldots, X_{p-1})$ to denote the deviance for a logistic regression model based on
predictors $X_0, X_1, \ldots, X_{p-1}$. The deviance measures the deviation, in terms of $-2\log_e L$,
between the saturated model and the fitted reduced logistic regression model based on
$X_0, X_1, \ldots, X_{p-1}$.

If the logistic response function is the correct response function and the sample sizes $n_j$
are large, then the deviance will follow approximately a chi-square distribution with $c - p$
degrees of freedom. Large values of the deviance indicate that the fitted logistic model is
not correct. Hence, to test the alternatives:

$$\begin{aligned}
H_0&: E\{Y\} = [1 + \exp(-\mathbf{X}'\boldsymbol{\beta})]^{-1} \\
H_a&: E\{Y\} \ne [1 + \exp(-\mathbf{X}'\boldsymbol{\beta})]^{-1}
\end{aligned} \tag{14.76}$$

the appropriate decision rule is:

$$\begin{aligned}
&\text{If } DEV(X_0, X_1, \ldots, X_{p-1}) \le \chi^2(1 - \alpha;\, c - p), \text{ conclude } H_0 \\
&\text{If } DEV(X_0, X_1, \ldots, X_{p-1}) > \chi^2(1 - \alpha;\, c - p), \text{ conclude } H_a
\end{aligned} \tag{14.77}$$


Example

For the coupon effectiveness example, we use the results in Table 14.2 to calculate the
deviance in (14.75) directly:

$$\begin{aligned}
DEV(X_0, X_1) = -2\Big[&30\log_e\!\left(\frac{.1736}{.150}\right) + (200 - 30)\log_e\!\left(\frac{.8264}{.850}\right) + \cdots \\
&+ 137\log_e\!\left(\frac{.7028}{.685}\right) + (200 - 137)\log_e\!\left(\frac{.2972}{.315}\right)\Big] = 2.16
\end{aligned}$$

For $\alpha = .05$ and $c - p = 3$, we require $\chi^2(.95; 3) = 7.81$. Since $DEV(X_0, X_1) = 2.16 \le 7.81$,
we conclude $H_0$, that the logistic model is a satisfactory fit. The P-value of this test
is approximately .54, the same as that obtained earlier for the Pearson chi-square goodness
of fit test.
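
The deviance in (14.75) is equally easy to script. This is a minimal sketch with an illustrative function name; it uses the class sizes, redemption counts, and fitted probabilities quoted for the coupon effectiveness example, and it skips any term whose count is zero, since such a term contributes nothing to the sum.

```python
import math

n = [200, 200, 200, 200, 200]
redeemed = [30, 55, 70, 100, 137]                  # Y_.j
pi_hat = [0.1736, 0.2543, 0.3562, 0.4731, 0.7028]  # fitted probabilities


def deviance(n, y, pi_hat):
    """Deviance (14.75) for grouped binary data."""
    total = 0.0
    for n_j, y_j, fit in zip(n, y, pi_hat):
        p_j = y_j / n_j  # sample proportion, (14.74)
        if y_j > 0:      # term is zero when p_j = 0
            total += y_j * math.log(fit / p_j)
        if y_j < n_j:    # term is zero when p_j = 1
            total += (n_j - y_j) * math.log((1 - fit) / (1 - p_j))
    return -2 * total


dev = deviance(n, redeemed, pi_hat)
print(round(dev, 2))
```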


Comment
If $p_j = 0$ for some $j$ in the first term in (14.75), then $Y_{.j} = 0$ and:

$$Y_{.j}\log_e\!\left(\frac{\hat{\pi}_j}{p_j}\right) = 0$$

Similarly, if $p_j = 1$ for some $j$ in the second term in (14.75), then $Y_{.j} = n_j$ and:

$$(n_j - Y_{.j})\log_e\!\left(\frac{1 - \hat{\pi}_j}{1 - p_j}\right) = 0$$


Hosmer-Lemeshow Goodness of Fit Test


Hosmer and Lemeshow (Reference 14.4) proposed, for either unreplicated data sets or
data sets with few replicates, the grouping of cases based on the values of the estimated
probabilities. Suppose there are no replicates, i.e., $n_j = 1$ for all $j$. The procedure consists
of grouping the data into classes with similar fitted values $\hat{\pi}_i$, with approximately the same
number of cases in each class. The grouping may be accomplished equivalently by using
the fitted logit values $\hat{\pi}_i' = \mathbf{X}_i'\mathbf{b}$ since the logit values $\hat{\pi}_i'$ are monotonically related to the
fitted mean responses $\hat{\pi}_i$. We shall do the grouping according to the fitted logit values $\hat{\pi}_i'$.
Use of from 5 to 10 classes is common, depending on the total number of cases. Once
the groups are formed, the Hosmer-Lemeshow goodness of fit statistic is calculated
by using the Pearson chi-square test statistic (14.70) from the $c \times 2$ table of observed
and expected frequencies as described earlier. Hosmer and Lemeshow showed, using an
extensive simulation study, that the test statistic (14.70) is well approximated by the
chi-square distribution with $c - 2$ degrees of freedom.

TABLE 14.8 Hosmer-Lemeshow Goodness of Fit Test for Logistic Regression Function—Disease
Outbreak Example.

                                        Number of Persons      Number of Persons
                                         without Disease          with Disease
 Class                                  Observed  Expected     Observed  Expected
  $j$    $\hat{\pi}'$ Interval   $n_j$   $O_{j0}$  $E_{j0}$     $O_{j1}$  $E_{j1}$

   1     -2.60-under -2.08        20       19      18.196          1       1.804
   2     -2.08-under -1.43        20       17      17.093          3       2.907
   3     -1.43-under  -.70        20       14      14.707          6       5.293
   4      -.70-under   .16        19        9      10.887         10       8.113
   5       .16-under  1.70        19        8       6.297         11      12.703
 Total                            98       67      67.180         31      30.820

Example


For the disease outbreak example, we shall use five classes. Table 14.8 shows the class
intervals for the logit fitted values $\hat{\pi}_i'$ and the number of cases $n_j$ in each class. It also gives
$O_{j0}$ and $O_{j1}$, the number of cases with $Y_i = 0$ and $Y_i = 1$ for each class. Finally, Table 14.8
contains the estimated expected frequencies $E_{j0}$ and $E_{j1}$ based on logistic regression model
(14.46) (calculations not shown).

Test statistic (14.70) is calculated as follows:

$$X^2 = \frac{(19 - 18.196)^2}{18.196} + \frac{(1 - 1.804)^2}{1.804} + \cdots + \frac{(8 - 6.297)^2}{6.297} + \frac{(11 - 12.703)^2}{12.703} = 1.98$$

Since all of the $n_j$ are approximately 20 and only two expected frequencies are less than 5
and both are greater than 1, the chi-square test is appropriate here. For $\alpha = .05$ and $c - 2 = 3$,
we require $\chi^2(.95; 3) = 7.81$. Since $X^2 = 1.98 \le 7.81$, we conclude $H_0$, that the logistic
response function is appropriate. The P-value of the test is .58.
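
The grouping procedure can be sketched in a few lines. In the minimal sketch below, the function names and the equal-count splitting rule are assumptions of the sketch, and the expected number of 1s in a class is taken to be the sum of the fitted probabilities in that class, which is the customary choice for unreplicated data. Because fitted probabilities and fitted logits are monotonically related, sorting on either gives the same classes. The second part recomputes the statistic from the counts in Table 14.8.

```python
def hosmer_lemeshow(y, pi_hat, groups):
    """Group cases by fitted probability and return (X2, class table).

    Classes are formed by sorting on pi_hat and cutting into `groups`
    blocks of nearly equal size (an assumed splitting rule).
    """
    order = sorted(range(len(y)), key=lambda i: pi_hat[i])
    n = len(order)
    x2, table = 0.0, []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_j = len(idx)
        o1 = sum(y[i] for i in idx)       # observed 1s
        e1 = sum(pi_hat[i] for i in idx)  # expected 1s
        o0, e0 = n_j - o1, n_j - e1
        x2 += (o0 - e0) ** 2 / e0 + (o1 - e1) ** 2 / e1
        table.append((n_j, o0, e0, o1, e1))
    return x2, table


def pearson_from_counts(rows):
    """Statistic (14.70) from rows of (O_j0, E_j0, O_j1, E_j1)."""
    return sum((o0 - e0) ** 2 / e0 + (o1 - e1) ** 2 / e1
               for o0, e0, o1, e1 in rows)


# Observed and expected frequencies copied from Table 14.8.
table_14_8 = [
    (19, 18.196, 1, 1.804),
    (17, 17.093, 3, 2.907),
    (14, 14.707, 6, 5.293),
    (9, 10.887, 10, 8.113),
    (8, 6.297, 11, 12.703),
]
x2_table = pearson_from_counts(table_14_8)
print(round(x2_table, 2))
```

With 98 cases and five classes, the assumed splitting rule gives class sizes of 19 or 20, though not necessarily in the same order as Table 14.8.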


Comment 


We have noted that the Pearson chi-square and deviance goodness of fit tests are only appropriate when 
there are repeat observations and when the number of replicates at each X category is sufficiently 
large. Care must be taken in interpreting logistic regression output since some packages will provide 
these statistics and the associated P-values whether or not sufficient numbers of replicate observations 
are present.




14.8 Logistic Regression Diagnostics


In this section we take up the analysis of residuals and the identification of influential cases 
for logistic regression. We shall first introduce various residuals that have been defined for 
logistic regression and some associated plots. We then turn to the identification of influential 
observations. Throughout, we shall assume that the responses are binary; i.e., we focus on 
the ungrouped case. 


Logistic Regression Residuals 
Residual analysis for logistic regression is more difficult than for linear regression models
because the responses $Y_i$ take on only the values 0 and 1. Consequently, the $i$th ordinary
residual, $e_i$, will assume one of two values:

$$e_i = \begin{cases} 1 - \hat{\pi}_i & \text{if } Y_i = 1 \\ -\hat{\pi}_i & \text{if } Y_i = 0 \end{cases} \tag{14.78}$$

The ordinary residuals will not be normally distributed and, indeed, their distribution under
the assumption that the fitted model is correct is unknown. Plots of ordinary residuals against
fitted values or predictor variables will generally be uninformative.


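Pearson Residuals. The ordinary residuals can be made more comparable by dividing
them by the estimated standard error of $Y_i$, namely, $\sqrt{\hat{\pi}_i(1 - \hat{\pi}_i)}$. The resulting Pearson
residuals are given by:

$$r_{P_i} = \frac{Y_i - \hat{\pi}_i}{\sqrt{\hat{\pi}_i(1 - \hat{\pi}_i)}} \tag{14.79}$$

The Pearson residuals are directly related to Pearson chi-square goodness of fit statistic
(14.70). To see this we first expand (14.70) as follows:

$$X^2 = \sum_{j=1}^{c}\sum_{k=0}^{1}\frac{(O_{jk} - E_{jk})^2}{E_{jk}} = \sum_{j=1}^{c}\frac{(O_{j0} - E_{j0})^2}{E_{j0}} + \sum_{j=1}^{c}\frac{(O_{j1} - E_{j1})^2}{E_{j1}} \tag{14.79a}$$

For binary outcome data, we set $j = i$, $c = n$, $O_{j1} = Y_i$, $O_{j0} = 1 - Y_i$, $E_{j1} = \hat{\pi}_i$,
$E_{j0} = 1 - \hat{\pi}_i$, and (14.79a) becomes:

$$\begin{aligned}
X^2 &= \sum_{i=1}^{n}\frac{[(1 - Y_i) - (1 - \hat{\pi}_i)]^2}{1 - \hat{\pi}_i} + \sum_{i=1}^{n}\frac{(Y_i - \hat{\pi}_i)^2}{\hat{\pi}_i} \\
&= \sum_{i=1}^{n}\frac{(Y_i - \hat{\pi}_i)^2}{1 - \hat{\pi}_i} + \sum_{i=1}^{n}\frac{(Y_i - \hat{\pi}_i)^2}{\hat{\pi}_i} \\
&= \sum_{i=1}^{n}\frac{(Y_i - \hat{\pi}_i)^2}{\hat{\pi}_i(1 - \hat{\pi}_i)}
\end{aligned} \tag{14.79b}$$

Hence, we see that the sum of the squares of the Pearson residuals (14.79) is numerically
equal to the Pearson chi-square test statistic (14.79a). Therefore the square of each Pearson
residual measures the contribution of each binary response to the Pearson chi-square test
statistic. Note that test statistic (14.79b) does not follow an approximate chi-square
distribution for binary data without replicates.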




Studentized Pearson Residuals. The Pearson residuals do not have unit variance since
no allowance has been made for the inherent variation in the fitted value $\hat{\pi}_i$. A better
procedure is to divide the ordinary residuals by their estimated standard deviation. This
value is approximated by $\sqrt{\hat{\pi}_i(1 - \hat{\pi}_i)(1 - h_{ii})}$, where $h_{ii}$ is the $i$th diagonal element of
the $n \times n$ estimated hat matrix for logistic regression:

$$\mathbf{H} = \hat{\mathbf{W}}^{1/2}\mathbf{X}(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{W}}^{1/2} \tag{14.80}$$

Here, $\hat{\mathbf{W}}$ is the $n \times n$ diagonal matrix with elements $\hat{\pi}_i(1 - \hat{\pi}_i)$, $\mathbf{X}$ is the usual $n \times p$ design
matrix (6.18b), and $\hat{\mathbf{W}}^{1/2}$ is a diagonal matrix with diagonal elements equal to the square
roots of those in $\hat{\mathbf{W}}$. The resulting studentized Pearson residuals are defined as:

$$r_{SP_i} = \frac{r_{P_i}}{\sqrt{1 - h_{ii}}} \tag{14.81}$$

Recall that for multiple linear regression, the hat matrix satisfies the matrix expression
$\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$. The hat matrix for logistic regression is developed in analogous fashion; it satisfies
approximately the expression $\hat{\boldsymbol{\pi}}' = \mathbf{H}\mathbf{Y}$, where $\hat{\boldsymbol{\pi}}'$ is the $(n \times 1)$ vector of linear predictors.
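
To make (14.80) concrete, the following minimal sketch computes the diagonal elements $h_{ii}$ for the special case of an intercept and a single predictor ($p = 2$), where the $2 \times 2$ matrix $\mathbf{X}'\hat{\mathbf{W}}\mathbf{X}$ can be inverted by hand. The function name and the data are hypothetical; a general implementation would use a linear algebra library instead.

```python
import math


def hat_diagonals(x, pi_hat):
    """Diagonal of H in (14.80) for a model with an intercept and ONE predictor."""
    w = [p * (1 - p) for p in pi_hat]  # diagonal elements of W-hat
    # Elements of the symmetric 2 x 2 matrix X'WX = [[a, b], [b, c]].
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    c = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = a * c - b * b
    # h_ii = w_i * x_i' (X'WX)^{-1} x_i with x_i = (1, x_i)'.
    return [wi * (c - 2 * b * xi + a * xi * xi) / det for wi, xi in zip(w, x)]


# Hypothetical predictor values and fitted probabilities (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pi_hat = [1 / (1 + math.exp(-(-2.0 + 0.6 * xi))) for xi in x]
h = hat_diagonals(x, pi_hat)
print([round(v, 3) for v in h], round(sum(h), 6))
```

Because $\mathbf{H}$ is a projection matrix, the $h_{ii}$ lie between 0 and 1 and sum to $p$, which gives a quick check on the computation.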


Deviance Residuals. The model deviance (14.75) was obtained by carrying out the likelihood
ratio test where the reduced model is the logistic regression model and the full model
is the saturated model for grouped outcome data. For binary outcome data, we take the
number of X categories to be $c = n$, $n_j = 1$, $j = i$, $Y_{.j} = Y_i$, $p_j = Y_{.j}/n_j = Y_i$, and
(14.75) becomes:

$$\begin{aligned}
DEV &= -2\sum_{i=1}^{n}\left[Y_i\log_e\!\left(\frac{\hat{\pi}_i}{Y_i}\right) + (1 - Y_i)\log_e\!\left(\frac{1 - \hat{\pi}_i}{1 - Y_i}\right)\right] \\
&= -2\sum_{i=1}^{n}\left[Y_i\log_e(\hat{\pi}_i) + (1 - Y_i)\log_e(1 - \hat{\pi}_i) - Y_i\log_e(Y_i) - (1 - Y_i)\log_e(1 - Y_i)\right] \\
&= -2\sum_{i=1}^{n}\left[Y_i\log_e(\hat{\pi}_i) + (1 - Y_i)\log_e(1 - \hat{\pi}_i)\right]
\end{aligned} \tag{14.82}$$

since $Y_i\log_e(Y_i) = (1 - Y_i)\log_e(1 - Y_i) = 0$ for $Y_i = 0$ or $Y_i = 1$. Thus for binary data
the model deviance in (14.75) is:

$$DEV(X_0, X_1, \ldots, X_{p-1}) = -2\sum_{i=1}^{n}\left[Y_i\log_e(\hat{\pi}_i) + (1 - Y_i)\log_e(1 - \hat{\pi}_i)\right] \tag{14.82a}$$

The deviance residual for case $i$, denoted by $dev_i$, is defined as the signed square root of
the contribution of the $i$th case to the model deviance $DEV$ in (14.82a):

$$dev_i = \text{sign}(Y_i - \hat{\pi}_i)\sqrt{-2\left[Y_i\log_e(\hat{\pi}_i) + (1 - Y_i)\log_e(1 - \hat{\pi}_i)\right]} \tag{14.83}$$

where the sign is positive when $Y_i \ge \hat{\pi}_i$ and negative when $Y_i < \hat{\pi}_i$. Thus the sum of the
squared deviance residuals equals the model deviance in (14.82a):

$$\sum_{i=1}^{n}(dev_i)^2 = DEV(X_0, X_1, \ldots, X_{p-1})$$


TABLE 14.9 Logistic Regression Residuals and Hat Matrix Diagonal Elements—Disease Outbreak
Example.

         (1)      (2)              (3)       (4)         (5)          (6)       (7)
 $i$    $Y_i$   $\hat{\pi}_i$     $e_i$    $r_{P_i}$   $r_{SP_i}$    $dev_i$   $h_{ii}$

   1      0      0.209           -0.209    -0.514      -0.524       -0.685     .039
   2      0      0.219           -0.219    -0.529      -0.541       -0.703     .040
   3      0      0.106           -0.106    -0.344      -0.350       -0.473     .033
 ...
  96      0      0.114           -0.114    -0.358      -0.363       -0.491     .025
  97      0      0.092           -0.092    -0.318      -0.322       -0.439     .024
  98      0      0.171           -0.171    -0.455      -0.463       -0.613     .036

Therefore the square of each deviance residual measures the contribution of each binary
response to the deviance goodness of fit test statistic (14.82a). Note that test statistic (14.82a)
does not follow an approximate chi-square distribution for binary data without replicates.


Table 14.9 lists in columns 1-7, for a portion of the disease outbreak example, the
response $Y_i$, the predicted mean response $\hat{\pi}_i$, the ordinary residual $e_i$, the Pearson residual
$r_{P_i}$, the studentized Pearson residual $r_{SP_i}$, the deviance residual $dev_i$, and the hat matrix
diagonal elements $h_{ii}$. We illustrate the calculations needed to obtain these residuals for the
first case. The ordinary residual for the first case is from (14.78):


$$e_1 = Y_1 - \hat{\pi}_1 = 0 - .209 = -.209$$

The first Pearson residual (14.79) is:

$$r_{P_1} = \frac{e_1}{\sqrt{\hat{\pi}_1(1 - \hat{\pi}_1)}} = \frac{-.209}{\sqrt{.209(1 - .209)}} = -.514$$

Substitution of $r_{P_1}$ and the leverage value $h_{11}$ from column 7 of Table 14.9 into (14.81)
yields the studentized Pearson residual:

$$r_{SP_1} = \frac{r_{P_1}}{\sqrt{1 - h_{11}}} = \frac{-.514}{\sqrt{1 - .039}} = -.524$$

Finally, the first deviance residual is obtained from (14.83):

$$\begin{aligned}
dev_1 &= \text{sign}(Y_1 - \hat{\pi}_1)\sqrt{-2[Y_1\log_e(\hat{\pi}_1) + (1 - Y_1)\log_e(1 - \hat{\pi}_1)]} \\
&= \text{sign}(-.209)\sqrt{-2[0\log_e(.209) + (1 - 0)\log_e(1 - .209)]} \\
&= -\sqrt{-2\log_e(.791)} = -.685
\end{aligned}$$
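
The four residuals for the first case can be checked with a few lines of code. This is a minimal sketch with an illustrative function name; it takes the response, the fitted probability, and the leverage value quoted above as inputs and applies (14.78), (14.79), (14.81), and (14.83) in turn.

```python
import math


def logistic_residuals(y, pi_hat, h):
    """Return (ordinary, Pearson, studentized Pearson, deviance) residuals for one case."""
    e = y - pi_hat                               # (14.78)
    r_p = e / math.sqrt(pi_hat * (1 - pi_hat))   # (14.79)
    r_sp = r_p / math.sqrt(1 - h)                # (14.81)
    # For binary y only one of the two log terms in (14.83) is nonzero.
    log_term = math.log(pi_hat) if y == 1 else math.log(1 - pi_hat)
    dev = math.copysign(math.sqrt(-2 * log_term), e)
    return e, r_p, r_sp, dev


# Case 1 of the disease outbreak example: Y = 0, fitted probability .209, leverage .039.
e1, rp1, rsp1, dev1 = logistic_residuals(0, 0.209, 0.039)
print(round(e1, 3), round(rp1, 3), round(rsp1, 3), round(dev1, 3))
```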


The various residuals are plotted against the predicted mean response in Figure 14.12,
although we emphasize that such plots are not particularly informative. Consider, for
example, the ordinary residuals in Figure 14.12a. Here we see two trends of decreasing residuals
with slope equal to $-1$. These two linear trends result from the fact, noted above, that the
residuals take on just one of two values at a point $\mathbf{X}_i$: $1 - \hat{\pi}_i$ or $0 - \hat{\pi}_i$. Plotting these values
against $\hat{\pi}_i$ will always result in two linear trends with slope $-1$. The remaining plots lead
to similar patterns.




FIGURE 14.12 Selected Residuals Plotted against Predicted Mean Response—Disease Outbreak Example.
Panels: (a) $e_i$ versus $\hat{\pi}_i$ (Ordinary Residual); (b) $r_{P_i}$ versus $\hat{\pi}_i$ (Pearson Residual);
(c) $r_{SP_i}$ versus $\hat{\pi}_i$ (Studentized Pearson Residual); (d) $dev_i$ versus $\hat{\pi}_i$ (Deviance Residual).
Horizontal axis in each panel: Estimated Probability.


Diagnostic Residual Plots 

In this section we consider two useful residual plots that provide some information about 
the adequacy of the logistic regression fit. Recall that in ordinary regression, residual plots 
are useful for diagnosing model inadequacy, nonconstant variance, and the presence of 
response outliers. In logistic regression, we generally focus only on the detection of model
inadequacy. As we discussed in Section 14.1, nonconstant variance is always present in the
logistic regression setting, and the form that it takes is known. Moreover, response outliers 
in binary logistic regression are difficult to diagnose and may only be evident if all responses 
in a particular region of the X space have the same response value except one or two. Thus 
we focus here on model adequacy. 


Residuals versus Predicted Probabilities with Lowess Smooth. If the logistic regression
model is correct, then $E\{Y_i\} = \pi_i$ and it follows asymptotically that:

$$E\{Y_i - \hat{\pi}_i\} = E\{e_i\} = 0$$

This suggests that if the model is correct, a lowess smooth of the plot of the residuals
against the estimated probability $\hat{\pi}_i$ (or against the linear predictor $\hat{\pi}_i'$) should result
approximately in a horizontal line with zero intercept. Any significant departure from this
suggests that the model may be inadequate. In practice, the lowess smooth of the ordinary
residuals, the Pearson residuals, or the studentized Pearson residuals can be employed.
(Further details regarding the plotting of logistic regression residuals can be found in
Reference 14.5.)

FIGURE 14.13 Residual Plots with Lowess Smooth—Disease Outbreak Example.
Panels: (a) $r_{SP_i}$ versus $\hat{\pi}_i$; (b) $r_{SP_i}$ versus $\hat{\pi}_i'$; (c) $dev_i$ versus $\hat{\pi}_i$; (d) $dev_i$ versus $\hat{\pi}_i'$.
Vertical axes: Studentized Pearson Residual in (a) and (b), Deviance Residual in (c) and (d).
Horizontal axes: Estimated Probability in (a) and (c), Linear Predictor in (b) and (d).


Example

Shown in Figures 14.13a-d are residual plots for the disease outbreak example, each with
the suggested lowess smooth superimposed. (We used the MINITAB lowess option with
degree of smoothing equal to .7 and number of steps equal to 0 to produce these plots.) In
Figures 14.13a and 14.13b, the studentized Pearson residuals are plotted respectively against
the estimated probability and the linear predictor. Figures 14.13c and 14.13d provide similar
plots for the deviance residuals. In all cases, the lowess smooth approximates a line having
zero slope and intercept, and we conclude that no significant model inadequacy is apparent.


Half-Normal Probability Plot with Simulated Envelope. A half-normal probability plot
of the deviance residuals with a simulated envelope is useful both for examining the adequacy
of the linear part of the logistic regression model and for identifying deviance residuals that
are outlying. A half-normal probability plot helps to highlight outlying deviance residuals
even though the residuals are not normally distributed. In a normal probability plot, the $k$th
ordered residual is plotted against the percentile $z[(k - .375)/(n + .25)]$ or against $\sqrt{MSE}$
times this percentile, as shown in (3.6). In a half-normal probability plot, the $k$th ordered
absolute residual is plotted against:

$$z\left(\frac{k + n - 1/8}{2n + 1/2}\right) \tag{14.84}$$

Outliers will appear at the top right of a half-normal probability plot as points separated
from the others. However, a half-normal plot of the absolute residuals will not necessarily
give a straight line even when the fitted model is in fact correct.

To identify outlying deviance residuals, we combine a half-normal probability plot with a
simulated envelope (Reference 14.6). This envelope constitutes a band such that the plotted
residuals are all likely to fall within the band if the fitted model is correct.

A simulated envelope for a half-normal probability plot of the absolute deviance residuals
is constructed in the following way:


1. For each of the $n$ cases, generate a Bernoulli outcome (0, 1), where the Bernoulli
parameter for case $i$ is $\hat{\pi}_i$, the estimated probability of response $Y_i = 1$ according to the
originally fitted model.

2. Fit the logistic regression model for the $n$ new responses where the predictor variables
keep their original values, and obtain the deviance residuals. Order the absolute deviance
residuals in ascending order.

3. Repeat the first two steps 18 times.

4. Assemble the smallest absolute deviance residuals from the 19 groups and determine
the minimum value, the mean, and the maximum value of these 19 residuals.

5. Repeat step 4 by assembling the group of second smallest absolute residuals, the group
of third smallest absolute residuals, etc.

6. Plot the minimum, mean, and maximum values for each of the $n$ ordered residual
groups against the corresponding expected value in (14.84) on the half-normal probability
plot for the original data and connect the points by straight lines.


By using 19 simulations, there is one chance in 20, or 5 percent, that the largest absolute
deviance residual from the original data set lies outside the simulated envelope when the
fitted model is correct. Large deviations of points from the means of the simulated values
or the occurrence of points near to or outside the simulated envelope are indications that
the fitted model is not appropriate.
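
The six steps can be sketched in code. The following is a minimal, standard-library sketch under several stated assumptions: the model has an intercept and a single predictor, it is fitted by a basic Newton-Raphson iteration written for this illustration, the data are hypothetical, and all function names are invented here. It returns the minimum, mean, and maximum for each rank position together with the expected values from (14.84); plotting is left out.

```python
import math
import random
from statistics import NormalDist, mean


def inv_logit(eta):
    eta = max(-30.0, min(30.0, eta))  # guard against overflow and pi = 0 or 1
    return 1 / (1 + math.exp(-eta))


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit(pi) = b0 + b1*x; returns fitted probabilities."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        pi = [inv_logit(b0 + b1 * xi) for xi in x]
        w = [p * (1 - p) for p in pi]
        g0 = sum(yi - p for yi, p in zip(y, pi))
        g1 = sum((yi - p) * xi for yi, p, xi in zip(y, pi, x))
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        if det < 1e-10:  # nearly separated sample: stop updating
            break
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return [inv_logit(b0 + b1 * xi) for xi in x]


def abs_deviance_residuals(y, pi):
    """Absolute deviance residuals (14.83), in ascending order."""
    return sorted(math.sqrt(-2 * math.log(p if yi == 1 else 1 - p))
                  for yi, p in zip(y, pi))


def simulated_envelope(x, pi_hat, simulations=19, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(simulations):                                    # step 3
        y_sim = [1 if rng.random() < p else 0 for p in pi_hat]      # step 1
        samples.append(abs_deviance_residuals(y_sim, fit_logistic(x, y_sim)))  # step 2
    # Steps 4 and 5: minimum, mean, maximum at each rank position.
    return [(min(col), mean(col), max(col)) for col in zip(*samples)]


def half_normal_scores(n):
    """Expected values (14.84) for k = 1, ..., n."""
    z = NormalDist().inv_cdf
    return [z((k + n - 0.125) / (2 * n + 0.5)) for k in range(1, n + 1)]


# Hypothetical data (not from the text): 40 cases, one predictor.
data_rng = random.Random(1)
x = [i / 4 for i in range(40)]
y = [1 if data_rng.random() < inv_logit(-2 + 0.4 * xi) else 0 for xi in x]
pi_hat = fit_logistic(x, y)
observed = abs_deviance_residuals(y, pi_hat)
envelope = simulated_envelope(x, pi_hat)
scores = half_normal_scores(len(x))
```

Step 6 then amounts to plotting `observed` and the three columns of `envelope` against `scores`.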


Table 14.10a repeats a portion of the data for the disease outbreak example, as well as the
fitted values for the logistic regression model. It also contains a portion of the simulated
responses for the 19 simulation samples. For instance, the simulated responses for case 1
were obtained by generating Bernoulli random outcomes with probability $\hat{\pi}_1 = .209$.
Table 14.10b shows some of the ordered absolute deviance residuals for the 19 simulation
samples. Finally, Table 14.10c presents the minimum, mean, and maximum for the 19
simulation samples for some of the rank order positions, the ordered absolute deviance residuals
for the original sample for these rank order positions, and corresponding $z$ percentiles. The
results in Table 14.10c are plotted in Figure 14.14. We see clearly from this figure that
the largest deviance residuals (which here correspond to cases 5 and 14) are farthest to the
right and are somewhat separated from the other cases. However, they fall well within the
simulated envelope so that remedial measures do not appear to be required. Figure 14.14
also shows that most of the absolute deviance residuals fall near the simulation means,
suggesting that the logistic regression model is appropriate here.

TABLE 14.10 Results for Simulated Envelope for Half-Normal Probability Plot—Disease
Outbreak Example.

(a) Simulated Bernoulli Outcomes

                                     Simulation Sample
 $i$    $Y_i$   $\hat{\pi}_i$       (1)         ...    (19)
   1      0       .209         [illegible]    ...   [illegible]
   2      0       .219         [illegible]    ...   [illegible]
 ...
  97      0       .092              0         ...      0
  98      0       .171              1         ...      0

(b) Ordered Absolute Deviance Residuals for Simulation Samples

 Order              Simulation Sample
 Position $k$      (1)         ...    (19)
   1               .468        ...    .368
   2            [illegible]    ...    .368
 ...
  97              1.849        ...   2.085
  98              1.919        ...   2.228

(c) Minimum, Mean, and Maximum of Ordered Absolute Deviance Residuals for Simulation Samples

 Order               Simulation Samples           Original
 Position $k$    Minimum    Mean    Maximum         Data      $z\left(\frac{k + 97.875}{196.5}\right)$
   1              .046      .289     .491           .386        .008
   2              .060      .296     .491           .386        .021
 ...
  97             1.804     2.273    3.194          2.082       2.397
  98             1.869     2.387    3.391          2.098       2.729

FIGURE 14.14 [Plot of Dev. Residual (vertical axis) against Expected Value (horizontal axis).]


Detection of Influential Observations 
In this section we introduce three measures that can be used to identify influential
observations. We consider the influence of individual binary cases on three aspects of the
analysis:


1. The Pearson chi-square statistic (14.79b).
2. The deviance statistic (14.82a).
3. The fitted linear predictor, $\hat{\pi}_i'$.


As was the case in standard regression situations, we will employ case-deletion diagnostics 
to assess the effect of individual cases on the results of the analysis. 


Influence on Pearson Chi-Square and the Deviance Statistics. Let $X^2$ and $DEV$ denote
the Pearson and deviance statistics (14.79b) and (14.82a) based on the full data set, and let
$X^2_{(i)}$ and $DEV_{(i)}$ denote the values of these test statistics when case $i$ is deleted. The $i$th
delta chi-square statistic is defined as the change in the Pearson statistic when the $i$th case
is deleted:

$$\Delta X_i^2 = X^2 - X^2_{(i)}$$

Similarly, the $i$th delta deviance statistic is defined as the change in the deviance statistic
when the $i$th case is deleted:

$$\Delta dev_i = DEV - DEV_{(i)}$$

Determination of the $n$ delta chi-square statistics or the $n$ delta deviance statistics requires
$n$ maximizations of the likelihood, which can be time consuming. For faster computing, the
following one-step approximations have been developed:

$$\Delta X_i^2 = r_{SP_i}^2 \tag{14.85}$$

$$\Delta dev_i = h_{ii}r_{SP_i}^2 + dev_i^2 \tag{14.86}$$

In summary, $\Delta X_i^2$ and $\Delta dev_i$ give the change in the Pearson chi-square and deviance
statistics, respectively, when the $i$th case is deleted. They therefore provide measures of the
influence of the $i$th case on these summary statistics.

Interpretation of the delta chi-square and delta deviance statistics is not always a simple
matter. In standard regression situations, we employ various rules of thumb for judging the
magnitude of a regression diagnostic. An example of this is the Bonferroni outlier test
(Section 10.2) that is used in conjunction with the studentized deleted residual (10.26). Another
is the use of various percentiles of the F distribution for interpretation of Cook's distance
(Section 10.4). Guidelines such as these are generally not available for logistic regression,
as the distribution of the delta statistics is unknown except under certain restrictive
assumptions. The judgment as to whether or not a case is outlying or overly influential is typically
made on the basis of a subjective visual assessment of an appropriate graphic. Usually, the
delta chi-square and delta deviance statistics are plotted against case number $i$, against $\hat{\pi}_i$,
or against $\hat{\pi}_i'$. Extreme values appear as spikes when plotted against case number, or as
outliers in the upper corners of the plot when plotted against $\hat{\pi}_i$ or $\hat{\pi}_i'$.

TABLE 14.11 Pearson Residuals, Studentized Pearson Residuals, Hat Diagonals, Deviance Residuals, Delta
Chi-Square and Delta Deviance Statistics, and Cook's Distance—Disease Outbreak Example.


Table 14.11 lists in columns 1-6 for a portion of the disease outbreak data the Pearson
residuals $r_{P_i}$, the studentized Pearson residuals $r_{SP_i}$, the hat matrix diagonal elements $h_{ii}$,
the deviance residuals $dev_i$, the delta chi-square statistics $\Delta X_i^2$, and the delta deviance
statistics $\Delta dev_i$. We illustrate the calculations needed to obtain $\Delta X_1^2$ and $\Delta dev_1$ for the first
case. As noted in (14.85) the first delta chi-square statistic is given by the square of the first
studentized Pearson residual:

$$\Delta X_1^2 = r_{SP_1}^2 = (-.524)^2 = .275$$

Using (14.86) with $h_{11} = .039$ and $dev_1 = -.685$ from columns 3 and 4 of Table 14.11,
the first delta deviance statistic is:

$$\Delta dev_1 = h_{11}r_{SP_1}^2 + dev_1^2 = .039(-.524)^2 + (-.685)^2 = .479$$
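
The one-step approximations (14.85) and (14.86) reduce to two lines of arithmetic. The sketch below uses an illustrative function name and the rounded inputs quoted for the first case; with these rounded inputs the delta deviance comes out at about .480, which agrees with the value quoted above up to rounding in the third decimal place.

```python
def delta_statistics(r_sp, dev, h):
    """One-step approximations to the delta chi-square and delta deviance statistics."""
    delta_x2 = r_sp ** 2                   # (14.85)
    delta_dev = h * r_sp ** 2 + dev ** 2   # (14.86)
    return delta_x2, delta_dev


# Case 1: studentized Pearson residual, deviance residual, and leverage.
delta_x2_1, delta_dev_1 = delta_statistics(-0.524, -0.685, 0.039)
print(round(delta_x2_1, 3), round(delta_dev_1, 3))
```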


Figures 14.15a and 14.15b provide index plots of the delta chi-square and delta deviance
statistics for the disease outbreak example. The two spikes corresponding to cases 5 and 14 
indicate clearly that these cases have the largest values of the delta deviance and delta chi- 
square statistics. Shown just below each of these in Figures 14.15c and 14.15d are plots of 
the delta chi-square and delta deviance statistics against the model-estimated probabilities. 
Note that cases 5 and 14 again stand out—this time in the upper left corner of the plot. The 
results suggest that cases 5 and 14 may substantively affect the conclusions. The cases were 
therefore flagged for potential remedial action at a later stage of the analysis. 


Influence on the Fitted Linear Predictor: Cook's Distance. In Chapter 10, we introduced
Cook's distance statistic, $D_i$, for the identification of influential observations. We
noted that for the standard regression case $D_i$ measures the standardized change in the
fitted response vector $\hat{\mathbf{Y}}$ when the $i$th case is deleted. Similarly, Cook's distance for logistic
regression measures the standardized change in the linear predictor $\hat{\pi}_i'$ when the $i$th case
is deleted. Like the delta statistics described above, obtaining these values exactly requires
$n$ maximizations of the likelihood. Instead, the following one-step approximation is used
(Reference 14.5):

$$D_i = \frac{r_{P_i}^2 h_{ii}}{p(1 - h_{ii})^2} \tag{14.87}$$

FIGURE 14.15 Delta Chi-Square and Delta Deviance Plots—Disease Outbreak Example.
Panels: (a) $\Delta X_i^2$ versus $i$; (b) $\Delta dev_i$ versus $i$; (c) $\Delta X_i^2$ versus $\hat{\pi}_i$; (d) $\Delta dev_i$ versus $\hat{\pi}_i$.
Vertical axes: Delta Chi-Square in (a) and (c), Delta Deviance in (b) and (d).
Horizontal axes: Case in (a) and (b), Estimated Probability in (c) and (d).

Index plots of leverage values $h_{ii}$ are useful for identifying outliers in the X space, and
index plots of $D_i$ can be used to identify cases that have a large effect on the fitted linear
predictor. As was the case with the delta chi-square and delta deviance statistics, rules of
thumb for judging the magnitudes of these diagnostics are not available, and we must rely
on a visual assessment of an appropriate graphic. Note that influence on both the deviance
(or Pearson chi-square) statistic and the linear predictor can be assessed simultaneously
using a proportional influence or bubble plot of the delta deviance (or delta chi-square)
statistics, in which the area of the plot symbol is proportional to $D_i$.


Example

Cook's distances are listed in column 7 of Table 14.11 for a portion of the disease outbreak
example. To illustrate the calculation of Cook's distance we again focus on the first case.
We require $h_{11} = .039$ and $r_{P_1} = -.514$ from columns 1 and 3 of Table 14.11. Then, we have
from (14.87) with $p = 5$:

$$D_1 = \frac{r_{P_1}^2 h_{11}}{p(1 - h_{11})^2} = \frac{(-.514)^2(.039)}{5(1 - .039)^2} = .0022$$

Figures 14.16a-c display an index plot of $h_{ii}$, an index plot of $D_i$, and a proportional-influence
plot of the delta deviance statistics. The leverage plot identifies case 48 as being
somewhat outlying in the X space, and therefore potentially influential, and the plot of
Cook's distances indicates that case 48 is indeed the most influential in terms of effect on
the linear predictor. Note that cases 5 and 14, previously identified as most influential in
terms of their effect on the Pearson chi-square and deviance statistics, have relatively less
influence on the linear predictor. This is shown also by the proportional-influence plot in
Figure 14.16c. These two cases, which have the largest delta deviance values, are located
in the upper left region of the plot. The plot symbols for these cases are not overly large,
indicating that these cases are not particularly influential in terms of the fitted linear predictor
values. Case 48 was temporarily deleted and the logistic regression fit was obtained (not
shown). The results were not appreciably different from those obtained from the full data
set, and the case was retained.

FIGURE 14.16 Index Plots of Leverage Values, Cook's Distances, and Proportional-Influence Plot of Delta
Deviance Statistic—Disease Outbreak Example.
Panels: (a) $h_{ii}$ versus $i$ (Leverage against Case Index, case 48 marked); (b) $D_i$ versus $i$ (Cook's
Distance against Case Index, case 48 marked); (c) Proportional-Influence Plot (Squared Delta Deviance
Residual against Estimated Probability).
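
The one-step approximation (14.87) to Cook's distance for the first case can be verified directly. This is a minimal sketch with an illustrative function name, using the rounded Pearson residual and leverage quoted in the example and $p = 5$.

```python
def cooks_distance(r_p, h, p):
    """One-step approximation (14.87) to Cook's distance for logistic regression."""
    return r_p ** 2 * h / (p * (1 - h) ** 2)


# Case 1: Pearson residual -.514, leverage .039, p = 5 parameters.
d_1 = cooks_distance(-0.514, 0.039, 5)
print(round(d_1, 4))
```

For a fixed residual, the approximation grows with the leverage $h_{ii}$, which is why a high-leverage case can dominate the index plot of $D_i$ without standing out in the delta deviance plot.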


14.9 Inferences about Mean Response 


Frequently, estimation of the probability $\pi$ for one or several different sets of values of the
predictor variables is required. In the disease outbreak example, for instance, there may be
interest in the probability of 10-year-old persons of lower socioeconomic status living in
city sector 1 having contracted the disease.


Point Estimator 
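As usual, we denote the vector of the levels of the X variables for which $\pi$ is to be estimated
by $\mathbf{X}_h$:

$$\mathbf{X}_h = \begin{bmatrix} 1 \\ X_{h1} \\ X_{h2} \\ \vdots \\ X_{h,p-1} \end{bmatrix} \tag{14.88}$$

and the mean response of interest by $\pi_h$:

$$\pi_h = [1 + \exp(-\mathbf{X}_h'\boldsymbol{\beta})]^{-1} \tag{14.89}$$

The point estimator of $\pi_h$ will be denoted by $\hat{\pi}_h$ and is as follows:

$$\hat{\pi}_h = [1 + \exp(-\mathbf{X}_h'\mathbf{b})]^{-1} \tag{14.90}$$

where $\mathbf{b}$ is the vector of estimated regression coefficients in (14.43).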


Interval Estimation 


We obtain a confidence interval for $\pi_h$ in two stages. First, we calculate confidence limits
for the logit mean response $\pi_h'$. Then we use the relation (14.38a) to obtain confidence limits
for the mean response $\pi_h$. To see this clearly, we consider (14.38a) for $\mathbf{X} = \mathbf{X}_h$:

$$E\{Y_h\} = [1 + \exp(-\mathbf{X}_h'\boldsymbol{\beta})]^{-1}$$

and restate the expression by using the fact that $E\{Y_h\} = \pi_h$ and $\mathbf{X}_h'\boldsymbol{\beta} = \pi_h'$:

$$\pi_h = [1 + \exp(-\pi_h')]^{-1} \tag{14.91}$$

It is this relation in (14.91) that we utilize to convert confidence limits for $\pi_h'$ into confidence
limits for $\pi_h$.

The point estimator of the logit mean response $\pi_h' = \mathbf{X}_h'\boldsymbol{\beta}$ is $\hat{\pi}_h' = \mathbf{X}_h'\mathbf{b}$. The estimated
approximate variance of $\hat{\pi}_h' = \mathbf{X}_h'\mathbf{b}$ according to (5.46) is:

$$s^2\{\hat{\pi}_h'\} = s^2\{\mathbf{X}_h'\mathbf{b}\} = \mathbf{X}_h' s^2\{\mathbf{b}\} \mathbf{X}_h \tag{14.92}$$

where $s^2\{\mathbf{b}\}$ is the estimated approximate variance-covariance matrix of the regression
coefficients in (14.51) when $n$ is large.

Approximate $1 - \alpha$ large-sample confidence limits for the logit mean response $\pi_h'$ are
then obtained in the usual fashion:

$$L = \hat{\pi}_h' - z(1 - \alpha/2)\, s\{\hat{\pi}_h'\} \tag{14.93a}$$
$$U = \hat{\pi}_h' + z(1 - \alpha/2)\, s\{\hat{\pi}_h'\} \tag{14.93b}$$

Here, $L$ and $U$ are, respectively, the lower and upper confidence limits for $\pi_h'$.

Finally, we use the monotonic relation between $\pi_h$ and $\pi_h'$ in (14.91) to convert the
confidence limits $L$ and $U$ for $\pi_h'$ into approximate $1 - \alpha$ confidence limits $L^*$ and $U^*$ for
the mean response $\pi_h$:

$$L^* = [1 + \exp(-L)]^{-1} \tag{14.94a}$$
$$U^* = [1 + \exp(-U)]^{-1} \tag{14.94b}$$
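
The two-stage procedure in (14.92) through (14.94) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `prob_confidence_limits`, the two-parameter coefficient vector, and the covariance matrix below are invented for the sketch and do not come from the disease outbreak study.

```python
import math

def prob_confidence_limits(x_h, b, cov_b, z):
    """Approximate limits for pi_h obtained on the logit scale, as in (14.92)-(14.94)."""
    p = len(b)
    # Point estimate of the logit mean response: X_h' b
    logit_hat = sum(x_h[j] * b[j] for j in range(p))
    # Estimated variance of the logit estimate: X_h' s^2{b} X_h, as in (14.92)
    var_logit = sum(x_h[j] * cov_b[j][k] * x_h[k] for j in range(p) for k in range(p))
    se_logit = math.sqrt(var_logit)
    # Limits L and U for the logit, as in (14.93a) and (14.93b)
    lower_logit = logit_hat - z * se_logit
    upper_logit = logit_hat + z * se_logit

    def to_prob(v):
        # Monotonic conversion (14.91) from logit to probability
        return 1.0 / (1.0 + math.exp(-v))

    return to_prob(lower_logit), to_prob(logit_hat), to_prob(upper_logit)

# Hypothetical two-parameter fit (values invented for illustration only)
b_demo = [-1.0, 0.5]
cov_demo = [[0.04, -0.01], [-0.01, 0.09]]
lo, est, hi = prob_confidence_limits([1.0, 2.0], b_demo, cov_demo, z=1.960)
print(round(lo, 3), round(est, 3), round(hi, 3))
```

Because the conversion in (14.91) is monotonic, the ordering of the limits is preserved, although the resulting interval is generally not centered on the point estimate.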


Simultaneous Confidence Intervals for Several Mean Responses


When it is desired to estimate several mean responses $\pi_h$ corresponding to different $\mathbf{X}_h$
vectors with family confidence coefficient $1 - \alpha$, Bonferroni simultaneous confidence
intervals may be used. The procedure for $g$ confidence intervals is the same as that for a single
confidence interval except that $z(1 - \alpha/2)$ in (14.93) is replaced by $z(1 - \alpha/2g)$.
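
As a small illustration of the Bonferroni adjustment, the multiplier $z(1 - \alpha/2g)$ can be computed with the Python standard library. The helper name `bonferroni_z` and the choice of $g = 3$ intervals are assumptions made only for this sketch.

```python
from statistics import NormalDist

def bonferroni_z(alpha, g):
    """Multiplier z(1 - alpha/(2g)) for g simultaneous confidence intervals."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * g))

# A single interval uses z(.975); three simultaneous intervals use a larger multiplier
print(round(bonferroni_z(0.05, 1), 3))
print(round(bonferroni_z(0.05, 3), 3))
```

The multiplier grows with $g$, so each individual interval becomes wider as more mean responses are estimated jointly.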


In the disease outbreak example of Table 14.3, it is desired to find an approximate 95 percent
confidence interval for the probability $\pi_h$ that persons 10 years old who are of lower
socioeconomic status and live in sector 1 have contracted the disease. The vector $\mathbf{X}_h$ in (14.88)
here is:

$$\mathbf{X}_h = [1 \quad 10 \quad 0 \quad 1 \quad 0]'$$

Using the results in Table 14.4a, we obtain the point estimate of the logit mean response:

$$\hat{\pi}_h' = \mathbf{X}_h'\mathbf{b} = -2.3129(1) + .02975(10) + .4088(0) - .30525(1) + 1.5747(0) = -2.32065$$

The estimated variance of $\hat{\pi}_h'$ is obtained by using (14.92) (calculations not shown):

$$s^2\{\hat{\pi}_h'\} = .2945$$

so that $s\{\hat{\pi}_h'\} = .54268$. For $1 - \alpha = .95$, we require $z(.975) = 1.960$. Hence, the
confidence limits for the logit mean response $\pi_h'$ are according to (14.93):

$$L = -2.32065 - 1.960(.54268) = -3.38430$$
$$U = -2.32065 + 1.960(.54268) = -1.25700$$

Finally, we use (14.94) to obtain the confidence limits for the mean response $\pi_h$:

$$L^* = [1 + \exp(3.38430)]^{-1} = .033$$
$$U^* = [1 + \exp(1.25700)]^{-1} = .22$$

Thus, the approximate 95 percent confidence interval for the mean response $\pi_h$ is:

$$.033 \le \pi_h \le .22$$

We therefore find, with approximate 95 percent confidence, that the probability is between
.033 and .22 that 10-year-old persons of lower socioeconomic status who live in sector 1 have
contracted the disease. This confidence interval is useful for indicating that persons with
the specified characteristics are not subject to a very high probability of having contracted
the disease, but the confidence interval is quite wide and thus not precise.
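
The arithmetic of this example can be retraced with a short Python sketch. The coefficients, the vector of predictor levels, the estimated variance .2945, and the multiplier 1.960 are the values quoted in the example; the variable names and the helper `inv_logit` are chosen here only for illustration.

```python
import math

b = [-2.3129, 0.02975, 0.4088, -0.30525, 1.5747]   # estimates quoted from Table 14.4a
x_h = [1, 10, 0, 1, 0]                             # intercept, age 10, lower status, sector 1
var_logit = 0.2945                                 # s^2 of the logit estimate, from the text
z = 1.960                                          # z(.975)

# Point estimate and limits on the logit scale, as in (14.93)
logit_hat = sum(bj * xj for bj, xj in zip(b, x_h))
se_logit = math.sqrt(var_logit)
L = logit_hat - z * se_logit
U = logit_hat + z * se_logit

def inv_logit(v):
    """Convert a logit to a probability, as in (14.91)."""
    return 1.0 / (1.0 + math.exp(-v))

# Point estimate and limits on the probability scale, as in (14.90) and (14.94)
pi_hat, L_star, U_star = inv_logit(logit_hat), inv_logit(L), inv_logit(U)
print(round(logit_hat, 5), round(L, 4), round(U, 4))
print(round(pi_hat, 3), round(L_star, 3), round(U_star, 3))
```

The limits on the logit scale are symmetric around the logit point estimate, while the converted limits on the probability scale are not symmetric around the estimated probability.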




Comment

The confidence limits for $\pi_h$ in (14.94) are not symmetric around the point estimate. In the disease
outbreak example, for instance, the point estimate is:

$$\hat{\pi}_h = [1 + \exp(2.32065)]^{-1} = .089$$

while the confidence limits are .033 and .22. The reason for the asymmetry is that $\hat{\pi}_h$ is not a linear
function of $\hat{\pi}_h'$.


14.10 Prediction of a New Observation


Multiple logistic regression is frequently employed for making predictions for new
observations. In one application, for example, health personnel wished to predict whether a certain
surgical procedure will ameliorate a new patient's condition, given the patient's age,
gender, and various symptoms. In another application, marketing officials of a computer firm
wished to predict whether a retail chain will purchase a new computer, on the basis of the
age of the company's current computer, the company's current workload, and other factors.


Choice of Prediction Rule

Forecasting a binary outcome for given levels $\mathbf{X}_h$ of the X variables is simple in the sense
that the outcome 1 will be predicted if the estimated value $\hat{\pi}_h$ is large, and the outcome 0
will be predicted if $\hat{\pi}_h$ is small. The difficulty in making predictions of a binary outcome is
in determining the cutoff point, below which the outcome 0 is predicted and above which
the outcome 1 is predicted. A variety of approaches are possible to determine where this
cutoff point is to be located. We consider three approaches.


1. Use .5 as the cutoff. With this approach, the prediction rule is:

If $\hat{\pi}_h$ exceeds .5, predict 1; otherwise predict 0.

This approach is reasonable when (a) it is equally likely in the population of interest that
outcomes 0 and 1 will occur; and (b) the costs of incorrectly predicting 0 and 1 are
approximately the same.

2. Find the best cutoff for the data set on which the multiple logistic regression model
is based. This approach involves evaluating different cutoffs. For each cutoff, the rule is
employed on the $n$ cases in the model-building data set and the proportion of cases incorrectly
predicted is ascertained. The cutoff for which the proportion of incorrect predictions is lowest
is the one to be employed.

This approach is reasonable when (a) the data set is a random sample from the relevant
population, and thus reflects the proper proportions of 0s and 1s in the population, and
(b) the costs of incorrectly predicting 0 and 1 are approximately the same. The low proportion
of incorrect predictions observed for the optimal cutoff is likely to overstate
the ability of the cutoff to correctly predict new observations, especially if the model-
building data set is not large. The reason is that the cutoff is chosen with reference to the
same data set from which the logistic model was fitted and thus is best for these data only.
Consequently, as we explained in Chapter 9, it is important that a validation data set be
employed to indicate whether the observed predictive ability for a fitted regression model
is a valid indicator for predicting new observations.


3. Use prior probabilities and costs of incorrect predictions in determining the cutoff. 
When prior information is available about the likelihood of 1s and Os in the population 
and the data set is not a random sample from the population, the prior information can 
be used in finding an optimal cutoff. In addition, when the cost of incorrectly predicting 
outcome 1 differs substantially from the cost of incorrectly predicting outcome 0, these costs 
of incorrect consequences can be incorporated into the determination of the cutoff so that 
the expected cost of incorrect predictions will be minimized. Specialized references, such 
as Reference 14.7, discuss the use of prior information and costs of incorrect predictions 
for determining the optimal cutoff. 
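
The second approach above, searching for the cutoff with the lowest proportion of incorrect predictions in the model-building data, can be sketched as follows. The outcomes, fitted probabilities, candidate grid, and function names are all hypothetical and serve only to illustrate the search; ties between equally good cutoffs are resolved here by taking the first candidate, which is an arbitrary choice.

```python
def error_rate(y, p_hat, cutoff):
    """Proportion of cases misclassified when predicting 1 if p_hat >= cutoff."""
    wrong = sum((p >= cutoff) != (yi == 1) for yi, p in zip(y, p_hat))
    return wrong / len(y)

def best_cutoff(y, p_hat, candidates):
    """Return (cutoff, error rate) with the lowest error rate; ties go to the first candidate."""
    return min(((c, error_rate(y, p_hat, c)) for c in candidates), key=lambda t: t[1])

# Hypothetical outcomes and fitted probabilities, invented for illustration
y_demo = [0, 0, 0, 0, 1, 0, 1, 1]
p_demo = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.80]
grid = [k / 100 for k in range(5, 100, 5)]

cutoff, rate = best_cutoff(y_demo, p_demo, grid)
print(cutoff, rate)
```

Since the cutoff is tuned on the same cases used to fit the model, the error rate returned by such a search should be checked against a validation data set before it is relied upon.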


We shall use the disease outbreak example of Table 14.3 to illustrate how to obtain the cutoff
point for predicting a new observation, even though the main purpose of that study was to
determine whether age, socioeconomic status, and city sector are important risk factors.
We assume that the cost of incorrectly predicting that a person has contracted the disease
is about the same as the cost of incorrectly predicting that a person has not contracted the
disease. The estimated logistic response function is given in (14.46).

Since a random sample of individuals was selected in the two city sectors, the 98 cases in
the study constitute a cross section of the relevant population. Consequently, information is
provided in the sample about the proportion of persons who have contracted the disease in
the population. Of the 98 persons in the study, 31 had contracted the disease (see the disease
outbreak data set in Appendix C.10); hence the estimated proportion of persons who had
contracted the disease is 31/98 = .316. This proportion can be used as the starting point in
the search for the best cutoff in the prediction rule.

Thus, the first rule investigated was:

Predict 1 if $\hat{\pi}_h \ge .316$; predict 0 if $\hat{\pi}_h < .316$ (14.95)


Note from Table 14.3, column 6, that $\hat{\pi}_1 = .209$ for case 1; hence prediction rule (14.95)
calls for a prediction that the person has not contracted the disease. This would be a correct
prediction. Similarly, prediction rule (14.95) would correctly predict cases 2 and 3 not to
have contracted the disease. However, the prediction with rule (14.95) for case 4 (person has
contracted the disease because $\hat{\pi}_4 = .371 \ge .316$) would be incorrect. Similarly, the
prediction for case 5 (person has not contracted the disease because $\hat{\pi}_5 = .111 < .316$) would be
incorrect. Table 14.12a provides a summary of the number of correct and incorrect
classifications based on prediction rule (14.95). Of the 67 persons without the disease, 20 would
be incorrectly predicted to have contracted the disease, or an error rate of 29.9 percent.


TABLE 14.12 Classification Based on Logistic Response Function (14.46) and Prediction Rules
(14.95) and (14.96): Disease Outbreak Example.

                   (a) Rule (14.95)                              (b) Rule (14.96)
True Status   $\hat{Y}=0$   $\hat{Y}=1$   Total       $\hat{Y}=0$   $\hat{Y}=1$   Total
$Y = 0$            47            20          67            50            17          67
$Y = 1$             8            23          31             9            22          31
Total              55            43          98            59            39          98


FIGURE 14.17 JMP ROC Curve: Disease Outbreak Example.


Of the 31 persons with the disease, eight would be incorrectly predicted with rule (14.95)
not to have contracted the disease, or 25.8 percent. Altogether, 20 + 8 = 28 of the 98
predictions would be incorrect, so that the prediction error rate for rule (14.95) is 28/98 = .286 or
28.6 percent.

Similar analyses were made for other cutoff points and it appears that among the cutoffs
considered, use of the following rule may be best:

Predict 1 if $\hat{\pi}_h \ge .325$; predict 0 if $\hat{\pi}_h < .325$ (14.96)

Table 14.12b provides a summary of the correct and incorrect classifications based on
prediction rule (14.96). The prediction error rate for this rule is (9 + 17)/98 = .265 or
26.5 percent. Note also that for this rule, the error rates for persons with and without the
disease (9/31 and 17/67) are quite close to each other. Thus, the risks of incorrect predictions
for the two groups are fairly balanced, which is often desirable. Note also that the error
rates for persons with and without the disease are much less balanced as the cutoff is shifted
further away from the optimal one in either direction.
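
The error rates quoted for rules (14.95) and (14.96) follow directly from the counts in Table 14.12, as the following sketch shows. The function `classification_summary` and its argument names are hypothetical; only the counts come from the table.

```python
def classification_summary(n00, n01, n10, n11):
    """Error rates from a 2x2 table; n_ab = count with true Y = a and predicted Y = b."""
    total = n00 + n01 + n10 + n11
    return {
        "false_positive_rate": n01 / (n00 + n01),   # without disease, predicted 1
        "false_negative_rate": n10 / (n10 + n11),   # with disease, predicted 0
        "total_error_rate": (n01 + n10) / total,
    }

rule_14_95 = classification_summary(47, 20, 8, 23)   # counts from Table 14.12a
rule_14_96 = classification_summary(50, 17, 9, 22)   # counts from Table 14.12b

for name, summary in (("(14.95)", rule_14_95), ("(14.96)", rule_14_96)):
    print(name, {k: round(v, 3) for k, v in summary.items()})
```

Comparing the two dictionaries makes the trade-off visible: moving the cutoff from .316 to .325 lowers the error rate among persons without the disease and raises it slightly among persons with the disease.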

An effective way to display this information graphically is through the receiver
operating characteristic (ROC) curve, which plots $P(\hat{Y} = 1 \mid Y = 1)$ (also called sensitivity) as
a function of $1 - P(\hat{Y} = 0 \mid Y = 0)$ (also called 1 - specificity) for the possible cutpoints of
the estimated probability $\hat{\pi}_h$.
Figure 14.17 exhibits the ROC curve for model (14.46) for all possible cutpoints between
0 and 1. (See A.7a for the definition of conditional probability.)

To see how a single point on the ROC curve in Figure 14.17 is determined, we consider
rule (14.95), for which the cutoff is .316. From Table 14.12a, the sensitivity is:

$$P(\hat{Y} = 1 \mid Y = 1) = \frac{23}{31} = .74$$

Also, 1 - specificity here is:

$$1 - P(\hat{Y} = 0 \mid Y = 0) = 1 - \frac{47}{67} = .30$$

This point is highlighted on the ROC curve in Figure 14.17.

[Figure 14.17 plot: Receiver Operating Characteristic Curve; vertical axis True Positive (Sensitivity), horizontal axis False Positive (1 - Specificity); highlighted point at 1 - Specificity = .30, Sensitivity = .74; using Y = '1' as the positive level; Area Under Curve = 0.77684.]

The area under the ROC curve is a useful summary measure of the model's predictive
power and is identical to the concordance index. Consider any pair of observations $(i, j)$
such that $Y_i = 1$ and $Y_j = 0$. Since $Y_i > Y_j$, this pair is said to be concordant if $\hat{\pi}_i > \hat{\pi}_j$.
The concordance index estimates the probability that the predictions and the outcomes are
concordant (Reference 14.2). A value of 0.5 means that the predictions were no better than
random guessing. For the disease outbreak model (14.46), the ROC area is 0.777.
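
A minimal sketch of the concordance index, computed by direct enumeration of all pairs with $Y_i = 1$ and $Y_j = 0$, is given below. The data are invented, and counting a tie in fitted probabilities as one half is a common convention that is assumed here rather than taken from the text.

```python
def concordance_index(y, p_hat):
    """Share of (Y=1, Y=0) pairs in which the Y=1 case has the larger fitted probability.

    Ties in fitted probability are counted as one half (an assumed convention).
    """
    ones = [p for yi, p in zip(y, p_hat) if yi == 1]
    zeros = [p for yi, p in zip(y, p_hat) if yi == 0]
    score = 0.0
    for p1 in ones:
        for p0 in zeros:
            if p1 > p0:
                score += 1.0      # concordant pair
            elif p1 == p0:
                score += 0.5      # tied pair
    return score / (len(ones) * len(zeros))

# Hypothetical outcomes and fitted probabilities, invented for illustration
y_demo = [1, 1, 0, 0]
p_demo = [0.8, 0.4, 0.5, 0.1]
print(concordance_index(y_demo, p_demo))
```

Enumerating all pairs is adequate for small data sets such as the 98 cases here; for large data sets a rank-based computation would be more efficient.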

A validation study will now be required to determine whether the observed prediction
error rate for the optimal cutoff properly indicates the risks of incorrect predictions for new
observations, or whether it seriously understates them. In any case, it appears already that
fitted logistic regression model (14.46) may not be too useful as a predictive model because
of the relatively high risks of making incorrect predictions.


Comment

A limitation of the prediction rule approach is that it dichotomizes a continuous predictor $\hat{\pi}$, where
the choice of cutpoint is arbitrary and is highly dependent upon the relative frequencies of 1s and
0s observed in the sample.


Validation of Prediction Error Rate


The reliability of the prediction error rate observed in the model-building data set is exam- 
ined by applying the chosen prediction rule to a validation data set. If the new prediction 
error rate is about the same as that for the model-building data set, then the latter gives a 
reliable indication of the predictive ability of the fitted logistic regression model and the
chosen prediction rule. If the new data lead to a considerably higher prediction error rate, 
then the fitted logistic regression model and the chosen prediction rule do not predict new 
observations as well as originally indicated. 


In the disease outbreak example, the fitted logistic regression function (14.46) based on the
model-building data set:

$$\hat{\pi} = [1 + \exp(2.3129 - .02975X_1 - .4088X_2 + .30525X_3 - 1.5747X_4)]^{-1}$$

was used to calculate estimated probabilities $\hat{\pi}_h$ for cases 99-196 in the disease outbreak data
set in Appendix C.10. These cases constitute the validation data set. The chosen prediction
rule (14.96):

Predict 1 if $\hat{\pi}_h \ge .325$; predict 0 if $\hat{\pi}_h < .325$

was then applied to these estimated probabilities. The percent prediction error rates were
as follows:

                 Disease Status
  With Disease   Without Disease   Total
      46.2            38.9          40.8


Note that the total prediction error rate of 40.8 percent is considerably higher than the
26.5 percent error rate based on the model-building data set. The latter therefore is not a
reliable indicator of the predictive capability of the fitted logistic regression model and the
chosen prediction rule.

We should mention again that making predictions was not the primary objective in the
disease outbreak study. Rather, the main purpose was to identify key explanatory variables.
Still, the prediction error rate for the validation data set shows that there must be other key
explanatory variables affecting whether a person has contracted the disease that have not
yet been identified for inclusion in the logistic regression model.


Comment

An alternative to multiple logistic regression for predicting a binary response variable when the
predictor variables are continuous is discriminant analysis. This approach assumes that the predictor
variables follow a joint multivariate normal distribution. Discriminant analysis can also be used when
this condition is not met, but the approach is not optimal then and logistic regression frequently
is preferable. The reader is referred to Reference 14.8 for an in-depth discussion of discriminant
analysis.


14.11 Polytomous Logistic Regression for Nominal Response 


Logistic regression is most frequently used to model the relationship between a dichotomous
response variable and a set of predictor variables. On occasion, however, the response
variable may have more than two levels. Logistic regression can still be employed by
means of a polytomous (or multicategory) logistic regression model. Polytomous logistic
regression models are used in many fields. In business, for instance, a market researcher
may wish to relate a consumer's choice of product (product A, product B, product C) to
the consumer's age, gender, geographic location, and several other potential explanatory
variables. This is an example of nominal polytomous regression, because the response
categories are purely qualitative and not ordered in any way. Ordinal response categories can
also be modeled using polytomous regression. For example, the relation between severity
of disease measured on an ordinal scale (mild, moderate, severe) and age of patient, gender
of patient, and some other explanatory variables may be of interest. We consider ordinal
polytomous logistic regression in detail in Section 14.12.

In this section we discuss the use of polytomous logistic regression for nominal
multicategory responses. Throughout, we will use the pregnancy duration example, introduced
in Section 14.2 in the context of binary logistic regression, to illustrate concepts. This time,
however, the response will have more than two categories.


Pregnancy Duration Data with Polytomous Response
A study was undertaken to determine the strength of association between several risk factors 
and the duration of pregnancies. The risk factors considered were mother's age, nutritional 
status, history of tobacco use, and history of alcohol use. The response of interest, pregnancy 
duration, is a three-category variable that was coded as follows: 


$Y_i$   Pregnancy Duration Category
 1     Preterm (less than 36 weeks)
 2     Intermediate term (36 to 37 weeks)
 3     Full term (38 weeks or greater)


Relevant data for 102 women who had recently given birth at a large metropolitan hospital
were obtained. A portion of these data is displayed in Table 14.13. The polytomous response,
pregnancy duration ($Y$), is shown in column 1. Nutritional status ($X_1$), shown in column 5, is
an index of nutritional status (higher score denotes better nutritional status). The predictor
variable age was categorized into three groups: 20 years of age or less (coded 1), from 21
to 30 years of age (coded 2), and greater than 30 years of age (coded 3). It is represented by
two indicator variables ($X_2$ and $X_3$), shown in columns 6 and 7 of Table 14.13, as follows:


Class                                    $X_2$   $X_3$
Less than or equal to 20 years of age      1       0
21 to 30 years of age                      0       0
Greater than 30 years of age               0       1


(The researchers chose the middle category—21 to 30 years of age—as the referent category 
for this qualitative predictor because mothers in this age group tend to have the lowest risk 
of preterm deliveries. This leads to positive regression coefficients for these predictors, and 
a slightly simpler interpretation.) Alcohol and smoking history were also qualitative pre- 
dictors; the categories were “Yes” (coded 1) and “No” (coded 0). Alcohol use history (X4), 
and smoking history (X5) are listed in columns 8 and 9 of Table 14.13. 


TABLE 14.13 Data: Pregnancy Duration Example with Polytomous Response.

   (1)        (2)      (3)      (4)        (5)          (6)      (7)        (8)         (9)
                                        Nutritional                      Alcohol Use  Smoking
 Duration      Response Category          Status       Age-Category       History     History
  $Y_i$     $Y_{i1}$ $Y_{i2}$ $Y_{i3}$   $X_{i1}$    $X_{i2}$ $X_{i3}$   $X_{i4}$    $X_{i5}$
    1          1        0        0         150          0        0          0           1
    1          1        0        0         124          1        0          0           0
    1          1        0        0         128          0        0          0           1
   ...
    3          0        0        1         117          0        0          1           1
    3          0        0        1         165          0        0          1           1
    3          0        0        1         134          0        0          1           1


Because pregnancy duration is a qualitative variable with three categories, we will create
three binary response variables, one for each response category, as follows:

$$Y_{i1} = \begin{cases} 1 & \text{if case } i \text{ response is category 1} \\ 0 & \text{otherwise} \end{cases}$$

$$Y_{i2} = \begin{cases} 1 & \text{if case } i \text{ response is category 2} \\ 0 & \text{otherwise} \end{cases}$$

$$Y_{i3} = \begin{cases} 1 & \text{if case } i \text{ response is category 3} \\ 0 & \text{otherwise} \end{cases}$$

These three coded variables are also included in Table 14.13 in columns 2, 3, and 4. Note
that because $Y_{i1} + Y_{i2} + Y_{i3} = 1$, the value of any one of these three binary variables can
be determined from the other two. For example, $Y_{i3} = 1 - Y_{i1} - Y_{i2}$.
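
The coding of a polytomous response into binary indicator variables can be illustrated with a short sketch. The helper name `indicator_responses` is hypothetical; the duration codes are those of the rows displayed in Table 14.13.

```python
def indicator_responses(y, n_categories):
    """Code each response y_i in 1..J as J binary variables (Y_i1, ..., Y_iJ)."""
    return [[1 if yi == j else 0 for j in range(1, n_categories + 1)] for yi in y]

durations = [1, 1, 1, 3, 3, 3]            # Y_i values of the rows shown in Table 14.13
coded = indicator_responses(durations, 3)

for yi, row in zip(durations, coded):
    print(yi, row)
```

Each coded row sums to 1, which is why any one of the three indicators can be recovered from the other two.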

We first treat pregnancy duration as a nominal response, ignoring the time-based ordering
of the categories; later we will show how a more parsimonious model results when we treat
pregnancy duration as an ordinal response.


J - 1 Baseline-Category Logits for Nominal Response


In general, we will assume there are $J$ response categories. Then for the $i$th observation,
there will be $J$ binary response variables, $Y_{i1}, \ldots, Y_{iJ}$, where:

$$Y_{ij} = \begin{cases} 1 & \text{if case } i \text{ response is category } j \\ 0 & \text{otherwise} \end{cases}$$

Since only one category can be selected for response $i$, we have:

$$\sum_{j=1}^{J} Y_{ij} = 1$$

We will require some additional notation for the multicategory case. First, let $\pi_{ij}$ denote
the probability that category $j$ is selected for the $i$th response. Then:

$$\pi_{ij} = P(Y_{ij} = 1)$$

In the binary case, $J = 2$. Suppose that we code $Y_i = 1$ if the $i$th response is category 1,
and we code $Y_i = 0$ if the $i$th response is category 2. Then:

$$\pi_i = \pi_{i1} \quad \text{and} \quad 1 - \pi_i = \pi_{i2}$$

For binary logistic regression, we model the logit of $\pi_i$ using the linear predictor. Since
there are only two categories in binary logistic regression, the logit in fact compares the
probability of a category-1 response to the probability of a category-2 response:

$$\pi_i' = \log_e\left[\frac{\pi_i}{1 - \pi_i}\right] = \log_e\left[\frac{\pi_{i1}}{\pi_{i2}}\right] = \pi_{i12}' = \mathbf{X}_i'\boldsymbol{\beta}_{12}$$

Note that we have used $\pi_{i12}'$ and $\boldsymbol{\beta}_{12}$ to emphasize that the linear predictor is modeling the
logarithm of the ratio of the probabilities for categories 1 and 2.

Now for the $J$ polytomous categories, there are $J(J - 1)/2$ pairs of categories, and
therefore $J(J - 1)/2$ linear predictors. For example, for the pregnancy duration data,
$J = 3$ and we have $3(3 - 1)/2 = 3$ comparisons:

$$\pi_{i12}' = \log_e\left[\frac{\pi_{i1}}{\pi_{i2}}\right] = \mathbf{X}_i'\boldsymbol{\beta}_{12}$$

$$\pi_{i13}' = \log_e\left[\frac{\pi_{i1}}{\pi_{i3}}\right] = \mathbf{X}_i'\boldsymbol{\beta}_{13}$$

$$\pi_{i23}' = \log_e\left[\frac{\pi_{i2}}{\pi_{i3}}\right] = \mathbf{X}_i'\boldsymbol{\beta}_{23}$$

Fortunately, it is not necessary to develop all J(J — 1)/2 logistic regression models. One 
category will be chosen as the baseline or referent category, and then all other categories will 
be compared to it. The choice of baseline or referent category is arbitrary. Frequently the 
last category is chosen and, indeed, this is usually the default choice for statistical software 
programs. One exception to this may be found in epidemiological studies, where the category 
having the lowest risk is often used as the referent category. 

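Using category $J$ to denote the baseline category, we need consider only the $J - 1$
comparisons to this referent category. The logit for the $j$th such comparison is:

$$\pi_{ijJ}' = \log_e\left[\frac{\pi_{ij}}{\pi_{iJ}}\right] = \mathbf{X}_i'\boldsymbol{\beta}_{jJ} \qquad j = 1, 2, \ldots, J - 1 \tag{14.97a}$$

Since it is understood that comparisons are always made to category $J$, we let $\pi_{ij}' = \pi_{ijJ}'$
and $\boldsymbol{\beta}_j = \boldsymbol{\beta}_{jJ}$ in (14.97a), giving:

$$\pi_{ij}' = \log_e\left[\frac{\pi_{ij}}{\pi_{iJ}}\right] = \mathbf{X}_i'\boldsymbol{\beta}_j \qquad j = 1, 2, \ldots, J - 1 \tag{14.97b}$$

The reason that we need to consider only these $J - 1$ logits is that the logits for any
other comparisons can be obtained from them. To see this, suppose $J = 4$, and we wish to
compare categories 1 and 2. Then:

$$\log_e\left[\frac{\pi_{i1}}{\pi_{i2}}\right] = \log_e\left[\frac{\pi_{i1}}{\pi_{i4}} \times \frac{\pi_{i4}}{\pi_{i2}}\right] = \log_e\left[\frac{\pi_{i1}}{\pi_{i4}}\right] - \log_e\left[\frac{\pi_{i2}}{\pi_{i4}}\right] = \mathbf{X}_i'\boldsymbol{\beta}_1 - \mathbf{X}_i'\boldsymbol{\beta}_2$$

In general, to compare categories $k$ and $l$, we have:

$$\log_e\left[\frac{\pi_{ik}}{\pi_{il}}\right] = \mathbf{X}_i'(\boldsymbol{\beta}_k - \boldsymbol{\beta}_l) \tag{14.98}$$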


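Given the $J - 1$ logit expressions in (14.97b) it is possible (algebra not shown) to obtain
the $J - 1$ direct expressions for the category probabilities in terms of the $J - 1$ linear
predictors, $\mathbf{X}_i'\boldsymbol{\beta}_j$. The resulting expressions are:

$$\pi_{ij} = \frac{\exp(\mathbf{X}_i'\boldsymbol{\beta}_j)}{1 + \sum_{k=1}^{J-1} \exp(\mathbf{X}_i'\boldsymbol{\beta}_k)} \qquad j = 1, 2, \ldots, J - 1 \tag{14.99}$$

We next consider methods for obtaining estimates of the $J - 1$ parameter vectors $\boldsymbol{\beta}_1$,
$\boldsymbol{\beta}_2, \ldots, \boldsymbol{\beta}_{J-1}$.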
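
The category probabilities in (14.99), and the way any pairwise logit follows from the baseline-category logits as in (14.98), can be checked numerically with a short sketch. The coefficient vectors and predictor values below are invented for a hypothetical case with $J = 4$ categories, and the function name is an assumption of this sketch.

```python
import math

def category_probabilities(x, betas):
    """Probabilities for categories 1..J from the J-1 baseline-category coefficient vectors.

    The last entry of the result is the baseline category J.
    """
    etas = [sum(bk * xk for bk, xk in zip(beta, x)) for beta in betas]   # X'beta_j
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]      # (14.99), j = 1, ..., J-1
    probs.append(1.0 / denom)                        # baseline: probabilities sum to 1
    return probs

# Hypothetical coefficients for J = 4 categories, intercept plus one predictor
betas_demo = [[0.2, 0.5], [-0.4, 0.1], [1.0, -0.3]]
x_demo = [1.0, 2.0]
pi = category_probabilities(x_demo, betas_demo)

# Check (14.98) for categories k = 1 and l = 2
lhs = math.log(pi[0] / pi[1])
rhs = sum((b1 - b2) * xk for b1, b2, xk in zip(betas_demo[0], betas_demo[1], x_demo))
print([round(p, 4) for p in pi], round(lhs, 6), round(rhs, 6))
```

The probability of the baseline category is obtained as the remainder, $1/(1 + \sum_k \exp(\mathbf{X}_i'\boldsymbol{\beta}_k))$, so that the $J$ probabilities sum to 1.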




Maximum Likelihood Estimation 


There are two approaches commonly used for obtaining estimates of the parameter vectors
$\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{J-1}$; both employ maximum likelihood estimation. With the first approach, separate
binary logistic regressions are carried out for each of the $J - 1$ comparisons to the baseline
category. For example, to estimate $\boldsymbol{\beta}_1$, we drop from the data set all cases except those for
which either $Y_{i1} = 1$ or $Y_{iJ} = 1$. Since only two categories are then present, we can apply
binary logistic regression directly. This approach is particularly useful when statistical
software is not available for multicategory logistic regression (Reference 14.9).

A more effective approach from a statistical viewpoint is to obtain estimates of the
$J - 1$ logits simultaneously. To do so, we require the likelihood for the full data set. To fix
ideas, suppose that there are $J = 4$ categories and that the third category is selected for the
$i$th response. That is, for case $i$ we have:

$$Y_{i1} = 0 \qquad Y_{i2} = 0 \qquad Y_{i3} = 1 \qquad Y_{i4} = 0$$

The probability of this response is:

$$P(Y_i = 3) = \pi_{i3} = [\pi_{i1}]^0 \times [\pi_{i2}]^0 \times [\pi_{i3}]^1 \times [\pi_{i4}]^0 = \prod_{j=1}^{4} [\pi_{ij}]^{Y_{ij}}$$

For $n$ independent observations and $J$ categories, it is easily seen that the likelihood is:

$$P(Y_1, \ldots, Y_n) = \prod_{i=1}^{n} P(Y_i) = \prod_{i=1}^{n} \left[ \prod_{j=1}^{J} [\pi_{ij}]^{Y_{ij}} \right] \tag{14.100}$$

It can be shown that the log likelihood is given by:

$$\log_e P(Y_1, \ldots, Y_n) = \sum_{i=1}^{n} \left[ \sum_{j=1}^{J-1} (Y_{ij}\mathbf{X}_i'\boldsymbol{\beta}_j) - \log_e\left(1 + \sum_{j=1}^{J-1} \exp(\mathbf{X}_i'\boldsymbol{\beta}_j)\right) \right] \tag{14.101}$$

The maximum likelihood estimates of $\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{J-1}$ are those values, $\mathbf{b}_1, \ldots, \mathbf{b}_{J-1}$, that
maximize (14.101). As usual, we will rely on standard statistical software programs to
obtain these estimates.
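
To make (14.101) concrete, the following sketch evaluates the log likelihood for given coefficient vectors. It only evaluates the function and does not maximize it; the data, coefficient values, and function name are hypothetical, with an intercept plus one predictor and $J = 3$ categories assumed.

```python
import math

def log_likelihood(X, y, betas):
    """Log likelihood (14.101); y[i] is the category 1..J of case i, betas has J-1 vectors."""
    J = len(betas) + 1
    total = 0.0
    for x_i, y_i in zip(X, y):
        etas = [sum(bk * xk for bk, xk in zip(beta, x_i)) for beta in betas]
        # Sum over j of Y_ij * X_i'beta_j: only the observed category contributes,
        # and the baseline category J contributes zero.
        if y_i < J:
            total += etas[y_i - 1]
        total -= math.log(1.0 + sum(math.exp(e) for e in etas))
    return total

# Hypothetical data: intercept plus one predictor, J = 3 categories (invented values)
X_demo = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]]
y_demo = [1, 3, 2, 3]
betas_demo = [[0.3, -0.2], [-0.1, 0.4]]
print(round(log_likelihood(X_demo, y_demo, betas_demo), 4))
```

When all coefficients are zero, every category has probability $1/J$ and the log likelihood reduces to $n \log_e(1/J)$, which gives a convenient check on any implementation.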

As was the case for binary logistic regression, the $J - 1$ fitted response functions may be
obtained by substituting the maximum likelihood estimates of the $J - 1$ parameter vectors
into the expression in (14.99):

$$\hat{\pi}_{ij} = \frac{\exp(\mathbf{X}_i'\mathbf{b}_j)}{1 + \sum_{k=1}^{J-1} \exp(\mathbf{X}_i'\mathbf{b}_k)} \tag{14.102}$$

We turn now to an example to illustrate the analysis and interpretation of a nominal-level
polytomous logistic regression model.


Example


FIGURE 14.18 MINITAB Nominal Logistic Regression Output: Pregnancy Duration Example.
For the pregnancy duration data in Table 14.13, a set of $J - 1 = 2$ first-order linear predictors
was initially proposed:

$$\log_e\left[\frac{\pi_{ij}}{\pi_{i3}}\right] = \mathbf{X}_i'\boldsymbol{\beta}_j \qquad \text{for } j = 1, 2$$

MINITAB's nominal logistic regression output is displayed in Figure 14.18. It first indicates
that the response had three levels, 1, 2, and 3, and that the referent response event is
$Y_i = 3$. Following this summary is the logistic regression table, which contains the estimated
regression coefficients, estimated approximate standard errors, the Wald test statistics and
P-values, the estimated odds ratios for the two estimated linear predictors, and the 95 percent
confidence intervals for the odds ratios. The maximum likelihood estimates of $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ are:

$$\mathbf{b}_1 = \begin{bmatrix} 3.958 \\ -0.0464 \\ 2.9135 \\ 1.8875 \\ 1.0670 \\ 2.2305 \end{bmatrix} \qquad \mathbf{b}_2 = \begin{bmatrix} 5.475 \\ -0.0654 \\ 2.9570 \\ 2.0597 \\ 2.0429 \\ 2.4524 \end{bmatrix}$$

Polytomous Nominal MTB Output

Response Information

Variable   Value   Count
preterm    3          41   (Reference Event)
           2          35
           1          26
           Total     102

Logistic Regression Table

                                                      Odds       95% CI
Predictor        Coef    SE Coef       Z       P     Ratio    Lower    Upper
Logit 1: (2/3)
Constant        3.958      1.941    2.04   0.041
nutritio     -0.04645    0.01489   -3.12   0.002      0.95     0.93     0.98
agecat1        2.9135     0.8575    3.40   0.001     18.42     3.43    98.91
agecat3        1.8875     0.8088    2.33   0.020      6.60     1.35    32.23
alcohol        1.0670     0.6495    1.64   0.100      2.91     0.81    10.38
smoking        2.2305     0.6682    3.34   0.001      9.30     2.51    34.47
Logit 2: (1/3)
Constant        5.475      2.272    2.41   0.016
nutritio     -0.06542    0.01824   -3.59   0.000      0.94     0.90     0.97
agecat1        2.9570     0.9645    3.07   0.002     19.24     2.91   127.41
agecat3        2.0597     0.8947    2.30   0.021      7.84     1.36    45.30
alcohol        2.0429     0.7097    2.88   0.004      7.71     1.92    31.00
smoking        2.4524     0.7315    3.35   0.001     11.62     2.77    48.72

Log-likelihood = -84.338
Test that all slopes are zero: G = 52.011, DF = 10, P-Value = 0.000


Before using the fitted model to make inferences, various regression diagnostics similar to
those already discussed for binary logistic regression should be examined. In polytomous
logistic regression, the multiple outcome categories make this a more difficult problem
than was the case for binary logistic regression. We thus recommend assessing the fit
and monitoring logistic regression diagnostics using the $J - 1$ individual binary logistic
regressions, as described for the first estimation approach in the discussion of maximum
likelihood estimation above. Hence, we would assess the fit
of the two logistic regression models separately, and then make a statement about the fit of
the polytomous logistic model descriptively. Diagnostics, including the Hosmer-Lemeshow
test for goodness of fit, simulated envelopes for deviance residuals, and plots of influence
statistics were examined for the pregnancy duration data, and no serious departures were
found (results not shown). We turn now to model interpretation and inference.

As indicated in Figure 14.18, all Wald test P-values are less than .05, with the exception
of alcohol in the first linear predictor, indicating that all of the predictors should be retained.
In all cases, the direction of the association between the predictors and the estimated logits,
as indicated by the signs of the estimated regression coefficients, was as expected.

For teenagers, the estimated odds of delivering preterm compared to full term are
18.42 times the estimated odds for women 21 to 30 years of age; the 95% confidence
interval for this odds ratio has a lower limit of 3.43 and an upper limit of 98.91. Thus while
the age effect is estimated to be very large, there is considerable uncertainty in the estimate.
Similarly, the estimated odds for teenagers of delivering intermediate term compared to
full term are 19.24 times those for women 21 to 30 years of age; the lower 95% confidence
limit is 2.91 and the upper limit is 127.41.
History of smoking, history of alcohol use, and being in the 30-and-over age category also
increase the estimated odds of delivering preterm or intermediate term compared to full
term, though less dramatically. The negative estimated coefficients for nutritional status
indicate that a lower nutritional status is associated with increased odds of delivering preterm
or intermediate term compared to full term.
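
The odds ratio and confidence limits reported in Figure 14.18 can be retraced from a coefficient and its standard error, as the following sketch shows for the coefficient 2.9135 with standard error 0.8575 in the first logit. The function name is hypothetical, and the multiplier 1.960 is assumed for the 95 percent limits.

```python
import math

def odds_ratio_with_ci(coef, se, z=1.960):
    """Estimated odds ratio exp(coef) with approximate limits exp(coef -/+ z * se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# Coefficient and standard error of the age-20-or-less indicator in the first logit
# of Figure 14.18
or_hat, lower, upper = odds_ratio_with_ci(2.9135, 0.8575)
print(round(or_hat, 2), round(lower, 2), round(upper, 2))
```

The limits are computed on the coefficient scale and then exponentiated, which is why the interval is far from symmetric around the estimated odds ratio.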




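Comment

To derive expression (14.101) for the log likelihood, we first obtain the logarithm of (14.100) and
let $\pi_{iJ} = 1 - \sum_{j=1}^{J-1} \pi_{ij}$ and $Y_{iJ} = 1 - \sum_{j=1}^{J-1} Y_{ij}$. It follows that:

$$\log_e P(Y_1, \ldots, Y_n) = \sum_{i=1}^{n} \left[ \sum_{j=1}^{J-1} Y_{ij} \log_e \pi_{ij} + \left(1 - \sum_{j=1}^{J-1} Y_{ij}\right) \log_e\left(1 - \sum_{j=1}^{J-1} \pi_{ij}\right) \right]$$

$$= \sum_{i=1}^{n} \left[ \sum_{j=1}^{J-1} Y_{ij} \log_e \pi_{ij} + \log_e\left(1 - \sum_{j=1}^{J-1} \pi_{ij}\right) - \sum_{j=1}^{J-1} Y_{ij} \log_e\left(1 - \sum_{k=1}^{J-1} \pi_{ik}\right) \right]$$

$$= \sum_{i=1}^{n} \left[ \sum_{j=1}^{J-1} Y_{ij} \log_e\left(\frac{\pi_{ij}}{1 - \sum_{k=1}^{J-1} \pi_{ik}}\right) + \log_e\left(1 - \sum_{j=1}^{J-1} \pi_{ij}\right) \right]$$

Substitution of the expressions in (14.97b) for $\log_e[\pi_{ij}/\pi_{iJ}]$ and in (14.99) for $\pi_{ij}$ in the second term
leads to the desired log likelihood in (14.101).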


14.12 Polytomous Logistic Regression for Ordinal Response 


Up to this point, we have considered polytomous logistic regression models for unordered 
categories. Categories, however, are frequently ordered. Consider the following response 
variables: 


1. A food product is rated by consumers on a 1-10 hedonic scale.

2. In an economic study, persons are classified as either not employed, employed part time,
or employed full time.

3. The quality of sheet metal produced is rated on a 1-5 scale, depending on the clarity and
reflectivity of the surface.

4. Employees are asked to rate working conditions using a 7-point scale (unacceptable,
poor, fair, acceptable, good, excellent, outstanding).

5. The severity of cancer is rated by stages on a 1-4 basis.


Such responses can be analyzed by using the techniques for nominal logistic regression 
described in Section 14.11, but a more effective strategy, yielding a more parsimonious and 
more easily interpreted model, results if the ordering of the categories is taken into account 
explicitly. The model that is usually employed is called the proportional odds model. 

To motivate this model, we revisit the pregnancy duration example. We will assume that
pregnancy duration is a continuous response denoted by $Y_i^c$. For ease of exposition, we will
also assume that there is just one (quantitative) predictor, nutrition index, $X_{i1}$. Assume that
$Y_i^c$ can be represented by the simple linear regression model:


$$Y_i^c = \beta_0^* + \beta_1^* X_{i1} + k\varepsilon_L$$

where $\varepsilon_L$ follows the standard logistic distribution (14.14) with mean zero and standard
deviation $\pi/\sqrt{3}$, and $k$ is a constant that satisfies:

$$\sigma\{Y_i^c\} = k\,\sigma\{\varepsilon_L\} = k\,\frac{\pi}{\sqrt{3}}$$

Researchers were interested in specific categories of pregnancy delivery time and therefore
discretized pregnancy duration $Y_i^c$ using the following upper bounds or cutpoints for each
category:


$Y_i$   Category             $Y_i^c$                                Cutpoint $T_j$
1       Preterm              $0 < Y_i^c \le 36$ weeks               $T_1 = 36$ weeks
2       Intermediate term    36 weeks $< Y_i^c \le 38$ weeks        $T_2 = 38$ weeks
3       Full term            38 weeks $< Y_i^c < \infty$            $T_3 = \infty$


The proportional odds model for ordinal logistic regression models the cumulative
probabilities $P(Y_i \le j)$ rather than the specific category probabilities $P(Y_i = j)$ as was the case
for nominal logistic regression. We now develop the required expressions for the cumulative
probabilities.

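For $j = 1$ we have:

$$
\begin{aligned}
P(Y_i \le 1) &= P(Y_i^c \le T_1) && (14.103a) \\
&= P(\beta_0^* + \beta_1^* X_{i1} + k\varepsilon_L \le T_1) && (14.103b) \\
&= P(k\varepsilon_L \le T_1 - \beta_0^* - \beta_1^* X_{i1}) && (14.103c) \\
&= P\left(\varepsilon_L \le \frac{T_1 - \beta_0^*}{k} - \frac{\beta_1^*}{k} X_{i1}\right) && (14.103d) \\
&= P(\varepsilon_L \le \alpha_1 + \beta_1 X_{i1}) && (14.103e)
\end{aligned}
$$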




where $\alpha_1 = (T_1 - \beta_0^*)/k$ and $\beta_1 = -\beta_1^*/k$. Since $\varepsilon_L$ follows a standard logistic distribution,
the cumulative probability in (14.103e) is obtained by using the cumulative distribution
function (14.14b):

$$P(Y_i \le 1) = \frac{\exp(\alpha_1 + \beta_1 X_{i1})}{1 + \exp(\alpha_1 + \beta_1 X_{i1})} \tag{14.103f}$$
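For $j = 2$, following the development in (14.103), we have:

$$
\begin{aligned}
P(Y_i \le 2) &= P(Y_i^c \le T_2) && (14.104a) \\
&= P(\beta_0^* + \beta_1^* X_{i1} + k\varepsilon_L \le T_2) && (14.104b) \\
&= P(k\varepsilon_L \le T_2 - \beta_0^* - \beta_1^* X_{i1}) && (14.104c) \\
&= P\left(\varepsilon_L \le \frac{T_2 - \beta_0^*}{k} - \frac{\beta_1^*}{k} X_{i1}\right) && (14.104d) \\
&= P(\varepsilon_L \le \alpha_2 + \beta_1 X_{i1}) && (14.104e) \\
&= \frac{\exp(\alpha_2 + \beta_1 X_{i1})}{1 + \exp(\alpha_2 + \beta_1 X_{i1})} && (14.104f)
\end{aligned}
$$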


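Notice that the only difference between (14.103f) and (14.104f) involves the intercept
terms $\alpha_1$ and $\alpha_2$. The slopes $\beta_1$ are the same in both expressions. For the multiple regression
case involving $J$ ordered categories, we let:

$$
\mathbf{X}_i = \begin{bmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{i,p-1} \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_{p-1} \end{bmatrix}
$$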


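Equations (14.103f) and (14.104f) become for category $j$:

$$P(Y_i \le j) = \frac{\exp(\alpha_j + \mathbf{X}_i'\boldsymbol{\beta})}{1 + \exp(\alpha_j + \mathbf{X}_i'\boldsymbol{\beta})} \qquad \text{for } j = 1, 2, \ldots, J-1 \tag{14.105}$$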
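As a minimal sketch of how (14.105) is used, the following Python fragment evaluates the cumulative probabilities for one case and recovers the individual category probabilities by differencing. The function names and the numerical values of the intercepts and slope are hypothetical illustrations, not quantities from the text.

```python
import math

def cumulative_probs(alphas, beta, x):
    """Return [P(Y <= 1), ..., P(Y <= J-1)] under model (14.105)."""
    eta = sum(b * xk for b, xk in zip(beta, x))  # X'beta; the intercepts are the alphas
    # exp(a + eta) / (1 + exp(a + eta)) written in the equivalent form 1 / (1 + exp(-(a + eta)))
    return [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas]

def category_probs(alphas, beta, x):
    """Return [P(Y = 1), ..., P(Y = J)] by differencing the cumulative probabilities."""
    cum = [0.0] + cumulative_probs(alphas, beta, x) + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical values for illustration only (J = 3, one predictor).
alphas = [-1.0, 0.5]   # alpha_1 < alpha_2
beta = [0.8]
probs = category_probs(alphas, beta, [1.0])
```

Because the intercepts are increasing in $j$ while the slope vector is shared, the cumulative probabilities are increasing in $j$ and the differenced category probabilities are positive and sum to one.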


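Model (14.105) is often referred to as the proportional odds model. Taking the logit
transformation of both sides yields the $J - 1$ cumulative logits:

$$\log_e\left[\frac{P(Y_i \le j)}{1 - P(Y_i \le j)}\right] = \alpha_j + \mathbf{X}_i'\boldsymbol{\beta} \qquad \text{for } j = 1, \ldots, J-1 \tag{14.106}$$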


The difference between the ordinal logits in (14.106) and the nominal logits in (14.97b)
should now be clear. In the nominal case, each of the $J - 1$ parameter vectors $\boldsymbol{\beta}_j$ is unique.
For ordinal responses, the slope coefficient vectors $\boldsymbol{\beta}$ are identical for each of the $J - 1$
cumulative logits, but the intercepts differ.

As in the binary logistic regression case, each slope parameter can again be interpreted
as the change in the logarithm of an odds ratio (this time the cumulative odds ratio) for a
unit change in its associated predictor. In general, (14.106) satisfies, for $j = 1, \ldots, J - 1$:

$$\log_e\left[\frac{P(Y_k \le j)}{P(Y_k > j)} \div \frac{P(Y_l \le j)}{P(Y_l > j)}\right] = (\mathbf{X}_k - \mathbf{X}_l)'\boldsymbol{\beta} \tag{14.107}$$
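The property in (14.107) can be checked numerically. The sketch below, which uses hypothetical intercepts, slopes, and predictor vectors, computes the log cumulative odds ratio for two cases at each cutpoint directly from the cumulative probabilities and compares it with the difference in the linear predictors.

```python
import math

def cumulative_prob(alpha_j, beta, x):
    """P(Y <= j) from (14.105) for one cutpoint."""
    eta = alpha_j + sum(b * xk for b, xk in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))

def log_cumulative_odds(alpha_j, beta, x):
    """log_e[P(Y <= j) / P(Y > j)], computed from the probability itself."""
    p = cumulative_prob(alpha_j, beta, x)
    return math.log(p / (1.0 - p))

# Hypothetical values for illustration only.
alphas = [-1.0, 0.5]
beta = [0.8, -0.3]
x_k, x_l = [2.0, 1.0], [1.0, 4.0]

# One log cumulative odds ratio per cutpoint j = 1, 2.
log_ratios = [log_cumulative_odds(a, beta, x_k) - log_cumulative_odds(a, beta, x_l)
              for a in alphas]
# Difference in the linear predictors, free of the intercepts.
expected = sum(b * (u - v) for b, u, v in zip(beta, x_k, x_l))
```

The intercept cancels in the difference, so the same value is obtained at every cutpoint; this is the sense in which the odds are proportional.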




We now briefly discuss estimation methods before returning to the pregnancy duration 
example. 


Maximum Likelihood Estimation. As was the case for nominal logistic regression,
separate binary logistic regressions can be used to obtain estimates of the $J - 1$ linear
predictors in (14.106). For $j = 1, \ldots, J - 1$, we construct the binary outcome variable:

$$Y_i^{(j)} = \begin{cases} 1 & \text{if } Y_i \le j \\ 0 & \text{if } Y_i > j \end{cases}$$

and carry out a logistic regression analysis based on $Y_i^{(j)}$. Note that this approach leads to
$J - 1$ separate estimates of the slope parameter vector $\boldsymbol{\beta}$.
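The construction of the binary outcome variables can be sketched as follows. The function name and the small response vector are hypothetical; each resulting 0/1 list would then be passed to whatever binary logistic regression routine is available.

```python
def cumulative_indicators(y, num_categories):
    """For each j = 1, ..., J-1, return the 0/1 list with entry 1 if Y_i <= j and 0 otherwise."""
    return {j: [1 if yi <= j else 0 for yi in y] for j in range(1, num_categories)}

# Hypothetical ordinal responses with J = 3 categories.
y = [1, 3, 2, 3, 1]
indicators = cumulative_indicators(y, 3)
```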

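A better approach, if the required software is available, is to estimate $\alpha_1, \ldots, \alpha_{J-1}$ and
$\boldsymbol{\beta}$ simultaneously using maximum likelihood estimation. From (14.100), the likelihood is
given by:

$$
P(Y_1, \ldots, Y_n) = \prod_{i=1}^{n} \left[ \prod_{j=1}^{J} \pi_{ij}^{Y_{ij}} \right]
= \prod_{i=1}^{n} \left[ \prod_{j=1}^{J} \left[ P(Y_i \le j) - P(Y_i \le j - 1) \right]^{Y_{ij}} \right] \tag{14.108}
$$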


Substitution of $P(Y_i \le J) = 1$, $P(Y_i \le 0) = 0$, and the expression for $P(Y_i \le j)$,
$j = 1, \ldots, J - 1$, in (14.105) yields the required expression for the likelihood in terms of
$\alpha_1, \ldots, \alpha_{J-1}$, and $\boldsymbol{\beta}$. The maximum likelihood estimates are those values of $\alpha_1, \ldots, \alpha_{J-1}$
and $\boldsymbol{\beta}$, namely, $a_1, \ldots, a_{J-1}$ and $\mathbf{b}$, that maximize (14.108). As always, we shall rely on
standard statistical software to carry out the maximization. We now return to the pregnancy
duration example.
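To make the objective concrete, here is a minimal sketch of the logarithm of likelihood (14.108) for given parameter values. It only evaluates the function; the maximization itself is left to statistical software, as the text indicates. The function name and argument layout are assumptions made for illustration.

```python
import math

def ordinal_log_likelihood(alphas, beta, X, y):
    """Logarithm of (14.108); y[i] is in {1, ..., J} and alphas holds J-1 increasing intercepts."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * xk for b, xk in zip(beta, x_i))
        # Cumulative probabilities with the conventions P(Y <= 0) = 0 and P(Y <= J) = 1.
        cum = [0.0] + [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
        # Only the observed category contributes, since Y_ij = 1 for that j and 0 otherwise.
        total += math.log(cum[y_i] - cum[y_i - 1])
    return total

# Hypothetical data and parameter values for illustration only.
X = [[0.0], [1.0], [2.0]]
y = [1, 2, 3]
log_lik = ordinal_log_likelihood([-1.0, 0.5], [0.0], X, y)
```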


Example

We continue the analysis of the pregnancy duration data, this time under the assumption
that the response is ordinal, rather than nominal. Recall that $Y_i = 1$ indicates preterm
delivery, $Y_i = 2$ indicates intermediate-term delivery, and $Y_i = 3$ indicates full-term delivery.
MINITAB ordinal logistic regression output is shown in Figure 14.19. As required with
$J = 3$, the program provides estimates for two intercepts, $a_1 = 2.930$ and $a_2 = 5.025$, and
$p - 1 = 5$ slope coefficients, $b_1 = -.04887$, $b_2 = 1.9760$, $b_3 = 1.3635$, $b_4 = 1.5915$, and
$b_5 = 1.6699$. The Wald P-values indicate that all of the regression coefficients are
statistically significant at the .05 level.

As noted above, each coefficient can be interpreted as the change in the logarithm of the
cumulative odds for a unit change in the predictor. For example, the results indicate that the
logarithm of the odds of a pre- or intermediate-term delivery ($Y_i \le 2$) for smokers ($X_5 = 1$)
is estimated to exceed the logarithm of the odds for nonsmokers ($X_5 = 0$) by $b_4 = 1.5915$,
the other predictors being held fixed. The estimated
cumulative odds ratio is given by $\exp(1.5915) = 4.91$, and a 95% confidence interval for
the true cumulative odds ratio has a lower limit of 2.02 and an upper limit of 11.92. The
remaining slope parameters can be interpreted in a similar fashion.
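The odds ratio and its limits can be reproduced from the coefficient and standard error in the smoking row of Figure 14.19. The sketch below assumes the usual large-sample construction, exponentiating the coefficient plus or minus 1.96 standard errors; the function name is hypothetical.

```python
import math

def odds_ratio_with_ci(coef, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) from a coefficient and its standard error."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# Smoking row of Figure 14.19: Coef = 1.5915, SE Coef = 0.4525.
ratio, lower, upper = odds_ratio_with_ci(1.5915, 0.4525)
```

Under that assumption the three values agree, to two decimals, with the 4.91, 2.02, and 11.92 reported in the output.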

Notice again that the interpretation of the ordinal logistic regression model is much
simpler than that for the nominal logistic regression model, because only a single slope
vector $\boldsymbol{\beta}$ is estimated.


FIGURE 14.19 MINITAB Ordinal Logistic Regression Output: Pregnancy Duration Example.

Link Function: Logit

Response Information

Variable   Value   Total
preterm

Logistic Regression Table

                                                    Odds      95% CI
Predictor      Coef     SE Coef       Z       P    Ratio    Lower    Upper
Const(1)      2.930       1.465    2.00   0.045
Const(2)      5.025       1.521    3.30   0.001
nutritio   -0.04887     0.01168   -4.18   0.000     0.95     0.93     0.97
agecat1      1.9760      0.5875    3.36   0.001     7.21     2.28    22.89
agecat3      1.3635      0.5547    2.46   0.014     3.91     1.32    11.60
smoking      1.5915      0.4525    3.52   0.000     4.91     2.02    11.92
alcohol      1.6699      0.4727    3.53   0.000     5.31     2.10    13.42

Log-likelihood = -86.756
Test that all slopes are zero: G = 47.174, DF = 5, P-Value = 0.000


Comment 


Our development of the proportional odds model assumed that the ordinal response $Y_i$ was obtained
from an explicit discretization of an observed continuous response $Y_i^c$, but this is not required. This
model often works well for ordinal responses that do not arise from such a discretization.


14.13 Poisson Regression 


We consider now another nonlinear regression model where the response outcomes are
discrete. Poisson regression is useful when the outcome is a count, with large-count
outcomes being rare events. For instance, the number of times a household shops at a particular
supermarket in a week is a count, with a large number of shopping trips to the store during
the week being a rare event. A researcher may wish to study the relation between a family's
number of shopping trips to the store during a particular week and the family's income,
number of children, distance from the store, and some other explanatory variables. As
another example, the relation between the number of hospitalizations of a member of a health
maintenance organization during the past year and the member's age, income, and previous
health status may be of interest.


Poisson Distribution 


The Poisson distribution can be utilized for outcomes that are counts ($Y_i = 0, 1, 2, \ldots$),
with a large count or frequency being a rare event. The Poisson probability distribution is
as follows:

$$f(Y) = \frac{\mu^Y \exp(-\mu)}{Y!} \qquad Y = 0, 1, 2, \ldots \tag{14.109}$$

where $f(Y)$ denotes the probability that the outcome is $Y$ and $Y! = Y(Y - 1) \cdots 3 \cdot 2 \cdot 1$.
The mean and variance of the Poisson probability distribution are: 


$$E\{Y\} = \mu \tag{14.110a}$$
$$\sigma^2\{Y\} = \mu \tag{14.110b}$$


Note that the variance is the same as the mean. Hence, if the number of store trips follows 
the Poisson distribution and the mean number of store trips for a family with three children 
is larger than the mean number of trips for a family with no children, the variances of the 
distributions of outcomes for the two families will also differ. 
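The equality of mean and variance in (14.110) can be verified numerically from (14.109). This sketch sums over a truncated range of counts, which is an approximation chosen for illustration; the value of the mean is hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability (14.109): mu^Y exp(-mu) / Y!."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu = 3.5                  # hypothetical mean number of store trips
support = range(0, 60)    # truncated range; the probability beyond 60 is negligible for this mu
total = sum(poisson_pmf(y, mu) for y in support)
mean = sum(y * poisson_pmf(y, mu) for y in support)
variance = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)
```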


Comment

At times, the count responses $Y$ will pertain to different units of time or space. For instance, in a
survey intended to obtain the total number of store trips during a particular month, some of the counts
pertained only to the last week of the month. In such cases, let $\mu$ denote the mean response for $Y$
for a unit of time or space (e.g., one month), and let $t$ denote the number of units of time or space to
which $Y$ corresponds. For instance, $t = 7/30$ if $Y$ is the number of store trips during one week where
the unit time is one month; $t = 1$ if $Y$ is the number of store trips during the month. The Poisson
probability distribution is then expressed as follows:

$$f(Y) = \frac{(t\mu)^Y \exp(-t\mu)}{Y!} \qquad Y = 0, 1, 2, \ldots \tag{14.111}$$

Our discussion throughout this section assumes that all responses $Y_i$ pertain to the same unit of time
or space.


Poisson Regression Model 
The Poisson regression model, like any nonlinear regression model, can be stated as follows:

$$Y_i = E\{Y_i\} + \varepsilon_i \qquad i = 1, 2, \ldots, n$$

The mean response for the $i$th case, to be denoted now by $\mu_i$ for simplicity, is assumed
as always to be a function of the set of predictor variables, $X_1, \ldots, X_{p-1}$. We use the
notation $\mu(\mathbf{X}_i, \boldsymbol{\beta})$ to denote the function that relates the mean response $\mu_i$ to $\mathbf{X}_i$, the values
of the predictor variables for case $i$, and $\boldsymbol{\beta}$, the values of the regression coefficients. Some
commonly used functions for Poisson regression are:

$$\mu_i = \mu(\mathbf{X}_i, \boldsymbol{\beta}) = \mathbf{X}_i'\boldsymbol{\beta} \tag{14.112a}$$
$$\mu_i = \mu(\mathbf{X}_i, \boldsymbol{\beta}) = \exp(\mathbf{X}_i'\boldsymbol{\beta}) \tag{14.112b}$$
$$\mu_i = \mu(\mathbf{X}_i, \boldsymbol{\beta}) = \log_e(\mathbf{X}_i'\boldsymbol{\beta}) \tag{14.112c}$$

In all three cases, the mean responses $\mu_i$ must be nonnegative.

Since the distribution of the error terms $\varepsilon_i$ for Poisson regression is a function of the
distribution of the response $Y_i$, which is Poisson, it is easiest to state the Poisson regression
model in the following form:

$Y_i$ are independent Poisson random variables with expected
values $\mu_i$, where:

$$\mu_i = \mu(\mathbf{X}_i, \boldsymbol{\beta}) \tag{14.113}$$

The most commonly used response function is $\mu_i = \exp(\mathbf{X}_i'\boldsymbol{\beta})$.


Maximum Likelihood Estimation 
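For Poisson regression model (14.113), the likelihood function is as follows:

$$
L(\boldsymbol{\beta}) = \prod_{i=1}^{n} f_i(Y_i)
= \prod_{i=1}^{n} \frac{[\mu(\mathbf{X}_i, \boldsymbol{\beta})]^{Y_i} \exp[-\mu(\mathbf{X}_i, \boldsymbol{\beta})]}{Y_i!}
= \frac{\left\{ \prod_{i=1}^{n} [\mu(\mathbf{X}_i, \boldsymbol{\beta})]^{Y_i} \right\} \exp\left[ -\sum_{i=1}^{n} \mu(\mathbf{X}_i, \boldsymbol{\beta}) \right]}{\prod_{i=1}^{n} Y_i!} \tag{14.114}
$$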

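Once the functional form of $\mu(\mathbf{X}_i, \boldsymbol{\beta})$ is chosen, the maximization of (14.114) produces
the maximum likelihood estimates of the regression coefficients $\boldsymbol{\beta}$. As before, it is easier to
work with the logarithm of the likelihood function:

$$\log_e L(\boldsymbol{\beta}) = \sum_{i=1}^{n} Y_i \log_e[\mu(\mathbf{X}_i, \boldsymbol{\beta})] - \sum_{i=1}^{n} \mu(\mathbf{X}_i, \boldsymbol{\beta}) - \sum_{i=1}^{n} \log_e(Y_i!) \tag{14.115}$$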


Numerical search procedures are used to find the maximum likelihood estimates $b_0, b_1, \ldots,
b_{p-1}$. Iteratively reweighted least squares can again be used to obtain these estimates. We
shall rely on standard statistical software packages specifically designed to handle Poisson
regression to obtain the maximum likelihood estimates.
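For readers who want to see what such a numerical search does, the following is a minimal sketch for the exponential response function with a single predictor. It uses Newton-Raphson steps on the log likelihood (14.115), which for this response function coincide with iteratively reweighted least squares. The function name, the fixed iteration count, and the data are hypothetical; production software adds convergence checks and safeguards that are omitted here.

```python
import math

def fit_poisson_loglinear(x, y, iterations=25):
    """Newton-Raphson for mu_i = exp(b0 + b1 * x_i); returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of (14.115) with respect to b0 and b1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X'WX with weights W = diag(mu_i).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Solve the 2 x 2 system for the Newton step and update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson_loglinear(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
```

At the solution the score equations hold, so the fitted values sum to the observed total and the residuals are orthogonal to the predictor.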
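After the maximum likelihood estimates have been found, we can obtain the fitted
response function and the fitted values:

$$\hat{\mu} = \mu(\mathbf{X}, \mathbf{b}) \tag{14.116a}$$
$$\hat{\mu}_i = \mu(\mathbf{X}_i, \mathbf{b}) \tag{14.116b}$$

For the three functions in (14.112), the fitted response functions and fitted values are:

$$\mu = \mathbf{X}'\boldsymbol{\beta}: \qquad \hat{\mu} = \mathbf{X}'\mathbf{b} \qquad \hat{\mu}_i = \mathbf{X}_i'\mathbf{b} \tag{14.116c}$$
$$\mu = \exp(\mathbf{X}'\boldsymbol{\beta}): \qquad \hat{\mu} = \exp(\mathbf{X}'\mathbf{b}) \qquad \hat{\mu}_i = \exp(\mathbf{X}_i'\mathbf{b}) \tag{14.116d}$$
$$\mu = \log_e(\mathbf{X}'\boldsymbol{\beta}): \qquad \hat{\mu} = \log_e(\mathbf{X}'\mathbf{b}) \qquad \hat{\mu}_i = \log_e(\mathbf{X}_i'\mathbf{b}) \tag{14.116e}$$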


Model Development 


Model development for a Poisson regression model is carried out in a similar fashion
to that for logistic regression, conducting tests for individual coefficients or groups of
coefficients based on the likelihood ratio test statistic $G^2$ in (14.60). For Poisson regression
model (14.113), the model deviance is as follows:

$$DEV(X_0, X_1, \ldots, X_{p-1}) = -2\left[ \sum_{i=1}^{n} Y_i \log_e\left(\frac{\hat{\mu}_i}{Y_i}\right) + \sum_{i=1}^{n} (Y_i - \hat{\mu}_i) \right] \tag{14.117}$$

where $\hat{\mu}_i$ is the fitted value for the $i$th case according to (14.116b). The deviance residual
for the $i$th case is:

$$dev_i = \pm\left[ -2 Y_i \log_e\left(\frac{\hat{\mu}_i}{Y_i}\right) - 2(Y_i - \hat{\mu}_i) \right]^{1/2} \tag{14.118}$$

The sign of the deviance residual is selected according to whether $Y_i - \hat{\mu}_i$ is positive or
negative. Index plots of the deviance residuals and half-normal probability plots with simulated
envelopes are useful for identifying outliers and checking the model fit.
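The deviance and the deviance residuals in (14.117) and (14.118) can be computed as in the sketch below. The function names and the sample values are hypothetical, and the term $Y_i \log_e(\hat{\mu}_i / Y_i)$ is taken to be 0 when $Y_i = 0$.

```python
import math

def deviance_residual(y, mu_hat):
    """Deviance residual (14.118); the term Y log_e(mu_hat / Y) is taken as 0 when Y = 0."""
    log_term = y * math.log(mu_hat / y) if y > 0 else 0.0
    # max(...) guards against a tiny negative value caused by rounding when y equals mu_hat.
    magnitude = math.sqrt(max(0.0, -2.0 * (log_term + (y - mu_hat))))
    # The residual takes the sign of Y - mu_hat.
    return math.copysign(magnitude, y - mu_hat)

def model_deviance(ys, mu_hats):
    """Model deviance (14.117), obtained as the sum of squared deviance residuals."""
    return sum(deviance_residual(y, m) ** 2 for y, m in zip(ys, mu_hats))

# Hypothetical responses and fitted values for illustration only.
ys = [0, 3, 5]
mu_hats = [1.2, 3.0, 4.1]
residuals = [deviance_residual(y, m) for y, m in zip(ys, mu_hats)]
deviance = model_deviance(ys, mu_hats)
```

A case with $Y_i = 0$ contributes a residual of $-\sqrt{2\hat{\mu}_i}$, which is why zero counts paired with sizable fitted values show up as large negative residuals.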


Comment

If $Y_i = 0$, the term $[Y_i \log_e(\hat{\mu}_i / Y_i)]$ in (14.117) and (14.118) equals 0.

Inferences

Inferences for a Poisson regression model are carried out in the same way as for logistic
regression. For instance, there is often interest in estimating the mean response for predictor
variables $\mathbf{X}_h$. This estimate is obtained by substituting $\mathbf{X}_h$ into (14.116).
In Poisson regression analysis, there is sometimes also interest in estimating probabilities
of certain outcomes for given levels of the predictor variables, for instance, $P(Y = 0 \mid \mathbf{X}_h)$.
Such an estimated probability can be obtained readily by substituting $\hat{\mu}_h$ into (14.109).
Interval estimation of individual regression coefficients can be carried out by use of the
large-sample estimated standard deviations furnished by regression programs with Poisson
regression capabilities.


Example

The Miller Lumber Company is a large retailer of lumber and paint, as well as of plumbing,
electrical, and other household supplies. During a representative two-week period, in-store 
surveys were conducted and addresses of customers were obtained. The addresses were 
then used to identify the metropolitan area census tracts in which the customers reside. At 
the end of the survey period, the total number of customers who visited the store from each 
census tract within a 10-mile radius was determined and relevant demographic information 
for each tract (average income, number of housing units, etc.) was obtained. Several other 
variables expected to be related to customer counts were constructed from maps, including 
distance from census tract to nearest competitor and distance to store. 

Initial screening of the potential predictor variables was conducted which led to the 
retention of five predictor variables: 


$X_1$: Number of housing units

$X_2$: Average income, in dollars

$X_3$: Average housing unit age, in years

$X_4$: Distance to nearest competitor, in miles

$X_5$: Distance to store, in miles

$Y_i$: Number of customers who visited store from census tract


TABLE 14.14 Data: Miller Lumber Company Example.

Census   Housing   Average   Average   Competitor   Store      Number of
Tract    Units     Income    Age       Distance     Distance   Customers
$i$      $X_1$     $X_2$     $X_3$     $X_4$        $X_5$      $Y$
1        606       41,393     3        3.04         6.32        9
2        641       23,635    18        1.95         8.89        6
3        505       55,475    27        6.54         2.05       28
...      ...       ...       ...       ...          ...        ...
108      817       54,429    47        1.90         9.90        6
109      268       34,022    54        1.20         9.51        4
110      519       52,850    43        2.92         8.62        6


TABLE 14.15 Fitted Poisson Response Function and Related Results: Miller Lumber Company Example.

(a) Fitted Poisson Response Function

$$\hat{\mu} = \exp[2.942 + .000606 X_1 - .0000117 X_2 - .00373 X_3 + .168 X_4 - .129 X_5]$$
$$DEV(X_0, X_1, X_2, X_3, X_4, X_5) = 114.985$$

(b) Estimated Coefficients, Standard Deviations, and $G^2$ Test Statistics

Regression    Estimated Regression   Estimated Standard
Coefficient   Coefficient            Deviation             $G^2$    P-value
$\beta_0$       2.9424               .207
$\beta_1$        .0006058            .00014                18.21    .000
$\beta_2$       -.00001169           .0000021              31.80    .000
$\beta_3$       -.003726             .0018                  4.38    .036
$\beta_4$        .1684               .026                  41.66    .000
$\beta_5$       -.1288               .016                  67.50    .000


Data for a portion of the $n = 110$ census tracts are shown in Table 14.14.
Poisson regression model (14.113) with response function:

$$\mu(\mathbf{X}, \boldsymbol{\beta}) = \exp(\mathbf{X}'\boldsymbol{\beta})$$

was fitted to the data, using LISP-STAT (Reference 14.10). Some principal results are
presented in Table 14.15. Note that the deviance for this model is 114.985.
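As a sketch of how the fitted response function is applied, the following computes the fitted value for census tract 1 from the estimated coefficients in Table 14.15(b) and the tract's predictor values in Table 14.14, and then the estimated probability of no customers by substituting the fitted mean into (14.109). The variable and function names are hypothetical, and the rounded published coefficients are used, so the result is approximate.

```python
import math

# Estimated coefficients b_0, ..., b_5 from Table 14.15(b).
b = [2.9424, 0.0006058, -0.00001169, -0.003726, 0.1684, -0.1288]

def fitted_mean(x):
    """Fitted value (14.116d): exp(X'b) for x = (X_1, ..., X_5), with the intercept b_0 added."""
    return math.exp(b[0] + sum(bk * xk for bk, xk in zip(b[1:], x)))

# Census tract 1 from Table 14.14.
mu_hat_1 = fitted_mean([606, 41393, 3, 3.04, 6.32])
# Estimated P(Y = 0) from (14.109) with Y = 0, which reduces to exp(-mu_hat).
p_zero_1 = math.exp(-mu_hat_1)
```

The fitted mean comes out at about 12.3 customers, so the estimated probability of observing no customers from this tract is very small.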

Likelihood ratio test statistics (14.60) were calculated for each of the individual
regression coefficients. These $G^2$ test statistics are shown in Table 14.15b, together with their
associated P-values, each based on the chi-square distribution with one degree of freedom.
We note from the P-values that each predictor variable makes a marginal contribution to
the fit of the regression model and consequently should be retained in the model.
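The P-values can be checked from the $G^2$ statistics without statistical tables. The sketch below relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function; the function name is hypothetical.

```python
import math

def chi_square_1df_pvalue(g2):
    """Upper-tail probability P(chi-square(1) >= g2), using chi-square(1) = Z^2."""
    # P(Z^2 >= g2) = 2 * (1 - Phi(sqrt(g2))) = erfc(sqrt(g2 / 2)).
    return math.erfc(math.sqrt(g2 / 2.0))

# G^2 statistics for X_1, ..., X_5 from Table 14.15(b).
g2_values = [18.21, 31.80, 4.38, 41.66, 67.50]
p_values = [chi_square_1df_pvalue(g) for g in g2_values]
```

Only the statistic for $X_3$ (4.38) gives a P-value that is visibly above zero at three decimals, about .036.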

A portion of the deviance residuals $dev_i$ is shown in Table 14.16, together with the
responses $Y_i$ and the fitted values $\hat{\mu}_i$. Analysis of the deviance residuals did not disclose
any major problems. Figure 14.20 contains an index plot of the deviance residuals. We
note a few large negative deviance residuals; these are for census tracts where $Y_i = 0$; i.e.,
there were no customers from these areas. These may be difficult cases to fit with a Poisson
regression model.


TABLE 14.16

$i$     $Y_i$    $\hat{\mu}_i$    $dev_i$
1        9       12.3            -.999
2        6        8.8            -.992
3       28       28.1            -.024
...     ...      ...              ...
108      6        5.3             .289
109      4        4.4            -.197
110      6        6.4            -.171


FIGURE 14.20 Index Plot of Deviance Residuals: Miller Lumber Company Example. (Horizontal axis: index, 20 to 100; vertical axis: deviance residual, -3 to 2.)


14.14 Generalized Linear Models 


We conclude this chapter and the regression portion of this book by noting that all of the 
regression models considered, linear and nonlinear, belong to a family of models called 
generalized linear models. This family was first introduced by Nelder and Wedderburn 
(Reference 14.11) and encompasses normal error linear regression models and the nonlinear 
exponential, logistic, and Poisson regression models, as well as many other models, such 
as log-linear models for categorical data. 

The class of generalized linear models can be described as follows: 


1. $Y_1, \ldots, Y_n$ are $n$ independent responses that follow a probability distribution belonging
to the exponential family of probability distributions, with expected value $E\{Y_i\} = \mu_i$.


2. A linear predictor based on the predictor variables $X_{i1}, \ldots, X_{i,p-1}$ is utilized, denoted
by $\mathbf{X}_i'\boldsymbol{\beta}$:

$$\mathbf{X}_i'\boldsymbol{\beta} = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1}$$

3. The link function $g$ relates the linear predictor to the mean response:

$$\mathbf{X}_i'\boldsymbol{\beta} = g(\mu_i)$$


Generalized linear models may have nonconstant variances $\sigma_i^2$ for the responses $Y_i$, but
the variance $\sigma_i^2$ must be a function of the predictor variables through the mean response $\mu_i$.

To illustrate the concept of the link function, consider first logistic regression model
(14.41). There, the logit transformation $F_L^{-1}(\pi_i)$ in (14.18a) serves to link the linear predictor
$\mathbf{X}_i'\boldsymbol{\beta}$ to the mean response $\mu_i = \pi_i$:

$$g(\mu_i) = g(\pi_i) = \log_e\left(\frac{\pi_i}{1 - \pi_i}\right) = \mathbf{X}_i'\boldsymbol{\beta}$$

As a second example, consider Poisson regression model (14.113). There we considered
several response functions in (14.112). For the response function $\mu_i = \exp(\mathbf{X}_i'\boldsymbol{\beta})$ in
(14.112b), the linking relation is:

$$g(\mu_i) = \log_e(\mu_i) = \mathbf{X}_i'\boldsymbol{\beta}$$

We see from the Poisson regression models that there may be many different possible link
functions that can be employed. They need only be monotonic and differentiable.

Finally, we consider the normal error regression model in (6.7). There the link function
is simply:

$$g(\mu_i) = \mu_i$$

since the linking relation is:

$$\mathbf{X}_i'\boldsymbol{\beta} = \mu_i$$

The link function $g(\mu_i)$ for the normal error case is called the identity or unity link function.
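The three link functions just described can be collected side by side, each paired with its inverse, which maps the linear predictor back to the mean response. This is only an illustrative sketch; the dictionary layout and names are hypothetical and do not correspond to any particular software package.

```python
import math

# Each entry pairs a link g(mu) with its inverse, the mean response as a function of X'beta.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),                 # normal error regression
    "log": (math.log, math.exp),                                  # Poisson, exponential response
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),               # logistic regression
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def mean_response(link_name, eta):
    """Map a linear predictor value X'beta back to the mean response mu."""
    return LINKS[link_name][1](eta)

# Applying a link and then its inverse should return the original mean.
round_trip = mean_response("logit", LINKS["logit"][0](0.3))
```

Each link here is monotonic and differentiable on its domain, which is what allows the inverse to be defined.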

Any regression model that belongs to the family of generalized linear models can be an- 
alyzed in a unified fashion. The maximum likelihood estimates of the regression parameters 
can be obtained by iteratively reweighted least squares [by ordinary least squares for normal 
error linear regression models (6.7)]. Tests for model development to determine whether 
some predictor variables may be dropped from the model can be conducted using likelihood 
ratio tests. Reference 14.12 provides further details about generalized linear models and 
their analysis. 


Cited References

14.1. Kennedy, W. J., Jr., and J. E. Gentle. Statistical Computing. New York: Marcel Dekker, 1980.

14.2. Agresti, A. Categorical Data Analysis. 2nd ed. New York: John Wiley & Sons, 2002.

14.3. LogXact 5. Cytel Software Corporation. Cambridge, Massachusetts, 2003.

14.4. Hosmer, D. W., and S. Lemeshow. Applied Logistic Regression. 2nd ed. New York: John
Wiley & Sons, 2000.

14.5. Cook, R. D., and S. Weisberg. Applied Regression Including Computing and Graphics. New
York: John Wiley & Sons, 1999.

14.6. Atkinson, A. C. "Two Graphical Displays for Outlying and Influential Observations in
Regression," Biometrika 68 (1981), pp. 13-20.

14.7. Johnson, R. A., and D. W. Wichern. Applied Multivariate Statistical Analysis. 5th ed.
Englewood Cliffs, N.J.: Prentice Hall, 2001.

14.8. Lachenbruch, P. A. Discriminant Analysis. New York: Hafner Press, 1975.

14.9. Begg, C. B., and R. Gray. "Calculation of Polytomous Logistic Regression Parameters Using
Individualized Regressions," Biometrika 71 (1984), pp. 11-18.

14.10. Tierney, L. LISP-STAT: An Object-Oriented Environment for Statistical Computing and
Dynamic Graphics. New York: John Wiley & Sons, 1990.

14.11. Nelder, J. A., and R. W. M. Wedderburn. "Generalized Linear Models," Journal of the Royal
Statistical Society A 135 (1972), pp. 370-84.

14.12. McCullagh, P., and J. A. Nelder. Generalized Linear Models. 2nd ed. London: Chapman and
Hall, 1999.


Problems

14.1. A student stated: "I fail to see why the response function needs to be constrained between 0
and 1 when the response variable is binary and has a Bernoulli distribution. The fit to 0, 1 data
will take care of this problem for any response function." Comment.

14.2. Since the logit transformation (14.18) linearizes the logistic response function, why can't this
transformation be used on the individual responses $Y_i$ and a linear response function then
fitted? Explain.

14.3. If the true response function is J-shaped when the response variable is binary, would the use
of the logistic response function be appropriate? Explain.

14.4. a. Plot the logistic mean response function (14.16) when $\beta_0 = -25$ and $\beta_1 = .2$.

b. For what value of $X$ is the mean response equal to .5?

c. Find the odds when $X = 150$, when $X = 151$, and the ratio of the odds when $X = 151$ to
the odds when $X = 150$. Is this odds ratio equal to $\exp(\beta_1)$ as it should be?

*14.5. a. Plot the logistic mean response function (14.16) when $\beta_0 = 20$ and $\beta_1 = -.2$.

b. For what value of $X$ is the mean response equal to .5?

c. Find the odds when $X = 125$, when $X = 126$, and the ratio of the odds when $X = 126$ to
the odds when $X = 125$. Is the odds ratio equal to $\exp(\beta_1)$ as it should be?

14.6. a. Plot the probit mean response function (14.12) for $\beta_0^* = -25$ and $\beta_1^* = .2$. How does this
function compare to the logistic mean response function in part (a) of Problem 14.4?

b. For what value of $X$ is the mean response equal to .5?

*14.7. Annual dues. The board of directors of a professional association conducted a random sample
survey of 30 members to assess the effects of several possible amounts of dues increase. The
sample results follow. $X$ denotes the dollar increase in annual dues posited in the survey
interview, and $Y = 1$ if the interviewee indicated that the membership will not be renewed at
that amount of dues increase and 0 if the membership will be renewed.

$i$:      1    2    3   ...   28   29   30
$X_i$:   30   30   30   ...   49   50   50
$Y_i$:    0    1    0   ...    0    1    1

Logistic regression model (14.20) is assumed to be appropriate.

a. Find the maximum likelihood estimates of $\beta_0$ and $\beta_1$. State the fitted response function.

b. Obtain a scatter plot of the data with both the fitted logistic response function from part
(a) and a lowess smooth superimposed. Does the fitted logistic response function appear
to fit well?

c. Obtain $\exp(b_1)$ and interpret this number.

d. What is the estimated probability that association members will not renew their membership
if the dues are increased by $40?

e. Estimate the amount of dues increase for which 75 percent of the members are expected
not to renew their association membership.


14.8. Refer to Annual dues Problem 14.7.

a. Fit a probit mean response function (14.12) to the data. Qualitatively compare the fit here
with the logistic fit obtained in part (a) of Problem 14.7. What do you conclude?

b. Fit a complementary log-log mean response function (14.19) to the data. Qualitatively
compare the fit here with the logistic fit obtained in part (a) of Problem 14.7. What do you
conclude?

14.9. Performance ability. A psychologist conducted a study to examine the nature of the relation,
if any, between an employee's emotional stability ($X$) and the employee's ability to perform
in a task group ($Y$). Emotional stability was measured by a written test for which the higher
the score, the greater is the emotional stability. Ability to perform in a task group ($Y = 1$ if
able, $Y = 0$ if unable) was evaluated by the supervisor. The results for 27 employees were:

$i$:       1     2     3   ...    25    26    27
$X_i$:   474   432   453   ...   562   506   600
$Y_i$:     0     0     0   ...     1     0     1

Logistic regression model (14.20) is assumed to be appropriate.

a. Find the maximum likelihood estimates of $\beta_0$ and $\beta_1$. State the fitted response function.

b. Obtain a scatter plot of the data with both the fitted logistic response function from part (a)
and a lowess smooth superimposed. Does the fitted logistic response function appear to fit
well?

c. Obtain $\exp(b_1)$ and interpret this number.

d. What is the estimated probability that employees with an emotional stability test score of
550 will be able to perform in a task group?

e. Estimate the emotional stability test score for which 70 percent of the employees with this
test score are expected to be able to perform in a task group.

14.10. Refer to Performance ability Problem 14.9.

a. Fit a probit mean response function (14.12) to the data. Qualitatively compare the fit here
with the logistic fit obtained in part (a) of Problem 14.9. What do you conclude?

b. Fit a complementary log-log mean response function (14.19) to the data. Qualitatively
compare the fit here with the logistic fit obtained in part (a) of Problem 14.9. What do you
conclude?

*14.11. Bottle return. A carefully controlled experiment was conducted to study the effect of the
size of the deposit level on the likelihood that a returnable one-liter soft-drink bottle will be
returned. A bottle return was scored 1, and no return was scored 0. The data to follow show
the number of bottles that were returned ($Y_{.j}$) out of 500 sold ($n_j$) at each of six deposit levels
($X_j$, in cents):

$j$:                           1     2     3     4     5     6
Deposit level $X_j$:           2     5    10    20    25    30
Number sold $n_j$:           500   500   500   500   500   500
Number returned $Y_{.j}$:     72   103   170   296   406   449

An analyst believes that logistic regression model (14.20) is appropriate for studying the
relation between size of deposit and the probability a bottle will be returned.

a. Plot the estimated proportions $p_j = Y_{.j}/n_j$ against $X_j$. Does the plot support the analyst's
belief that the logistic response function is appropriate?

b. Find the maximum likelihood estimates of $\beta_0$ and $\beta_1$. State the fitted response function.

c. Obtain a scatter plot of the data with the estimated proportions from part (a), and
superimpose the fitted logistic response function from part (b). Does the fitted logistic response
function appear to fit well?

d. Obtain $\exp(b_1)$ and interpret this number.

e. What is the estimated probability that a bottle will be returned when the deposit is 15 cents?

f. Estimate the amount of deposit for which 75 percent of the bottles are expected to be
returned.

14.12. Toxicity experiment. In an experiment testing the effect of a toxic substance, 1,500
experimental insects were divided at random into six groups of 250 each. The insects in each group
were exposed to a fixed dose of the toxic substance. A day later, each insect was observed.
Death from exposure was scored 1, and survival was scored 0. The results are shown below;
$X_j$ denotes the dose level (on a logarithmic scale) administered to the insects in group $j$ and
$Y_{.j}$ denotes the number of insects that died out of the 250 ($n_j$) in the group.

$j$:          1     2     3     4     5     6
$X_j$:        1     2     3     4     5     6
$n_j$:      250   250   250   250   250   250
$Y_{.j}$:    28    53    93   126   172   197

Logistic regression model (14.20) is assumed to be appropriate.

a. Plot the estimated proportions $p_j = Y_{.j}/n_j$ against $X_j$. Does the plot support the analyst's
belief that the logistic response function is appropriate?

b. Find the maximum likelihood estimates of $\beta_0$ and $\beta_1$. State the fitted response function.

c. Obtain a scatter plot of the data with the estimated proportions from part (a), and
superimpose the fitted logistic response function from part (b). Does the fitted logistic response
function appear to fit well?

d. Obtain $\exp(b_1)$ and interpret this number.

e. What is the estimated probability that an insect dies when the dose level is $X = 3.5$?

f. What is the estimated median lethal dose, that is, the dose for which 50 percent of the
experimental insects are expected to die?

14.13. Car purchase. A marketing research firm was engaged by an automobile manufacturer to
conduct a pilot study to examine the feasibility of using logistic regression for ascertaining
the likelihood that a family will purchase a new car during the next year. A random sample of
33 suburban families was selected. Data on annual family income ($X_1$, in thousand dollars)
and the current age of the oldest family automobile ($X_2$, in years) were obtained. A follow-up
interview conducted 12 months later was used to determine whether the family actually
purchased a new car ($Y = 1$) or did not purchase a new car ($Y = 0$) during the year.

$i$:         1    2    3   ...   31   32   33
$X_{i1}$:   32   45   60   ...   21   32   17
$X_{i2}$:    3    2    2   ...    3    5    1
$Y_i$:       0    0    1   ...    0    1    0

Multiple logistic regression model (14.41) with two predictor variables in first-order terms is
assumed to be appropriate.

a. Find the maximum likelihood estimates of $\beta_0$, $\beta_1$, and $\beta_2$. State the fitted response function.

b. Obtain $\exp(b_1)$ and $\exp(b_2)$ and interpret these numbers.

c. What is the estimated probability that a family with annual income of $50 thousand and
an oldest car of 3 years will purchase a new car next year?


*14.14. Flu shots. A local health clinic sent fliers to its clients to encourage everyone, but especially
older persons at high risk of complications, to get a flu shot in time for protection against an
expected flu epidemic. In a pilot follow-up study, 159 clients were randomly selected and asked
whether they actually received a flu shot. A client who received a flu shot was coded Y = 1,
and a client who did not receive a flu shot was coded Y = 0. In addition, data were collected
on their age (X1) and their health awareness. The latter data were combined into a health
awareness index (X2), for which higher values indicate greater awareness. Also included in
the data was client gender, where males were coded X3 = 1 and females were coded X3 = 0.


  i:    1    2    3  ...  157  158  159
Xi1:   59   61   82  ...   76   68   73
Xi2:   52   55   51  ...   22   32   56
Xi3:    0    1    0  ...    1    0    1
 Yi:    0    0    1  ...    1    1    1

Multiple logistic regression model (14.41) with three predictor variables in first-order terms 
is assumed to be appropriate. 


a. Find the maximum likelihood estimates of β0, β1, β2, and β3. State the fitted response
function.

b. Obtain exp(b1), exp(b2), and exp(b3). Interpret these numbers.


c. What is the estimated probability that male clients aged 55 with a health awareness index 
of 60 will receive a flu shot? 


*14.15. Refer to Annual dues Problem 14.7. Assume that the fitted model is appropriate and that
large-sample inferences are applicable.

a. Obtain an approximate 90 percent confidence interval for exp(β1). Interpret your interval.

b. Conduct a Wald test to determine whether dollar increase in dues (X) is related to the
probability of membership renewal; use α = .10. State the alternatives, decision rule, and
conclusion. What is the approximate P-value of the test?

c. Conduct a likelihood ratio test to determine whether dollar increase in dues (X) is related
to the probability of membership renewal; use α = .10. State the full and reduced models,
decision rule, and conclusion. What is the approximate P-value of the test? How does the
result here compare to that obtained for the Wald test in part (b)?
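
The three large-sample procedures in this problem need only a coefficient, its estimated standard deviation, and two maximized log-likelihoods. The sketch below assumes the usual forms: z* = b1/s{b1} for the Wald test and G² = -2[log_e L(R) - log_e L(F)] for the likelihood ratio test. All numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

def wald_test(b, se):
    """Two-sided Wald test of H0: beta = 0; returns (z*, P-value)."""
    z = b / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for dropping one parameter; returns (G2, P-value)."""
    g2 = -2.0 * (loglik_reduced - loglik_full)
    # With 1 degree of freedom, P(chi-square > g2) = P(|Z| > sqrt(g2)).
    return g2, math.erfc(math.sqrt(max(g2, 0.0) / 2.0))

def odds_ratio_ci(b, se, conf=0.90):
    """Large-sample interval for exp(beta): exponentiate b -/+ z * se."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    return math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical inputs, for illustration only.
b1, se_b1 = -0.125, 0.05
z_star, p_wald = wald_test(b1, se_b1)
g2, p_lr = lr_test_1df(loglik_reduced=-20.5, loglik_full=-17.0)
lo, hi = odds_ratio_ci(b1, se_b1, conf=0.90)
print(z_star, p_wald, g2, p_lr, lo, hi)
```

The chi-square shortcut in `lr_test_1df` applies only when the full and reduced models differ by a single parameter.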


14.16. Refer to Performance ability Problem 14.9. Assume that the fitted model is appropriate and
that large-sample inferences are applicable.

a. Obtain an approximate 95 percent confidence interval for exp(β1). Interpret your interval.

b. Conduct a Wald test to determine whether employee's emotional stability (X) is related
to the probability that the employee will be able to perform in a task group; use α = .05.
State the alternatives, decision rule, and conclusion. What is the approximate P-value of
the test?

c. Conduct a likelihood ratio test to determine whether employee's emotional stability (X)
is related to the probability that the employee will be able to perform in a task group;
use α = .05. State the full and reduced models, decision rule, and conclusion. What is the
approximate P-value of the test? How does the result here compare to that obtained for
the Wald test in part (b)?


*14.17. Refer to Bottle return Problem 14.11. Assume that the fitted model is appropriate and that
large-sample inferences are applicable.

a. Obtain an approximate 95 percent confidence interval for β1. Convert this confidence
interval into one for the odds ratio. Interpret this latter interval.

b. Conduct a Wald test to determine whether deposit level (X) is related to the probability
that a bottle is returned; use α = .05. State the alternatives, decision rule, and conclusion.
What is the approximate P-value of the test?

c. Conduct a likelihood ratio test to determine whether deposit level (X) is related to the
probability that a bottle is returned; use α = .05. State the full and reduced models,
decision rule, and conclusion. What is the approximate P-value of the test? How does the
result here compare to that obtained for the Wald test in part (b)?


14.18. Refer to Toxicity experiment Problem 14.12. Assume that the fitted model is appropriate and
that large-sample inferences are applicable.

a. Obtain an approximate 99 percent confidence interval for β1. Convert this confidence
interval into one for the odds ratio. Interpret this latter interval.

b. Conduct a Wald test to determine whether dose level (X) is related to the probability that
an insect dies; use α = .01. State the alternatives, decision rule, and conclusion. What is
the approximate P-value of the test?

c. Conduct a likelihood ratio test to determine whether dose level (X) is related to the
probability that an insect dies; use α = .01. State the full and reduced models, decision rule,
and conclusion. What is the approximate P-value of the test? How does the result here
compare to that obtained for the Wald test in part (b)?


14.19. Refer to Car purchase Problem 14.13. Assume that the fitted model is appropriate and that
large-sample inferences are applicable.

a. Obtain joint confidence intervals for the family income odds ratio exp(20β1) for families
whose incomes differ by 20 thousand dollars and for the age of the oldest family automobile
odds ratio exp(2β2) for families whose oldest automobiles differ in age by 2 years, with
family confidence coefficient of approximately .90. Interpret your intervals.

b. Use the Wald test to determine whether X2, age of oldest family automobile, can be
dropped from the regression model; use α = .05. State the alternatives, decision rule, and
conclusion. What is the approximate P-value of the test?

c. Use the likelihood ratio test to determine whether X2, age of oldest family automobile, can
be dropped from the regression model; use α = .05. State the full and reduced models,
decision rule, and conclusion. What is the approximate P-value of the test? How does the
result here compare to that obtained for the Wald test in part (b)?

d. Use the likelihood ratio test to determine whether the following three second-order terms,
the square of annual family income, the square of age of oldest automobile, and the two-
factor interaction effect between annual family income and age of oldest automobile,
should be added simultaneously to the regression model containing family income and age
of oldest automobile as first-order terms; use α = .05. State the full and reduced models,
decision rule, and conclusion. What is the approximate P-value of the test?
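
Joint intervals such as those in part (a) are commonly built with the Bonferroni procedure. Each of the g intervals uses the normal multiple z(1 - α/2g) and is then exponentiated after scaling by the chosen difference c. The sketch below assumes that procedure; the coefficient values and standard deviations are hypothetical.

```python
import math
from statistics import NormalDist

def joint_odds_ratio_cis(estimates, std_errors, multiples, family_conf=0.90):
    """Bonferroni intervals for exp(c_k * beta_k), k = 1..g, with each c_k > 0."""
    g = len(estimates)
    alpha = 1.0 - family_conf
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * g))   # Bonferroni multiple
    intervals = []
    for b, se, c in zip(estimates, std_errors, multiples):
        intervals.append((math.exp(c * (b - z * se)), math.exp(c * (b + z * se))))
    return z, intervals

# Hypothetical estimates b1, b2 and standard deviations; c = 20 and c = 2 as in part (a).
z_mult, cis = joint_odds_ratio_cis([0.07, 0.60], [0.03, 0.40], [20.0, 2.0], family_conf=0.90)
print(z_mult, cis)
```

With two intervals and a .90 family coefficient, each interval is effectively a 95 percent interval.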


*14.20. Refer to Flu shots Problem 14.14.

a. Obtain joint confidence intervals for the age odds ratio exp(30β1) for male clients whose
ages differ by 30 years and for the health awareness index odds ratio exp(25β2) for male
clients whose health awareness index differs by 25, with family confidence coefficient of
approximately .90. Interpret your intervals.

b. Use the Wald test to determine whether X3, client gender, can be dropped from the
regression model; use α = .05. State the alternatives, decision rule, and conclusion. What is the
approximate P-value of the test?

c. Use the likelihood ratio test to determine whether X3, client gender, can be dropped from
the regression model; use α = .05. State the full and reduced models, decision rule, and
conclusion. What is the approximate P-value of the test? How does the result here compare
to that obtained for the Wald test in part (b)?

d. Use the likelihood ratio test to determine whether the following three second-order terms,
the square of age, the square of health awareness index, and the two-factor interaction effect
between age and health awareness index, should be added simultaneously to the
regression model containing age and health awareness index as first-order terms; use α = .05.
State the alternatives, full and reduced models, decision rule, and conclusion. What is the
approximate P-value of the test?

14.21. Refer to Car purchase Problem 14.13 where the pool of predictors consists of all first-order
terms and all second-order terms in annual family income and age of oldest family automobile.

a. Use forward selection to decide which predictor variables enter into the regression model.
Control the α risk at .10 at each stage. Which variables are entered into the regression model?

b. Use backward elimination to decide which predictor variables can be dropped from the
regression model. Control the α risk at .10 at each stage. Which variables are retained?
How does this compare to your results in part (a)?

c. Find the best model according to the AICp criterion. How does this compare to your results
in parts (a) and (b)?

d. Find the best model according to the SBCp criterion. How does this compare to your results
in parts (a), (b), and (c)?
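
Parts (c) and (d) compare candidate models by penalized log-likelihood. The sketch below assumes the definitions AICp = -2 log_e L(b) + 2p and SBCp = -2 log_e L(b) + p log_e(n), with smaller values preferred. The candidate models and their log-likelihoods are hypothetical.

```python
import math

def aic(loglik, p):
    """AIC_p = -2 log L(b) + 2p, where p counts all parameters including the intercept."""
    return -2.0 * loglik + 2.0 * p

def sbc(loglik, p, n):
    """SBC_p = -2 log L(b) + p log(n)."""
    return -2.0 * loglik + p * math.log(n)

# Hypothetical candidates: (label, maximized log-likelihood, number of parameters p).
candidates = [("X1", -19.8, 2), ("X1, X2", -18.3, 3), ("X1, X2, X1*X2", -18.1, 4)]
n = 33

best_aic = min(candidates, key=lambda m: aic(m[1], m[2]))
best_sbc = min(candidates, key=lambda m: sbc(m[1], m[2], n))
print(best_aic[0], best_sbc[0])
```

Because SBCp penalizes each parameter by log_e(n) rather than 2, it can select a smaller model than AICp, as these made-up numbers do.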

*14.22. Refer to Flu shots Problem 14.14 where the pool of predictors consists of all first-order terms
and all second-order terms in age and health awareness index.

a. Use forward selection to decide which predictor variables enter into the regression model.
Control the α risk at .10 at each stage. Which variables are entered into the regression model?

b. Use backward elimination to decide which predictor variables can be dropped from the
regression model. Control the α risk at .10 at each stage. Which variables are retained?
How does this compare to your results in part (a)?

c. Find the best model according to the AICp criterion. How does this compare to your results
in parts (a) and (b)?

d. Find the best model according to the SBCp criterion. How does this compare to your results
in parts (a), (b), and (c)?

*14.23. Refer to Bottle return Problem 14.11. Use the groups given there to conduct a chi-square
goodness of fit test of the appropriateness of logistic regression model (14.20). Control the
risk of a Type I error at .01. State the alternatives, decision rule, and conclusion.

14.24. Refer to Toxicity experiment Problem 14.12. Use the groups given there to conduct a deviance
goodness of fit test of the appropriateness of logistic regression model (14.20). Control the
risk of a Type I error at .01. State the alternatives, decision rule, and conclusion.

*14.25. Refer to Annual dues Problem 14.7.

a. To assess the appropriateness of the logistic regression function, form three groups of
10 cases each according to their fitted logit values π̂'. Plot the estimated proportions pj
against the midpoints of the π̂' intervals. Is the plot consistent with a response function of
monotonic sigmoidal shape? Explain.

b. Obtain the studentized Pearson residuals (14.81) and plot them against the estimated model
probabilities with a lowess smooth superimposed. What does the plot suggest about the
adequacy of the fit of the logistic regression model?
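
The residuals used in these diagnostic plots can be computed case by case from the response, the fitted probability, and the leverage. The sketch below assumes the standard definitions for a binary response: the Pearson residual, its studentized version (divided by the square root of 1 - h), and the deviance residual. The function names are illustrative.

```python
import math

def pearson_residual(y, p_hat):
    """(y - p_hat) / sqrt(p_hat * (1 - p_hat))."""
    return (y - p_hat) / math.sqrt(p_hat * (1.0 - p_hat))

def studentized_pearson_residual(y, p_hat, h):
    """Pearson residual divided by sqrt(1 - h), with h the hat-matrix diagonal element."""
    return pearson_residual(y, p_hat) / math.sqrt(1.0 - h)

def deviance_residual(y, p_hat):
    """Signed square root of the case's contribution to the deviance (binary y)."""
    contrib = -2.0 * (y * math.log(p_hat) + (1 - y) * math.log(1.0 - p_hat))
    return math.copysign(math.sqrt(contrib), y - p_hat)

# One hypothetical case: observed y = 1, fitted probability .30, leverage .10.
print(pearson_residual(1, 0.30),
      studentized_pearson_residual(1, 0.30, 0.10),
      deviance_residual(1, 0.30))
```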

14.26. Refer to Performance ability Problem 14.9.

a. To assess the appropriateness of the logistic regression function, form three groups of
nine cases each according to their fitted logit values π̂'. Plot the estimated proportions pj
against the midpoints of the π̂' intervals. Is the plot consistent with a response function of
monotonic sigmoidal shape? Explain.

b. Obtain the deviance residuals (14.83) and plot them against the estimated model
probabilities with a lowess smooth superimposed. What does the plot suggest about the adequacy
of the fit of the logistic regression model?


14.27. Refer to Car purchase Problems 14.13 and 14.21.

a. To assess the appropriateness of the logistic regression model obtained in part (d) of
Problem 14.21, form three groups of 11 cases each according to their fitted logit values
π̂'. Plot the estimated proportions pj against the midpoints of the π̂' intervals. Is the plot
consistent with a response function of monotonic sigmoidal shape? Explain.

b. Obtain the studentized Pearson residuals (14.81) and plot them against the estimated model
probabilities with a lowess smooth superimposed. What does the plot suggest about the
adequacy of the fit of the logistic regression model?


*14.28. Refer to Flu shots Problems 14.14 and 14.22.

a. To assess the appropriateness of the logistic regression model obtained in part (d) of
Problem 14.22, form 8 groups of approximately 20 cases each according to their fitted
logit values π̂'. Plot the estimated proportions pj against the midpoints of the π̂'
intervals. Is the plot consistent with a response function of monotonic sigmoidal shape?
Explain.

b. Using the groups formed in part (a), conduct a Hosmer-Lemeshow goodness of fit test for
the appropriateness of the logistic regression function; use α = .05. State the alternatives,
decision rule, and conclusions. What is the P-value of the test?

c. Obtain the deviance residuals (14.83) and plot them against the estimated model
probabilities with a lowess smooth superimposed. What does the plot suggest about the adequacy
of the fit of the logistic regression model?
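
The Hosmer-Lemeshow statistic in part (b) compares observed and expected counts within groups formed from the ordered fitted values. The sketch below assumes the usual form: a sum of (O - E)²/E over both outcome categories in each of c groups, referred to a chi-square distribution with c - 2 degrees of freedom. The inputs are made up, and the near-equal-size grouping rule is one reasonable choice among several.

```python
def hosmer_lemeshow(y, p_hat, groups):
    """Return (chi-square statistic, degrees of freedom) for c near-equal-size groups."""
    order = sorted(range(len(y)), key=lambda i: p_hat[i])   # cases ordered by fitted value
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        obs1 = sum(y[i] for i in idx)        # observed number of 1s in the group
        exp1 = sum(p_hat[i] for i in idx)    # expected number of 1s in the group
        obs0, exp0 = len(idx) - obs1, len(idx) - exp1
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat, groups - 2

# Made-up fitted probabilities and outcomes for 12 cases, split into 3 groups.
p_demo = [0.10, 0.15, 0.20, 0.25, 0.40, 0.45, 0.50, 0.55, 0.70, 0.75, 0.80, 0.90]
y_demo = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(hosmer_lemeshow(y_demo, p_demo, groups=3))
```

The statistic is then compared with the chi-square percentile for the chosen α, taken from a table or from statistical software.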


*14.29. Refer to Annual dues Problem 14.7.

a. For the logistic regression model fit in Problem 14.7a, prepare an index plot of the
diagonal elements of the estimated hat matrix (14.80). Use the plot to identify any outlying
X observations.

b. To assess the influence of individual observations, obtain the delta chi-square statistic
(14.85), the delta deviance statistic (14.86), and Cook's distance (14.87) for each
observation. Plot each of these in separate index plots and identify any influential observations.
Summarize your findings.
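
The three influence measures in part (b) can be obtained from quantities already on hand, without refitting the model for each deleted case. The sketch below assumes the common one-step approximations: delta chi-square equals the squared studentized Pearson residual, delta deviance equals h times that quantity plus the squared deviance residual, and Cook's distance equals r²h / [p(1 - h)²]. These should be checked against (14.85) through (14.87) before use. The inputs are hypothetical.

```python
def influence_measures(r_pearson, dev_resid, h, p):
    """One-step approximations to delta chi-square, delta deviance, and Cook's distance.

    r_pearson: ordinary Pearson residual; dev_resid: deviance residual;
    h: hat-matrix diagonal element; p: number of regression parameters.
    """
    r_sp_sq = r_pearson ** 2 / (1.0 - h)            # squared studentized Pearson residual
    delta_chi_sq = r_sp_sq
    delta_dev = h * r_sp_sq + dev_resid ** 2
    cooks_d = r_pearson ** 2 * h / (p * (1.0 - h) ** 2)
    return delta_chi_sq, delta_dev, cooks_d

# Hypothetical case: Pearson residual 1.5, deviance residual 1.4, leverage .12, p = 2.
print(influence_measures(1.5, 1.4, 0.12, 2))
```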


14.30. Refer to Performance ability Problem 14.9.

a. For the logistic regression fit in Problem 14.9a, prepare an index plot of the diagonal
elements of the estimated hat matrix (14.80). Use the plot to identify any outlying X
observations.

b. To assess the influence of individual observations, obtain the delta chi-square statistic
(14.85), the delta deviance statistic (14.86), and Cook's distance (14.87) for each
observation. Plot each of these in separate index plots and identify any influential observations.
Summarize your findings.


14.31. Refer to Car purchase Problems 14.13 and 14.21.

a. For the logistic regression model obtained in part (d) of Problem 14.21, prepare an index
plot of the diagonal elements of the estimated hat matrix (14.80). Use the plot to identify
any outlying X observations.

b. To assess the influence of individual observations, obtain the delta chi-square statistic
(14.85), the delta deviance statistic (14.86), and Cook's distance (14.87) for each
observation. Plot each of these in separate index plots and identify any influential
observations. Summarize your findings.


*14.32. Refer to Flu shots Problem 14.14.

a. For the logistic regression fit in Problem 14.14a, prepare an index plot of the diagonal
elements of the estimated hat matrix (14.80). Use the plot to identify any outlying X
observations.

b. To assess the influence of individual observations, obtain the delta chi-square statistic
(14.85), the delta deviance statistic (14.86), and Cook's distance (14.87) for each
observation. Plot each of these in separate index plots and identify any influential observations.
Summarize your findings.


*14.33. Refer to Annual dues Problem 14.7.

a. Based on the fitted regression function in Problem 14.7a, obtain an approximate 90 percent
confidence interval for the mean response πh for a dues increase of Xh = $40.

b. A prediction rule is to be developed, based on the fitted regression function in
Problem 14.7a. Based on the sample cases, find the total error rate, the error rate for renewers,
and the error rate for nonrenewers for the following cutoffs: .40, .45, .50, .55, .60.

c. Based on your results in part (b), which cutoff minimizes the total error rate? Are the
error rates for renewers and nonrenewers fairly balanced at this cutoff? Obtain the area
under the ROC curve to assess the model's predictive power here. What do you
conclude?

d. How can you establish whether the observed total error rate for the best cutoff in part (b) is
a reliable indicator of the predictive ability of the fitted regression function and the chosen
cutoff?
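
The error rates in part (b) and the area under the ROC curve in part (c) are simple counts once fitted probabilities are available. The sketch below assumes the rule "predict 1 when the fitted probability is at least the cutoff". It computes the area as the proportion of (Y = 1, Y = 0) pairs in which the Y = 1 case has the larger fitted probability, counting ties as one half. The data are made up.

```python
def error_rates(y, p_hat, cutoff):
    """Return (total error rate, error rate among Y = 1 cases, error rate among Y = 0 cases)."""
    pred = [1 if p >= cutoff else 0 for p in p_hat]
    n1 = sum(y)
    n0 = len(y) - n1
    miss1 = sum(1 for yi, pr in zip(y, pred) if yi == 1 and pr == 0)
    miss0 = sum(1 for yi, pr in zip(y, pred) if yi == 0 and pr == 1)
    return (miss1 + miss0) / len(y), miss1 / n1, miss0 / n0

def roc_auc(y, p_hat):
    """Area under the ROC curve via pairwise comparison of Y = 1 and Y = 0 cases."""
    pos = [p for yi, p in zip(y, p_hat) if yi == 1]
    neg = [p for yi, p in zip(y, p_hat) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Made-up outcomes and fitted probabilities.
y_demo = [0, 0, 1, 1]
p_demo = [0.10, 0.40, 0.35, 0.80]
for c in (0.30, 0.50):
    print(c, error_rates(y_demo, p_demo, c))
print(roc_auc(y_demo, p_demo))
```

Error rates computed on the same cases used to fit the model tend to be optimistic, which is the issue part (d) raises.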


14.34. Refer to Performance ability Problem 14.9.

a. Using the fitted regression function in Problem 14.9a, obtain joint confidence intervals for
the mean response πh for persons with emotional stability test scores Xh = 550 and 625,
respectively, with an approximate 90 percent family confidence coefficient. Interpret your
intervals.

b. A prediction rule, based on the fitted regression function in Problem 14.9a, is to be
developed. For the sample cases, find the total error rate, the error rate for employees able
to perform in a task group, and the error rate for employees not able to perform for the
following cutoffs: .325, .425, .525, .625.

c. On the basis of your results in part (b), which cutoff minimizes the total error rate? Are
the error rates for employees able to perform in a task group and for employees not able to
perform fairly balanced at this cutoff? Obtain the area under the ROC curve to assess the
model's predictive power here. What do you conclude?

d. How can you establish whether the observed total error rate for the best cutoff in part (c) is
a reliable indicator of the predictive ability of the fitted regression function and the chosen
cutoff?


14.35. Refer to Bottle return Problem 14.11.

a. For the fitted regression function in Problem 14.11a, obtain an approximate 95 percent
confidence interval for the probability of a purchase for deposit Xh = 15 cents. Interpret
your interval.

b. A prediction rule is to be developed, based on the fitted regression function in
Problem 14.11a. For the sample cases, find the total error rate, the error rate for purchasers, and
the error rate for nonpurchasers for the following cutoffs: .150, .300, .450, .600, .750.

c. According to your results in part (b), which cutoff minimizes the total error rate? Are
the error rates for purchasers and nonpurchasers fairly balanced at this cutoff? Obtain
the area under the ROC curve to assess the model's predictive power here. What do you
conclude?

d. How can you establish whether the observed total error rate for the best cutoff in part (c) is
a reliable indicator of the predictive ability of the fitted regression function and the chosen
cutoff?


*14.36. Refer to Flu shots Problem 14.14.

a. On the basis of the fitted regression function in Problem 14.14a, obtain a confidence
interval for the mean response πh for a female whose age is 65 and whose health awareness
index is 50, with an approximate 90 percent family confidence coefficient. Interpret your
intervals.

b. A prediction rule is to be based on the fitted regression function in Problem 14.14a. For
the sample cases, find the total error rate, the error rate for clients receiving the flu shot,
and the error rate for clients not receiving the flu shot for the following cutoffs: .05, .10,
.15, .20.

c. Based on your results in part (b), which cutoff minimizes the total error rate? Are the error
rates for clients receiving the flu shot and for clients not receiving the flu shot fairly balanced
at this cutoff? Obtain the area under the ROC curve to assess the model's predictive power
here. What do you conclude?

d. How can you establish whether the observed total error rate for the best cutoff in part (c) is
a reliable indicator of the predictive ability of the fitted regression function and the chosen
cutoff?


14.37. Polytomous logistic regression extends the binary response outcome to a multicategory re- 
sponse outcome for either nominal level or ordinal level data. Discuss the advantages and 
disadvantages of treating multicategory ordinal level outcomes as a series of binary logistic 
regression models, as a nominal level polytomous regression model, or as a proportional odds 
model. 


*14.38. Refer to Airfreight breakage Problem 1.21.

a. Fit the Poisson regression model (14.113) with the response function μ(X, β) =
exp(β0 + β1X). State the estimated regression coefficients, their estimated standard
deviations, and the estimated response function.

b. Obtain the deviance residuals and present them in an index plot. Do there appear to be any
outlying cases?

c. Estimate the mean number of ampules broken when X = 0, 1, 2, 3. Compare these estimates
with those obtained by means of the fitted linear regression function in Problem 1.21a.

d. Plot the Poisson and linear regression functions, together with the data. Which regression
function appears to be a better fit here? Discuss.

e. Management wishes to estimate the probability that 10 or fewer ampules are broken when
there is no transfer of the shipment. Use the fitted Poisson regression function to obtain
this estimate.

f. Obtain an approximate 95 percent confidence interval for β1. Interpret your interval
estimate.
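
Parts (c) and (e) use the fitted Poisson response function in two ways: to estimate the mean at given X levels and to obtain a cumulative Poisson probability at the estimated mean. The sketch below shows both steps in Python. The coefficients `b0` and `b1` are hypothetical, not the estimates for this problem.

```python
import math

def poisson_mean(b0, b1, x):
    """Estimated mean response exp(b0 + b1*x)."""
    return math.exp(b0 + b1 * x)

def poisson_cdf(k, mu):
    """P(Y <= k) for Y distributed Poisson with mean mu."""
    term = math.exp(-mu)      # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j        # P(Y = j) obtained from P(Y = j - 1)
        total += term
    return total

# Hypothetical coefficients, used only to illustrate the calculations.
b0, b1 = 2.0, 0.3
means = [poisson_mean(b0, b1, x) for x in (0, 1, 2, 3)]
p_le_10 = poisson_cdf(10, means[0])    # X = 0 corresponds to no transfer
print(means, p_le_10)
```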


14.39. Geriatric study. A researcher in geriatrics designed a prospective study to investigate the
effects of two interventions on the frequency of falls. One hundred subjects were randomly
assigned to one of the two interventions: education only (X1 = 0) and education plus aerobic
exercise training (X1 = 1). Subjects were at least 65 years of age and in reasonably good health.
Three variables considered to be important as control variables were gender (X2: 0 = female;
1 = male), a balance index (X3), and a strength index (X4). The higher the balance index, the
more stable is the subject; and the higher the strength index, the stronger is the subject. Each
subject kept a diary recording the number of falls (Y) during the six months of the study. The
data follow:


Subject   Number of Falls   Intervention   Gender   Balance Index   Strength Index
   i            Yi              Xi1          Xi2         Xi3             Xi4

   1             1               1            0          45              70
   2             1               1            0          62              66
   3             2               1            1          43              64
  ...           ...             ...          ...         ...             ...
  98             4               0            0          69              48
  99             4               0            1          50              52
 100             2               0            0          37              56


a. Fit the Poisson regression model (14.113) with the response function μ(X, β) =
exp(β0 + β1X1 + β2X2 + β3X3 + β4X4). State the estimated regression coefficients, their
estimated standard deviations, and the estimated response function.

b. Obtain the deviance residuals and present them in an index plot. Do there appear to be any
outlying cases?

c. Assuming that the fitted model is appropriate, use the likelihood ratio test to determine
whether gender (X2) can be dropped from the model; control α at .05. State the full and
reduced models, decision rule, and conclusion. What is the P-value of the test?

d. For the fitted model containing only X1, X3, and X4 in first-order terms, obtain an
approximate 95 percent confidence interval for β1. Interpret your confidence interval. Does
aerobic exercise reduce the frequency of falls when controlling for balance and strength?


Exercises


14.40. Show the equivalence of (14.16) and (14.17).

14.41. Derive (14.34) from (14.26).

14.42. Derive (14.18a), using (14.16) and (14.18).

14.43. (Calculus needed.) Maximum likelihood estimation theory states that the estimated large-
sample variance-covariance matrix for maximum likelihood estimators is given by the inverse
of the information matrix, the elements of which are the negatives of the expected values of the
second-order partial derivatives of the logarithm of the likelihood function evaluated at β = b:

$$ -E\left\{ \frac{\partial^2 \log_e L(\boldsymbol{\beta})}{\partial \beta_i \, \partial \beta_j} \right\}_{\boldsymbol{\beta} = \mathbf{b}} $$

Show that this matrix simplifies to (14.51) for logistic regression. Consider the case where
p - 1 = 1.

14.44. (Calculus needed.) Estimate the approximate variance-covariance matrix of the estimated
regression coefficients for the programming task example in Table 14.1a, using (14.51), and
verify the estimated standard deviations in Table 14.1b.

14.45. Show that the logistic response function (13.10) reduces to the response function in (14.20)
when the Yi are independent Bernoulli random variables with E{Yi} = πi.

14.46. Consider the multiple logistic regression model with X'β = β0 + β1X1 + β2X2 + β3X1X2.
Derive an expression for the odds ratio for X1. Does exp(β1) have the same meaning here as
for a regression model containing no interaction term?


14.47. A Bernoulli response Yi has expected value:

$$ E\{Y_i\} = \pi_i = 1 - \exp\left(-e^{\mathbf{X}_i'\boldsymbol{\beta}}\right) $$

Show that the link function here is the complementary log-log transformation of πi, namely,
log_e[-log_e(1 - πi)].


Projects


14.48. Refer to the Disease outbreak data set in Appendix C.10. Savings account status is the
response variable and age, socioeconomic status, and city sector are the predictor variables.
Cases 1-98 are to be utilized for developing the logistic regression model.

a. Fit logistic regression model (14.41) containing the predictor variables in first-order terms
and interaction terms for all pairs of predictor variables. State the fitted response function.

b. Use the likelihood ratio test to determine whether all interaction terms can be dropped
from the regression model; use α = .01. State the alternatives, full and reduced models,
decision rule, and conclusion. What is the approximate P-value of the test?

c. For logistic regression model in part (a), use backward elimination to decide which predictor
variables can be dropped from the regression model. Control the α risk at .05 at each stage.
Which variables are retained in the regression model?


14.49. Refer to the Disease outbreak data set in Appendix C.10 and Project 14.48. Logistic regression
model (14.41) with predictor variables age and socioeconomic status in first-order terms is to
be further evaluated.

a. Conduct the Hosmer-Lemeshow goodness of fit test for the appropriateness of the logistic
regression function by forming five groups of approximately 20 cases each; use α = .05.
State the alternatives, decision rule, and conclusion. What is the approximate P-value of
the test?

b. Obtain the deviance residuals and plot them against the estimated probabilities with a
lowess smooth superimposed. What does the plot suggest about the adequacy of the fit of
the logistic regression model?

c. Prepare an index plot of the diagonal elements of the estimated hat matrix (14.80). Use the
plot to identify any outlying X observations.

d. To assess the influence of individual observations, obtain the delta chi-square statistic
(14.85), the delta deviance statistic (14.86), and Cook's distance (14.87) for each
observation. Plot each of these in separate index plots and identify any influential observations.
Summarize your findings.

e. Construct a half-normal probability plot of the absolute deviance residuals and superimpose
a simulated envelope. Are any cases outlying? Does the logistic model appear to be a good
fit? Discuss.

f. To predict savings account status, you must identify the optimal cutoff. On the basis of the
sample cases, find the total error rate, the error rate for persons with a savings account,
and the error rate for persons with no savings account for the following cutoffs: .45, .50,
.55, .60. Which of the cutoffs minimizes the total error rate? Are the two error rates for
persons with and without savings accounts fairly balanced at this cutoff? Obtain the area
under the ROC curve to assess the model's predictive power here. What do you conclude?


14.50. Refer to the Disease outbreak data set in Appendix C.10 and Project 14.49. The regression
model identified in Project 14.49 is to be validated using cases 99-196.

a. Use the rule obtained in Project 14.49f to make a prediction for each of the holdout validation
cases. What are the total and the two component prediction error rates for the validation
data set? How do these error rates compare with those for the model-building data set in
Project 14.49f?

b. Combine the model-building and validation data sets and fit the model identified in
Project 14.49 to the combined data. Are the estimated coefficients and their estimated
standard deviations similar to those obtained for the model-building data set? Should they
be? Comment.

c. Based on the fitted regression model in part (b), obtain joint 90 percent confidence intervals
for the odds ratios for age and socioeconomic status. Interpret your intervals.


14.51. Refer to the SENIC data set in Appendix C.1. Medical school affiliation is the response
variable, to be coded Y = 1 if medical school affiliation and Y = 0 if no medical school
affiliation. The pool of potential predictor variables includes age, routine chest X-ray ratio,
average daily census, and number of nurses. All 113 cases are to be used in developing the
logistic regression model.

a. Fit logistic regression model (14.41) containing all predictor variables in the pool in first-
order terms and interaction terms for all pairs of predictor variables. State the fitted response
function.

b. Test whether all interaction terms can be dropped from the regression model; use α = .05.
State the full and reduced models, decision rule, and conclusion. What is the approximate
P-value of the test?

c. For logistic regression model (14.41) containing the predictor variables in first-order terms
only, use forward stepwise regression to decide which predictor variables can be retained
in the regression model. Control the α risk at .10 at each stage. Which variables should be
retained in the regression model?

d. For logistic regression model (14.41) containing the predictor variables in first-order terms
only, identify the best subset models using the AICp criterion and the SBCp criterion. Does
the use of these two criteria lead to the same model? Are either of the models identified
the same as that found in part (c)?


14.52. Refer to the SENIC data set in Appendix C.1 and Project 14.51. Logistic regression
model (14.41) with predictor variables age and average daily census in first-order terms is to
be further evaluated.

a. Conduct the Hosmer-Lemeshow goodness of fit test for the appropriateness of the logistic
regression function by forming five groups of approximately 23 cases each; use α = .05. State
the alternatives, decision rule, and conclusion. What is the approximate P-value of the test?

b. Obtain the deviance residuals and plot them against the estimated probabilities with a
lowess smooth superimposed. What does the plot suggest about the adequacy of the fit of
the logistic regression model?

c. Construct a half-normal probability plot of the absolute deviance residuals and superimpose
a simulated envelope. Are any cases outlying? Does the logistic model appear to be
a good fit? Discuss.

d. Prepare an index plot of the diagonal elements of the estimated hat matrix (14.80). Use the
plot to identify any outlying X observations.

e. To assess the influence of individual observations, obtain the delta chi-square statistic
(14.85), the delta deviance statistic (14.86), and Cook's distance (14.87) for each
observation. Plot each of these in separate index plots and identify any influential observations.
Summarize your findings.

14.57. 




f. To predict medical school affiliation, you must identify the optimal cutoff. For the sample
cases, find the total error rate, the error rate for hospitals with medical school affiliation,
and the error rate for hospitals without medical school affiliation for the following cutoffs:
.30, .40, .50, .60. Which of the cutoffs minimizes the total error rate? Are the two error
rates for hospitals with and without medical school affiliation fairly balanced at this cutoff? 
Obtain the area under the ROC curve to assess the model’s predictive power here. What 
do you conclude? 


g. Estimate by means of an approximate 90 percent confidence interval the odds of a hospital 
having medical school affiliation for hospitals with average age of patients of 55 years and 
average daily census of 500 patients. 


Refer to Annual dues Problem 14.7. Obtain a simulated envelope and superimpose it on the 
half-normal probability plot of the absolute deviance residuals. Are there any indications that 
the fitted model is not appropriate? Are there any outlying cases? Discuss. 


Refer to Annual dues Problem 14-7. In order to assess the appropriateness of large-sample 
inferences here, employ the following parametric bootstrap procedure: Foreach of the 30 cases, 
generate a Bernoulli outcome (0, 1), using the estimated probability 7; for the original X; 
level according to the fitted model. Fit the logistic regression model to the bootstrap sample 
and obtain the bootstrap estimates bj and by. Repeat this procedure 500 times. Compute 
the mean and standard deviation of the 500 bootstrap estimates bë, and do the same for pr. 
Plot separate histograms of the bootstrap distributions of bg and by. Are these distributions 
approximately normal? Compare the point estimates Ро and b; and their estimated standard 
deviations obtained in the original fit to the means and standard deviations of the bootstrap 
distributions. What do you conclude about the appropriateness of large-sample inferences 
here? Discuss. А 

Refer to Car purchase Problem 14.13. Obtain a simulated envelope and superimpose it on 
the half-normal probability plot of the absolute deviance residuals. Are there any indications 
that the fitted model is not appropriate? Are there any outlying cases? Discuss. 

Refer to Car purchase Problem 14.13. In order to assess the appropriateness of large-sample 
inferences here, employ the following parametric bootstrapping procedure: For each of the 
33 cases, generate a Bernoulli outcome (0, 1), using the estimated probability 7; for the original 
levels of the predictor variables according to the fitted model. Fit the logistic regression model 
to the bootstrap sample. Repeat this procedure 500 times. Compute the mean and standard 
deviation of the 500 bootstrap estimates Рї, and do the same for 55. Plot separate histograms 
of the bootstrap distributions of bj and bž. Are these distributions approximately normal? 
Compare the point estimates b, and b; and their estimated standard deviations obtained in the 
original fit to the means and standard deviations of the bootstrap distributions. What do you 
conclude about the appropriateness of large-sample inferences here? Discuss. 

Refer to the SENIC дага set in Appendix C.1. Region is the nominal level response variable 
coded 1 = NE, 2 = NC, 3 = S, and 4 = W. The pool of potential predictor variables includes age, 
routine chest X-ray ratio, number of beds, medical school affiliation, average daily census, 
number of nurses, and available facilities and services. All 113 hospitals are to be used in 
developing the polytomous logistic regression model. 


a. Fit polytomous regression model (14.99) using response variable region with 1 — NE as 
the referent category. Which predictors appear to be most important? Ínterpret the results. 

b. Conduct a likelihood ratio test to determine if the three parameters corresponding to age 
can be dropped from the nominal logistic regression model. Control œ at .05. State the full 
and reduced models, decision rule, and conclusion. What is the approximate P-value of 
the test? 


c. Conduct a likelihood ratio test to determine if all parameters corresponding to age and
available facilities and services can be dropped from the nominal logistic regression model.
Control α at .05. State the full and reduced models, decision rule, and conclusion. What is
the approximate P-value of the test?

d. For the full model in part (a), carry out separate binary logistic regressions for each of the
three comparisons with the referent category, as described at the top of page 612. How do
the slope coefficients compare to those obtained in part (a)?

e. For each of the separate binary logistic regressions carried out in part (d), obtain the
deviance residuals and plot them against the estimated probabilities with a lowess smooth
superimposed. What do the plots suggest about the adequacy of the fit of the binary logistic
regression models?

f. For each of the separate binary logistic regressions carried out in part (d), obtain the delta
chi-square statistic (14.85), the delta deviance statistic (14.86), and Cook's distance (14.87)
for each observation. Plot each of these in separate index plots and identify any influential
observations. Summarize your findings.

14.58. Refer to the CDI data set in Appendix C.2. Region is the nominal level response variable
coded 1 = NE, 2 = NC, 3 = S, and 4 = W. The pool of potential predictor variables includes
population density (total population/land area), percent of population aged 18-34, percent of
population aged 65 or older, serious crimes per capita (total serious crimes/total population),
percent high school graduates, percent bachelor's degrees, percent below poverty level, percent
unemployment, and per capita income. The even-numbered cases are to be used in developing
the polytomous logistic regression model.

a. Fit polytomous regression model (14.99) using response variable region with 1 = NE
as the referent category. Which predictors appear to be most important? Interpret the
results.

b. Conduct a series of likelihood ratio tests to determine which predictors, if any, can be
dropped from the nominal logistic regression model. Control α at .01 for each test. State
the alternatives, decision rules, and conclusions.

c. For the full model in part (a), carry out separate binary logistic regressions for each of the
three comparisons with the referent category, as described at the top of page 612. How do
the slope coefficients compare to those obtained in part (a)?

d. For each of the separate binary logistic regressions carried out in part (c), obtain the
deviance residuals and plot them against the estimated probabilities with a lowess smooth
superimposed. What do the plots suggest about the adequacy of the fit of the binary logistic
regression models?

e. For each of the separate binary logistic regressions carried out in part (c), obtain the delta
chi-square statistic (14.85), the delta deviance statistic (14.86), and Cook's distance (14.87)
for each observation. Plot each of these in separate index plots and identify any influential
observations. Summarize your findings.

14.59. Refer to the Prostate cancer data set in Appendix C.5. Gleason score (variable 9) is the
ordinal level response variable, and the pool of potential predictor variables includes PSA
level, cancer volume, weight, age, benign prostatic hyperplasia, seminal vesicle invasion, and
capsular penetration (variables 2 through 8).

a. Fit the proportional odds model (14.105). Which predictors appear to be most important?
Interpret the results.

b. Conduct a series of Wald tests to determine which predictors, if any, can be dropped from
the nominal logistic regression model. Control α at .05 for each test. State the alternatives,
decision rule, and conclusion. What is the approximate P-value of the test?


c. Starting with the full model of part (a), use backward elimination to decide which predictor
variables can be dropped from the ordinal regression model. Control the α risk at .05 at
each stage. Which variables should be retained?

d. For the model in part (c), carry out separate binary logistic regressions for each of the two
binary variables $Y_i^{(1)}$ and $Y_i^{(2)}$, as described at the top of page 617. How do the estimated
coefficients compare to those obtained in part (c)?

e. For each of the separate binary logistic regressions carried out in part (d), obtain the
deviance residuals and plot them against the estimated probabilities with a lowess smooth
superimposed. What do the plots suggest about the adequacy of the fit of the binary logistic
regression models?

f. For each of the separate binary logistic regressions carried out in part (d), obtain the delta
chi-square statistic (14.85), the delta deviance statistic (14.86), and Cook's distance (14.87)
for each observation. Plot each of these in separate index plots and identify any influential
observations. Summarize your findings.

14.60. Refer to the Real estate sales data set in Appendix C.7. Quality of construction (variable 10)
is the ordinal level response variable, and the pool of potential predictor variables includes
sales price, finished square feet, number of bedrooms, number of bathrooms, air conditioning,
garage size, pool, year built, lot size, and adjacent to highway (variables 2 through 9 and 12
through 13).

a. Fit the proportional odds model (14.105). Which predictors appear to be most important?
Interpret the results.

b. Conduct a series of Wald tests to determine which predictors, if any, can be dropped from
the nominal logistic regression model. Control α at .01 for each test. State the alternatives,
decision rules, and conclusions. Which predictors should be retained?

c. Starting with the full model of part (a), use backward elimination to decide which predictor
variables can be dropped from the ordinal regression model. Control the α risk at .05 at
each stage. Which variables should be retained?

d. For the model obtained in part (c), carry out separate binary logistic regressions for each
of the two binary variables $Y_i^{(1)}$ and $Y_i^{(2)}$, as described at the top of page 617. How do the
estimated coefficients compare to those obtained in part (a)?

e. For each of the separate binary logistic regressions carried out in part (d), obtain the
deviance residuals and plot them against the estimated probabilities with a lowess smooth
superimposed. What do the plots suggest about the adequacy of the fit of the binary logistic
regression models?

f. For each of the separate binary logistic regressions carried out in part (d), obtain the delta
chi-square statistic (14.85), the delta deviance statistic (14.86), and Cook's distance (14.87)
for each observation. Plot each of these in separate index plots and identify any influential
observations. Summarize your findings.

14.61. Refer to the Ischemic heart disease data set in Appendix C.9. The response is the number
of emergency room visits (variable 7) and the pool of potential predictor variables includes
total cost, age, gender, number of interventions, number of drugs, number of complications,
number of comorbidities, and duration (variables 2 through 6 and 8 through 10).

a. Obtain the fitted Poisson regression model (14.113) with the response function
$\mu(\mathbf{X}, \boldsymbol{\beta}) = \exp(\mathbf{X}'\boldsymbol{\beta})$. State the estimated regression coefficients, their estimated standard
deviations, and the estimated response function.

b. Obtain the deviance residuals (14.118) and plot them against the estimated model probabilities
with a lowess smooth superimposed. What does the plot suggest about the adequacy
of the fit of the Poisson regression model?


c. Conduct a series of Wald tests to determine which predictors, if any, can be dropped from
the nominal logistic regression model. Control α at .01 for each test. State the alternatives,
decision rules, and conclusions.

d. Assuming that the fitted model in part (a) is appropriate, use the likelihood ratio test to
determine whether duration, complications, and comorbidities can be dropped from the
model; control α at .05. State the full and reduced models, decision rule, and conclusion.

e. Use backward elimination to decide which predictor variables can be dropped from the
regression model. Control the α risk at .10 at each stage. Which variables are retained?


Case Studies

14.62. Refer to the IPO data set in Appendix C.11. Carry out a complete analysis of this data set, where
the response of interest is venture capital funding, and the pool of predictors includes face
value of the company, number of shares offered, and whether or not the company underwent
a leveraged buyout. The analysis should consider transformations of predictors, inclusion of
second-order predictors, analysis of residuals and influential observations, model selection,
goodness of fit evaluation, and the development of an ROC curve. Model validation should
also be employed. Document the steps taken in your analysis, and assess the strengths and
weaknesses of your final model.

14.63. Refer to the Real estate sales data set in Appendix C.7. Create a new binary response
variable Y, called high quality construction, by letting Y = 1 if quality (variable 10) equals 1, and
Y = 0 otherwise (i.e., if quality equals 2 or 3). Carry out a complete logistic regression
analysis, where the response of interest is high quality construction (Y), and the pool of predictors
includes sales price, finished square feet, number of bedrooms, number of bathrooms, air
conditioning, garage size, pool, year built, style, lot size, and adjacent to highway (variables 2
through 9 and 11 through 13). The analysis should consider transformations of predictors,
inclusion of second-order predictors, analysis of residuals and influential observations, model
selection, goodness of fit evaluation, and the development of an ROC curve. Develop a
prediction rule for determining whether the quality of construction is predicted to be of high quality
or not. Model validation should also be employed. Document the steps taken in your analysis,
and assess the strengths and weaknesses of your final model.

14.64. Refer to the Prostate cancer data set in Appendix C.5. Create a new binary response
variable Y, called high-grade cancer, by letting Y = 1 if Gleason score (variable 9) equals 8, and
Y = 0 otherwise (i.e., if Gleason score equals 6 or 7). Carry out a complete logistic regression
analysis, where the response of interest is high-grade cancer (Y), and the pool of predictors
includes PSA level, cancer volume, weight, age, benign prostatic hyperplasia, seminal
vesicle invasion, and capsular penetration (variables 2 through 8). The analysis should
consider transformations of predictors, inclusion of second-order predictors, analysis of residuals
and influential observations, model selection, goodness of fit evaluation, and the development
of an ROC curve. Develop a prediction rule for determining whether the grade of disease is
predicted to be high grade or not. Model validation should also be employed. Document the
steps taken in your analysis, and assess the strengths and weaknesses of your final model.


Part IV
Design and Analysis of Single-Factor Studies


Chapter 15


Introduction to the Design
of Experimental and
Observational Studies


In Parts I-III, we focused on the use of linear and nonlinear statistical models for the
analysis of experimental and observational data. There, an observed response vector Y and
associated design matrix X were used to model the relationship between response and the
predictors and to develop appropriate statistical inferences. We will now emphasize the
statistical design of scientific studies.

Our basic goal will be to design studies in such a way that they lead to a simple, effective
statistical analysis. Since nearly all scientific studies are analyzed using linear statistical 
models, the ability to design studies properly depends critically on an understanding of 
the materials covered in Parts I-III. For example, in Section 4.7, we discussed the range, 
spacing, and number of X levels when the objective of the study was to estimate a simple 
linear relation between a response Y and a single predictor X. We observed there that the 
range and spacing of the Xs have a direct effect on the precision with which we estimate 
key parameters, such as the slope. We showed that the variance of the estimated slope is 
minimized when the Xs are split evenly at minimum and maximum levels for the scope of 
the experiment. Minimization of this variance leads to a more precise parameter estimate 
and improved statistical power. 
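
The effect of spacing on precision can be checked numerically. The sketch below uses the variance of the estimated slope in simple linear regression, $\sigma^2\{b_1\} = \sigma^2 / \sum (X_i - \bar{X})^2$, and compares two designs over the same range. The range (0 to 10), the sample size (10 cases), and the value $\sigma^2 = 1$ are assumptions chosen only for illustration.

```python
import numpy as np

def slope_variance(x, sigma2=1.0):
    """Variance of the least squares slope b1: sigma^2 / sum((X_i - Xbar)^2)."""
    x = np.asarray(x, dtype=float)
    return sigma2 / np.sum((x - x.mean()) ** 2)

# Hypothetical scope of the experiment: X between 0 and 10, with n = 10 cases.
evenly_spaced = np.linspace(0.0, 10.0, 10)            # ten equally spaced levels
split_at_extremes = np.array([0.0] * 5 + [10.0] * 5)  # half at each end of the range

var_spaced = slope_variance(evenly_spaced)
var_extremes = slope_variance(split_at_extremes)

print(f"evenly spaced X levels:  variance of b1 = {var_spaced:.5f}")
print(f"X levels split at ends:  variance of b1 = {var_extremes:.5f}")
```

Splitting the cases between the two extreme levels gives the smaller variance, but such a design provides no information about possible curvature in the regression relation, so it is suitable only when a linear relation can be taken for granted.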

In this chapter and those that follow, we consider the design of scientific studies and
the specialized linear models—called analysis of variance (ANOVA) models—employed 
in their analysis. We emphasize that the proper design of a scientific study is far more 
important than the specific techniques used in the analysis. As we shall see, a well-designed 
study is usually simple to analyze. On the other hand, a poorly designed study or a botched 
experiment often cannot be salvaged, even with the most sophisticated analysis. 

We begin in the current chapter with an overview of the design of scientific studies. 
Generally, a scientific study can be categorized as either an experimental study or an observational
study. The distinction is important because experimental studies provide a much
firmer basis for the establishment of cause-and-effect relationships between one or more 
explanatory factors and a response variable than do observational studies. With the latter, 
one can establish association between the explanatory factors and the response variable, 




but not causation. We continue with an overview of the basic concepts and planning ap- 
proaches used in the design of experimental and observational studies. Finally, we present 
a case study to illustrate both the design and analysis of an experimental study based on a 
matched pairs design. 


15.1 Experimental Studies, Observational Studies, and Causation


For many persons, the first exposure to the concept of an experiment was in a high school 
or elementary school science class. For example, a high school science teacher might 
demonstrate the influence of atmospheric pressure on boiling temperature by showing that 
water will boil at room temperature in a near vacuum. We note that this example was
not an experiment, but was simply a demonstration. Designed experiments are conducted
to demonstrate a cause-and-effect relation between one or more explanatory factors (or
predictors) and a response variable. The demonstration of a cause-and-effect relationship is
accomplished, in simple terms, by altering the levels of the explanatory factors (i.e., the Xs)
and observing the effect of the changes on the response variable Y. Furthermore, designed
experiments are frequently comparative in nature. 

For example, a famous experiment on the effects of vitamin C on the prevention of colds 
in 868 children was conducted in 1976. Of the 868 children studied, half were randomly 
selected for the experimental group. Children in this group received a 1,000-mg tablet of 
vitamin C daily for the test period. The remaining children, who made up the control group, 
received a placebo—an identical tablet containing no vitamin C—also on a daily basis. The 
results showed that the average number of colds per child was .38 for children receiving 
vitamin C, while the average for children receiving the placebo was .37. The difference 
between the two groups (.01 colds per child) was not statistically significant. 

The explanatory factor in the vitamin C example is a qualitative predictor X having two
levels: X = 1 if child received vitamin C; X = 0 if child did not receive vitamin C. The
levels of the explanatory factor in an experimental study are generally referred
to as treatments. Just as there are two levels of the explanatory factor in the vitamin C
experiment, there are two treatments: vitamin C and placebo. The objects or entities to
which treatments are applied are generally referred to as experimental units. Here the
experimental units are the children who received either of the two treatments.

Assignment of the treatments (factor levels) to the experimental units was performed us- 
ing a process called randomization. We shall discuss randomization in detail in Section 15.2, 
but we note for now that the purpose of randomization here was to balance the character- 
istics of the children in each of the treatment groups, so that differences in the response 
variable can be attributed to treatment differences, and not to differences between the two 
groups of children. For example, one could imagine a poorly designed version of this study, 
in which the 868 children attended two elementary schools. For convenience, the investi- 
gator might use children from one school as the experimental group, and children from the 
second school as the control group. In such a plan, it would be impossible to distinguish the 
effects of being in a particular school —which could be severe if a particularly contagious 
cold virus broke out in one of the schools—from the presence or absence of vitamin C. In 
contrast, with randomization, we are guaranteed that about half of the children from each 




school would receive the vitamin C regimen. Therefore any differences in the incidence of
colds in the two groups will likely not be attributable to or confounded with the school.
Thus we see the value of randomization: If important differences in the responses result
between the treatment groups, we can attribute them to the treatments. We give the following
definition of a comparative experimental study.


In a comparative experimental study, randomization is employed to assign
treatments to experimental units, and the outcomes among the treatment
groups are compared to assess treatment effects. The treatments are
defined by the levels of one or more explanatory factors, referred to as          (15.1)
experimental factors. Cause-and-effect relationships [...]


We now present another example of an experimental study. 


Example 1 - Experimental Study of Quick Bread Volume. A simple comparative experiment was
conducted to study the effect of baking temperature on the volume of a quick bread prepared
from a package mix. Four oven temperatures (low, medium, high, and very high) were
tested by randomly assigning each of the four levels of temperature to five package mixes. 
This is an experimental study because the levels of the explanatory factor (baking temper- 
ature) are randomly assigned to the experimental units. The experimental units here are 
the 20 packages of mix. The experimental design used is called a completely randomized 
design, with each of the 20 packages of mix having an equal chance to be assigned to each 
of the four cooking temperatures. Note that the design used in the vitamin C example was 
also a completely randomized design. 
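
The random assignment in a completely randomized design is easy to carry out by computer. The following is a minimal sketch for the quick bread example, assuming 20 packages of mix and five packages per temperature; the function name, the package labels, and the seed are hypothetical and serve only to make the illustration reproducible.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Randomly assign treatments to units with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    replications = len(units) // len(treatments)
    # One label per unit: each treatment appears `replications` times.
    labels = [t for t in treatments for _ in range(replications)]
    rng = random.Random(seed)
    rng.shuffle(labels)  # every arrangement of the labels is equally likely
    return dict(zip(units, labels))

packages = [f"mix_{i:02d}" for i in range(1, 21)]       # 20 experimental units
temperatures = ["low", "medium", "high", "very high"]  # 4 treatments

assignment = completely_randomized_design(packages, temperatures, seed=15)
for package, temperature in assignment.items():
    print(package, temperature)
```

Shuffling a list that contains each temperature exactly five times gives every package the same chance of receiving each temperature while keeping the design balanced.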


Comment


The vitamin C experimental study is an example of a clinical trial. A clinical trial is defined as a [...]

Observational Studies 


In an observational study, random assignment of treatments to experimental units does not
occur. For example, a study of the effects of
education and type of work experience of sales people on their sales volumes was made by 
selecting a random sample of sales people currently employed by a company and obtaining 
information on highest degree obtained, type of experience, and sales volume for each of 
the selected employees. This is an observational study because it is not possible to randomly 
assign the levels of the predictor variables of interest (education and type of experience) to 
the employees. 

We focus here on "comparative" observational studies, where two or more groups (populations,
subpopulations, processes, etc.) are compared. The sales example just mentioned
is a comparative observational study, because sales volumes for different groups of sales
people were to be compared for different levels of education, experience, and so forth. This
is in contrast to simple descriptive studies that do not involve statistical comparisons of
groups. We give the following definition of a comparative observational study:


In a comparative observational study, random samples are obtained from
two or more populations (or subpopulations) and the observed outcomes
are compared across populations (or subpopulations). The populations or
subpopulations are defined by the levels of one or more explanatory factors,
referred to as observational factors. A cause-and-effect relationship             (15.2)
between the explanatory factors and the outcome or response variable is
difficult to establish in an observational study. Usually, evidence external
to the observational study would be required to rule out possible alternative
explanations for cause and effect.


At times, investigators use nonrandom convenience or quota samples. These samples are
sometimes referred to as pseudo-random samples or representative samples and treated as
if they were truly random. It must be cautioned here that random selection or assignment
greatly enhances the generalizability of the study results and avoids potential biases that
otherwise may occur when nonrandom selection is used.

The following is an example of an observational study.


Example 2 - Observational Study of Teaching Effectiveness. Recently, the administration of a
college of business offered its faculty the opportunity to participate in a summer workshop on
case teaching methods. Faculty were not required to attend the workshop, but were asked
to sign up on a first-come, first-served basis. Of the 110 faculty in the business school, 63
faculty elected to attend the seminar.

At the end of the following academic year, the administration compared the recent
teaching performances of faculty who attended the seminar to those who did not attend.
Students evaluated faculty on a 7-point scale, where 1 indicates poor performance and 7 is
outstanding. Average teaching ratings for all faculty members during the year following the
seminar were obtained. The aligned dot plots in Figure 15.1 compare the performances of
faculty who attended the seminar with faculty who chose not to attend. These plots suggest
that faculty who attended the seminar were generally rated more highly by students than
faculty who did not attend, and this is confirmed by the sample averages. The average rating
for faculty who attended was 5.76; the average for those who did not attend was 5.26. On
the basis of the two-sample t test (A.67), administrators concluded that the observed difference
(.50) was statistically significant. The P-value of the test was 0+.


FIGURE 15.1 Teaching Performance Comparison: Teaching Effectiveness Example.
[Aligned dot plots of average teaching ratings for the two attendance groups, Not Attend and
Attended, on a common horizontal axis with tick marks at 3.5, 4.5, 5.5, and 6.5.]


It is tempting to conclude, on the basis of this analysis, that the seminar was effective in
improving the quality of teaching. However, this is clearly an observational study, because
a random assignment of the treatments (attend workshop, do not attend workshop) to
experimental units (instructors) did not occur. Thus cause-and-effect between the explanatory
factor (workshop attendance) and the response (teaching effectiveness) cannot be
inferred. It is possible that the workshop improved teaching quality, but a number of
alternative explanations for the observed difference are also plausible. For example, it may have
been that better, or more highly motivated, teachers volunteered for the workshop. In this
case, the workshop attendees would be rated more highly on average even if the workshop
had no beneficial effect.

This investigation would have been an experimental study if the administration had chosen
a subset of the faculty at random for participation in the workshop. If the results led
to a difference in teaching quality, such as that shown in Figure 15.1, the administration
would be justified in concluding that the seminar had a beneficial effect on teaching
effectiveness. The reason that a cause-and-effect conclusion would be justified here is that
the randomization would tend to balance out the differences in other factors, such as
pre-workshop teaching ability or motivation, leaving the observed differences attributable to
the experimental treatment.
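
The contrast between self-selection and randomization can be illustrated with a small simulation. The sketch below does not use the data of the teaching effectiveness example. It assumes a hypothetical model in which ratings depend only on an unobserved motivation score and the workshop has no effect at all; the coefficients, the seed, and the rule that the 63 most motivated faculty sign up are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2005)
n_faculty, n_attend = 110, 63

# Hypothetical model: ratings depend only on unobserved motivation;
# the workshop itself has zero effect on ratings.
motivation = rng.normal(0.0, 1.0, n_faculty)
rating = 5.5 + 0.5 * motivation + rng.normal(0.0, 0.3, n_faculty)

def mean_difference(attended):
    """Mean rating of attendees minus mean rating of non-attendees."""
    return rating[attended].mean() - rating[~attended].mean()

# Self-selection: the 63 most motivated faculty sign up for the workshop.
self_selected = np.zeros(n_faculty, dtype=bool)
self_selected[np.argsort(motivation)[-n_attend:]] = True
diff_self = mean_difference(self_selected)

# Randomization: 63 faculty chosen at random, repeated many times.
random_diffs = []
for _ in range(2000):
    chosen = np.zeros(n_faculty, dtype=bool)
    chosen[rng.choice(n_faculty, size=n_attend, replace=False)] = True
    random_diffs.append(mean_difference(chosen))
diff_random = float(np.mean(random_diffs))

print(f"self-selected attendance: difference in mean rating = {diff_self:.2f}")
print(f"random assignment: average difference over 2000 runs = {diff_random:.3f}")
```

Under self-selection the attendees are rated well above the non-attendees even though the simulated workshop does nothing, whereas under random assignment the differences average out close to zero.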


Comment 


Ordinal level data are frequently assumed to approximate equally spaced interval data and as such 
are appropriately analyzed using statistical techniques designed for continuous, equal interval level 
measurements. We have done so with the teaching effectiveness scores but caution the reader that at 
times this assumption may not be supported, in which case specialized techniques for the analysis of 
ordinal level data, such as those discussed in Chapter 14, should be employed.


Mixed Experimental and Observational Studies 


A third type of study, which involves aspects of both experimental and observational studies,
is also possible. We illustrate this third case with an example.


Example 3 - Mixed Experimental and Observational Study of Mechanics' Training. An appliance
manufacturer operates three regional training centers in the United States for training
mechanics to service the company's products. At each regional center, two different training
programs were studied, with the trainees from the region assigned at random to one of the
two training programs. One may view this as a two-factor study, the factors being training
program (experimental factor) and training center (observational factor). If the same training
program is superior to the other in all three centers, the evidence is quite clear as to
the comparative effects of the training programs since at each center the trainees from the
region were assigned at random to the two programs.

Note that the training center was not randomly assigned to subjects; each trainee was
assigned to the center for the region in which the trainee is located. Therefore a
cause-and-effect relationship between training centers and quality of training cannot be demonstrated
rigorously. One center may excel for any number of reasons, such as because its staff is
doing a better training job, because it has better facilities, or because trainees assigned to
it come from a geographic region in which better education is provided. Evidence external
to this study would be required as to whether or not the education of trainees at the three




centers is the same, whether or not the facilities are equal, and the like, before a clear 
understanding of the reasons for differences between training centers could be obtained. 

As we will see, this is an example of a blocked experimental study, where the blocks
refer to the training centers (observational factor) and the training program is the treatment 
(experimental factor). 
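
In a blocked study of this kind, the randomization is carried out separately within each block. A minimal sketch follows, assuming eight trainees at each of the three centers and two programs labeled A and B; the trainee labels, group sizes, function name, and seed are hypothetical.

```python
import random

def randomize_within_blocks(units_by_block, treatments, seed=None):
    """Assign treatments at random to the units of each block separately."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in units_by_block.items():
        # Cycle through the treatment labels to cover every unit, then shuffle.
        labels = [treatments[i % len(treatments)] for i in range(len(units))]
        rng.shuffle(labels)
        for unit, label in zip(units, labels):
            assignment[unit] = (block, label)
    return assignment

# Blocks are the training centers (observational factor); the trainee is the unit.
trainees = {
    f"center_{c}": [f"center_{c}_trainee_{i}" for i in range(1, 9)]
    for c in (1, 2, 3)
}
programs = ["program_A", "program_B"]  # experimental factor

assignment = randomize_within_blocks(trainees, programs, seed=3)
for trainee, (center, program) in assignment.items():
    print(center, trainee, program)
```

Because the shuffle is done within each center, both programs appear equally often at every center, so the comparison of programs is not confounded with differences between centers.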


15.2 Experimental Studies: Basic Concepts


The design of an experiment refers to the structure of the experiment, with particular
reference to:


* The set of explanatory factors included in the study.
* The set of treatments included in the study.
* The set of experimental units included in the study.
* The rules and procedures by which the treatments are randomly assigned to the
  experimental units (or vice versa).
* The outcome measurements that are made on the experimental units.


In this section we discuss each of these topics in turn. 


A factor is an explanatory variable to be studied in an investigation. For instance, in an 
investigation of the effect of price on sales of a luxury item, the factor being studied is price. 
Similarly, in a study comparing the appeal of four different television programs, the factor
under investigation is television program. In the quick bread volume example, the factor 
under investigation is baking temperature. In a regression context, factors are typically 
referred to as predictors or independent variables. 

A factor may be categorized as to whether it is an experimental factor or an observational 
factor. An experimental factor is one where the level of the factor is assigned at random to 
the experimental unit. An illustration is the factor baking temperature in the bread volume 
example. In any investigation based on observational data, the factors under study are 
observational factors. An observational factor pertains to the characteristic of the units
under study and is not under the control of the investigator. Observational factors can 
be found in experimental studies, and therefore it is important to recognize them as such, 
since cause-and-effect inferences cannot be made for these factors. As we noted earlier in 
the mechanics' training example, the training program was an experimental factor, while 
the training center was an observational factor. 

Just as in regression, where both qualitative and quantitative predictors can be employed, 
experimental factors can be either quantitative or qualitative. A qualitative factor is one 
where the levels differ by some qualitative attribute. Examples are type of advertisement, 
brand of rust inhibitor, or television program. In Chapter 8 we described the use of r - 1
indicator variables to model a qualitative predictor having r levels. A quantitative factor 
is one where each level is described by a numerical quantity on an equal-interval scale. 
Examples are temperature in degrees Celsius, age in years, or price in dollars. 

A factor level is a particular form of that factor. In the bread volume example, four baking
temperatures were used, namely, 320°F (low), 340°F (medium), 360°F (high), and 380°F
(very high). Each of these temperatures is a level of the factor under study, and we say that
the temperature factor has four levels in this study. As another example, in a study of the
effect of color of the paper used in a mail questionnaire on response rate, color of paper is
the factor under study, and each different color used is a level of that factor.


Crossed and Nested Factors 


FIGURE 15.2 Crossed Factors and Nested Factors: Chemical Yield and Production Yield Experiments.


Investigations differ as to the number of factors studied. Some are single-factor studies,
where only one factor is of concern. For instance, the study of the effect of four different
baking temperatures on quick bread volume mentioned earlier is an example of a single-factor
study. In multifactor studies, two or more factors are investigated simultaneously. An
example of a multifactor investigation is a study of the effects of three levels of temperature
and two levels of concentration of solvent on the yield of a chemical process. Here, two
factors (temperature and concentration) are studied simultaneously to obtain information
about their effects on the yield. The three levels of temperature and two levels of solvent
concentration lead to 3 x 2 = 6 factor-level combinations:


Factor                        Solvent
Combination    Temperature    Concentration
1              Low            Low
2              Low            High
3              Medium         Low
4              Medium         High
5              High           Low
6              High           High


These factor combinations can be represented by the two-way table in Figure 15.2a. We say 
that the two factors are crossed when all combinations of the levels of the two factors are 
included in the study. The sales volume study is another example of a study in which the 
factors, education and type of experience, are crossed. 
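
The crossing of two factors can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical, and the level labels are taken from the chemical yield table above.

```python
from itertools import product

# Level labels from the chemical yield example; variable names are illustrative.
temperature_levels = ["Low", "Medium", "High"]
solvent_levels = ["Low", "High"]

# Two factors are crossed when every level of one is paired with every
# level of the other, so the treatments are the Cartesian product.
combinations = list(product(temperature_levels, solvent_levels))

for number, (temperature, solvent) in enumerate(combinations, start=1):
    print(number, temperature, solvent)
```

The number of factor-level combinations is the product of the numbers of levels, here 3 x 2 = 6, and the enumeration order matches the table.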


(a) Crossed Factors: Chemical Yield Experiment

                       Temperature
Solvent Conc.    Low    Medium    High
Low               X       X        X
High              X       X        X


(b) Nested Factors: Production Yield Experiment

                        Operator
Plant    1   2   3   4   5   6   7   8   9
1        X   X   X
2                    X   X   X
3                                X   X   X


In some studies the levels of one or more of the factors are unique to a particular level of
another factor. For instance, in a study of the effects of operators on production yield in three
manufacturing plants, three operators were selected in each of the three plants, and their
production yields were recorded for five batches of product. A diagram of this experiment
is given in Figure 15.2b. Note that the first three operators are employed only in plant 1, the
next three are employed only in plant 2, and the last three are employed uniquely in plant 3.
Here, operators are said to be nested within manufacturing plants.


Treatments


The set of treatments to be included is determined by the set of factors and the levels of each
factor. In single-factor studies, a treatment corresponds to a factor level. Thus, in a study of
five advertisements, each advertisement is a treatment. In multifactor studies, a treatment
corresponds to a combination of factor levels. For instance, in a study of the effects on sales
volume of price ($.25, $.29) and package color (red, blue), each price-color combination,
such as $.25 price and red package color, is a treatment. When a treatment is indicated by a
combination of two or more factor levels, the combination of levels is sometimes referred
to as a treatment combination. This particular study contains four treatments or treatment
combinations since there are four price-color combinations.

The definition of a treatment can at times be a difficult problem. Consider an experiment 
to study whether C or JAVA is a better programming language to teach in an introductory 
computing course. Some teachers will prefer C, others JAVA. Should the treatments then 
be defined as the programming language taught by instructors who prefer that language? If 
so, differences in findings may be due to differences between the two groups of instructors. 
Should the definition of a treatment not include the instructor, and instructors be random- 
ized, with some being forced to teach a language they do not prefer? Or should instructor 
preference be a second factor, with each instructor teaching both languages? Problems of 
this kind need careful resolution so that the results of the study will be useful. 


Choice of Treatments 


Generally, the investigator must decide upon the number of factors to be included, the 
number of levels of each factor, the range of levels within each factor (for quantitative 
factors), and the need for a control treatment. We shall discuss each of these aspects in turn. 


Number of Factors. In the initial stages of an investigation or when little theory is 
available, there is frequently a desire to include many more factors than can possibly be 
studied in a single experiment. For example, the quick bread volume experiment discussed 
above was adapted from a much larger optimization study of quick bread production. When 
the study was initiated, process engineers and food scientists conducted a brainstorming 
session to identify factors that could potentially affect quick bread volume. Cause-and-effect
diagrams (also known as Ishikawa or fish-bone diagrams), such as that shown in
Figure 15.3, are often used to guide such sessions and to summarize results. This particular
session identified over 15 potential causal factors, far too many to include in the experiment.
From this number, four factors (oven temperature, proof time, yeast type, and flour protein
level) were included, each at two levels. This led to 2^4 = 16 treatment combinations.


FIGURE 15.3 Cause-and-Effect Diagram: Quick Bread Optimization Example.
[Fish-bone diagram. Labels recoverable from the figure: Flour (Protein, Ash); Training
(Testing, Diagnostics); Yeast Type; Operators; Dough Additive; Oven (Temp. Dispersion,
Humidity); Yeast Level; Temperature; Proof Time; Mixer.]


Number of Levels of Each Factor. For qualitative factors, the number of levels may be
dictated by the nature of the factor. For example, in the incentive system example discussed
earlier, three alternative incentive systems were under consideration. One involved increases
to hourly wages, another involved the use of bonuses and financial awards, and another
involved recognition and the awarding of additional vacation time. Thus the company felt
that all three levels of the incentive system factor should be included in the experiment.
In other instances, it might be necessary to drop one or more of the levels of a qualitative
factor in order to reduce the cost of the experiment. For example, in an experiment to
investigate the effect of color of paper (blue, green, orange, and yellow) on the response
rates for questionnaires, it might be concluded that a least-promising color should simply
be eliminated in order to reduce the cost or complexity of the experiment.

For quantitative factors, the number of levels chosen should reflect the type of trend 
expected by the experimenter. If the experimenter believes that the change in the response 
will be roughly linear in the range chosen for the factor, two levels—the minimum and 
the maximum of the specified range—may be sufficient. Three levels are useful if the 
experimenter believes that the response will follow a quadratic trend in the chosen range, or 
if a linear trend is expected, but a test for lack of fit is desired. Use of four or more levels is
justified if a highly detailed examination of the shape of the response curve is desired, or if
the response curve is increasing or decreasing to an asymptotic value. Often, three equally
spaced levels are sufficient. 
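
The link between the number of levels and the trend that can be fitted can be checked numerically. The sketch below is an illustration under assumptions: the helper names and the temperature levels are hypothetical. It computes the rank of a polynomial design matrix, which shows whether all polynomial coefficients are estimable, and the degrees of freedom left for a lack-of-fit test (number of distinct levels minus number of regression parameters).

```python
import numpy as np

def polynomial_design_rank(levels, degree):
    """Rank of the design matrix [1, x, ..., x^degree] for the given levels."""
    x = np.asarray(levels, dtype=float)
    x = (x - x.mean()) / 10.0  # centre and rescale for numerical stability
    X = np.vander(x, degree + 1, increasing=True)
    return int(np.linalg.matrix_rank(X))

def lack_of_fit_df(levels, degree):
    """Distinct levels minus the number of polynomial coefficients."""
    return len(set(levels)) - (degree + 1)

two_levels = [320, 380, 320, 380]   # hypothetical: two levels, replicated
three_levels = [320, 350, 380]      # hypothetical: three equally spaced levels

# A quadratic needs 3 estimable coefficients; two levels give rank 2 only.
print(polynomial_design_rank(two_levels, degree=2))
print(polynomial_design_rank(three_levels, degree=2))
# With three levels and a linear model, one degree of freedom remains
# for testing lack of fit.
print(lack_of_fit_df(three_levels, degree=1))
```

Two levels support a straight line and nothing more, while three levels support either a quadratic fit or a linear fit with one degree of freedom for lack of fit.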




Range of Levels for Quantitative Factors. Choosing the range of a quantitative factor
to be explored is one of the most important design decisions. If the range is too small, the
effect of a change from the smallest level to the largest level of the factor may be too small
to detect. If the range is too large, important changes in the mean response may be missed.
For example, suppose that the true regression function in the quick bread volume example is
given by the curve in Figure 15.4. The response increases in roughly linear fashion for baking
temperatures between 300°F and 400°F, and levels off for baking temperatures outside this
range. If the range is too small and we are in an area where the change in the mean response
is small or moderate, for example 250°F-300°F, we will conclude that temperature has
little effect on volume. If the range is too large, for example 250°F-450°F, and only these
two levels are used as treatments, important features of the curve, such as the maximum
(near 400°F), may be missed. We see that an effective choice of range for a quantitative
factor frequently requires a good prior knowledge of the nature of the relationship between
the mean response and the factor(s) under study.

[Figure 15.4: mean Volume plotted against Temperature, with the temperature axis running from 250 to 450.]
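
The effect of the chosen range can be illustrated with a toy response curve. Everything numerical below is an assumption made for illustration: the function `mean_volume`, its baseline of 500, and its slope of 2 per degree are hypothetical and only mimic the shape described in the text (flat below 300°F, roughly linear between 300°F and 400°F, flat above).

```python
def mean_volume(temp_f):
    """Hypothetical mean response: flat below 300F, linear to 400F, flat above."""
    clipped = min(max(temp_f, 300.0), 400.0)
    return 500.0 + 2.0 * (clipped - 300.0)

def two_level_slope(low, high):
    """Change in mean response per degree implied by a two-level experiment."""
    return (mean_volume(high) - mean_volume(low)) / (high - low)

print(two_level_slope(250, 300))  # range too small and in a flat region
print(two_level_slope(300, 400))  # range matched to the region of change
print(two_level_slope(250, 450))  # range too wide, only endpoints observed
```

Under these assumptions the narrow range shows no effect at all, while the wide two-level range understates the rate of change in the 300°F to 400°F region and gives no indication of where the response levels off.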


Control Treatment. A control treatment is needed in some experiments, but not in all. A 
control treatment consists of applying the identical procedures to experimental units that are 
used with the other treatments, except that none of the treatments are applied. In a study of 
food additives, for instance, a treatment may consist of a portion of a vegetable containing 
a particular additive that is served to a consumer in a particular experimental setting in the 
laboratory. A control treatment here would consist of a portion of the same vegetable served
to a consumer in the identical experimental setting except that no food additive has been used.

A control treatment is required when the general effectiveness of the treatments under
study is not known, or when the general effectiveness of the treatments is known but is not
consistent under all conditions. In the food additives example, suppose it is known that food
additive A is highly effective in enhancing the tastiness of vegetables and it is desired to see
if additives B and C are equally effective or possibly even more effective. In that case, a
standard of comparison is available and no control treatment is required. On the other hand, 
suppose there is no knowledge about the general effectiveness of the three additives, and 
the following results are obtained (ratings can range between 0 and 60): 


Additive Mean Rating 


A 39 
B 37 
C 41 


Assume that the sample sizes are large so that the mean ratings are very precise. In the 
absence of a standard of comparison, one would not know here whether each of the three 
additives is effective or whether none of the additives is effective. 

It is crucial that the control treatment be conducted in the identical experimental setting
as the other treatments. In the food additives example, for instance, a survey of consumers
at home, in which persons are asked to rate the general tastiness of the vegetable (without
any additive) on the same scale as in the experiment, would not qualify as a control treatment.
Such a survey might yield a mean rating of 22, suggesting that the three additives
substantially increase the tastiness of the vegetable. This conclusion, however, could be
grossly misleading. If the control treatment actually were incorporated into the experiment
so that consumers are given portions of the vegetable with no additive in the laboratory
setting, the mean rating for the control treatment might be 40. This result would imply that
none of the three additives is effective in enhancing the tastiness of the vegetable. The reason
for the higher mean rating in the laboratory setting could be a “halo” effect connected
with the experimental procedures. Possibly, foods served in the experimental setting taste
better than at home, or perhaps consumers try to oblige by giving higher ratings when they
participate in an experimental study. Thus, only a control treatment incorporated into the
experiment can serve as the proper standard of comparison.


Experimental Units 


As we noted earlier, the experimental units are the objects or entities to which the treatments
are applied in an experimental study. There are times when confusion may arise as
to the precise nature of the experimental unit. The following definition makes clear that
experimental units are determined by the method of randomization employed.


An experimental unit is the smallest unit of experimental material to which
a treatment can be assigned; the experimental unit is thus determined by
the method of randomization.                                              (15.3)


For example, consider again the experimental study of two incentive pay systems. We 
asked above if the basic study unit should be an individual employee, a shift, or a plant. 
As noted, it may be impossible to assign different incentive pay systems to individual 
employees or to individual shifts, but a random assignment of different incentive systems 
to different plants would be feasible. Here, the smallest unit of experimental material to 
which a treatment (incentive system) can be assigned is the plant, and so it follows that the 
plant is the experimental unit. 

Representativeness of the experimental units is another important consideration in the 
design of experimental studies. Consider a study of management behavior with different 
communications networks. A university investigator may be tempted to use students as 
subjects because of their ready availability. If, however, information is desired about the
behavior of business people, the students may not be representative experimental units. 
It hardly needs to be stated that an investigator should make every effort to obtain repre- 
sentative experimental units. Conversely, one should be cautious in extending results of 
an investigation to groups for which the study units are not representative. Thus, if the 
communications network study cited above did use students, one should not automatically 
assume that the findings are relevant to business people. 

A different aspect of defining the basic unit of study occurs in investigations of sales 
and similar phenomena. Suppose that we wish to measure the effectiveness of five different 
television commercials in terms of sales during a period of time subsequent to their showing. 
Should the length of time be one week, two weeks, one month, or some other time period? 
Clearly, the purposes of the study will need to govern the length of time that makes up the
basic study unit here. 


Sample Size and Replication 
Sample size is usually determined by statistical considerations, by resource or budget
considerations, or both. Generally, the larger the sample size, the greater will be our ability to
detect any differences in responses due to the treatments. Thus a key step in any experimental
design is to assess the power of the statistical tests to be used in the analysis, or the precision
of the estimates to be produced by the analysis, as a function of sample size. Ultimately, a
trade-off must be made between the increase in power and precision resulting from higher
sample sizes, and the added cost or time required to field the experiment. Statistical procedures
used for determining power and precision depend on the particular experimental
design used. We shall discuss these methods throughout the remainder of the text as new
experimental designs are introduced.

We note that in many designed experiments, the sample size is an integer multiple of
the number of treatments. For example, in the bread volume experiment, there were eight
experimental units (packages of bread mix) and four treatments. Thus each treatment was
repeated twice. We say that there were two complete replicates of the experiment. Frequently,
the total sample size is simply determined by the number of complete replicates chosen in
the experimental design. Replication makes it possible to estimate the experimental error
variance, which is required for testing the presence of treatment effects or for establishing
confidence interval estimates of these effects. When a treatment is repeated, any difference
in the response from prior responses for the same treatment (under similar experimental
conditions) is due to experimental error, and it therefore provides one additional piece of
information (i.e., one degree of freedom) about the pure error variance. If this experimental
error variance is small, the response is sometimes said to be highly reproducible. If the error
variance is high, the response has low reproducibility.
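
The way replication supplies degrees of freedom for pure error can be sketched as follows. The function name and the response values are hypothetical and chosen only for illustration; the layout (four treatments, two replicates each) follows the bread volume example.

```python
from statistics import mean

def pure_error_variance(groups):
    """Pooled within-treatment variance and its degrees of freedom.

    `groups` maps each treatment label to the list of replicate responses.
    Each treatment with r replicates contributes r - 1 degrees of freedom.
    """
    sum_sq = 0.0
    df = 0
    for responses in groups.values():
        centre = mean(responses)
        sum_sq += sum((y - centre) ** 2 for y in responses)
        df += len(responses) - 1
    return sum_sq / df, df

# Hypothetical responses: four treatments, two complete replicates.
volumes = {
    "low": [10.0, 12.0],
    "medium": [14.0, 14.0],
    "high": [18.0, 20.0],
    "very high": [15.0, 17.0],
}

variance, df = pure_error_variance(volumes)
print(variance, df)
```

With two replicates of four treatments, each treatment contributes one degree of freedom, giving four in total; a single replicate would give none, and the error variance could not be estimated this way.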


Randomization 

Randomization in experiments is a relatively recent idea, first introduced by the famous 
British statistician Sir R. A. Fisher during the early part of the twentieth century. In the 
past, treatments had been assigned to experimental units either on a systematic or on a 
subjective basis. We noted in the teaching effectiveness example how biases can arise when 
self-selection is employed to assign experimental units to the treatments. The same dangers 
exist with systematic and subjective selection. For instance, consider an experiment using 
10 employees and two treatments, where the first five employees on the payroll listing are 
assigned treatment 1 and the next five treatment 2. Suppose that the payroll listing is by 
seniority, and that experience is related to productivity, the phenomenon under study. A 
comparison of treatments 1 and 2 then will reflect not only differences between the two 
treatments but also differences in the amount of experience between the two groups of 
employees. This potential bias may be so transparent that no good experimenter would use 
the type of systematic assignment just described. Nevertheless, there may be many other 
sources of bias in systematic selection that are not so apparent. 

Subjective assignments of treatments to experimental units can also lead to selection bias, 
as when an experimenter subconsciously tends to assign one treatment to highly extrovert 
subjects and the other treatment to less extrovert subjects. 

With randomization, the treatments are assigned to experimental units at random. Ran- 
domization tends to average out between the treatments whatever systematic effects may 
be present, apparent or hidden, so that comparisons between treatments measure only the 
pure treatment effects. Thus, randomization tends to eliminate the influence of extraneous 
factors not under the direct control of the experimenter and thereby precludes the pres- 
ence of selection bias. Cochran and Cox (Ref. 15.1, p. 8) have likened randomization to an 
insurance policy in that it is a precaution against biases that may or may not occur. 




Randomization is appropriate not only for the assignment of treatments to experimental
units but also for any other phases of the experiment where systematic effects not under the
control of the experimenter may be present. For instance, consider an experiment in which
five treatments (alternative methods of measuring subjective probability) are studied and
20 subjects are used. Only one subject can be run per day; thus, four weeks are required to
complete the experiment. In this type of situation, it usually is highly desirable to determine
the order of the treatments randomly since a variety of systematic time effects could be
present. The experimenter may with time improve the explanation of the methods of measuring
subjective probability, there may be a streak of extremely hot weather during a week,
and the like. With these possible time effects, a systematic assignment of one treatment per
week could lead to seriously biased results. Randomization, on the other hand, will tend to
average out whatever systematic effects are present, whether anticipated or not.


How to Randomize. Randomization requires that a series of experimental units (or
treatments) be placed in a random order. To illustrate this in simple fashion, we consider
again the quick bread volume example with two replicates. Here four treatments (T1: low,
T2: medium, T3: high, T4: very high) are considered and 2 x 4 = 8 package mixes,
labeled 1 through 8, are to be used as experimental units. The situation is:


Treatments      T1   T2   T3   T4
Sample Sizes     2    2    2    2


and the eight treatments to be assigned to package mixes are listed as (the order is arbitrary):

T1   T1   T2   T2   T3   T3   T4   T4


To randomly assign the treatments to the experimental units, we obtain a random ordering
of these treatments. To do so we generate eight random numbers from any continuous
probability distribution (or obtain eight random numbers from a table of random digits) and
associate each number obtained in sequence with the above list of treatments. The eight
random numbers below were obtained from a standard normal random number generator:


Treatment:         T1      T1      T2      T2      T3      T3      T4      T4
Random number:   -0.37    0.01    1.40   -1.65    0.16   -0.25   -1.10    0.77


We now rearrange the pairs above in ascending sequence for the random numbers and
associate them with the package mixes, which we have arbitrarily labeled "1" through "8."
Thus we obtain the following randomized assignment of treatments to experimental units:


Treatment:         T2      T4      T1      T3      T1      T3      T4      T2
Random number:   -1.65   -1.10   -0.37   -0.25    0.01    0.16    0.77    1.40
Package mix:        1       2       3       4       5       6       7       8


As a result of the randomization, treatment T1 (low temperature) is to be assigned to package
mixes 3 and 5; treatment T2 (medium temperature) is to be assigned to package mixes 1
and 8; and so on. The experimental trials should be conducted in a random order.

Some statistical packages provide facilities for randomly permuting the treatments (or
experimental units) directly, which can simplify the process considerably.
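
The sorting procedure just described can be reproduced in Python. This is a sketch: the treatment list is taken in the order T1, T1, T2, T2, T3, T3, T4, T4, the eight random numbers are those quoted in the text, and the variable names and the seed are hypothetical.

```python
import random

treatments = ["T1", "T1", "T2", "T2", "T3", "T3", "T4", "T4"]
random_numbers = [-0.37, 0.01, 1.40, -1.65, 0.16, -0.25, -1.10, 0.77]

# Pair each treatment with its random number, sort the pairs in ascending
# order of the random number, and read off the treatments in that order.
ordered = [t for _, t in sorted(zip(random_numbers, treatments))]

# Package mixes are labeled 1 through 8 in the sorted order.
assignment = dict(zip(range(1, 9), ordered))
print(assignment)

# Direct random permutation, as offered by many packages. The seed is an
# arbitrary illustrative choice that makes the shuffle repeatable.
rng = random.Random(2005)
shuffled = treatments[:]
rng.shuffle(shuffled)
print(shuffled)
```

Sorting on independent continuous random numbers and shuffling directly both make every ordering of the eight treatment labels equally likely; the first form is shown only because it mirrors the hand procedure.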


Comments 

1. Randomization also can provide the basis for making inferences without requiring assumptions 
about the distribution of the error terms. We shall discuss this use of randomization in Section 16.9. 

2. The implications of randomization may be viewed in a somewhat different fashion than that 
presented so far. The random errors of experimental units that are adjacent in time or space are often
correlated, not independent, as a result of various systematic effects over time or space. Randomization
does not eliminate this correlation pattern but, by making it equally likely that any two treatments are
adjacent, tends to eliminate the correlations between treatments with increasing replications. Thus,
randomization makes it reasonable to analyze the data as though the model random error terms are
independent, an assumption that has been made in almost all models discussed so far.

3. Occasionally, randomization may provide a pattern that makes the experimenter uneasy. For
instance, randomization of the time sequence in which four experimental units were assigned to
treatment 1 and four assigned to treatment 2 may result in a randomized sequence where the four
experimental units for treatment 1 are exposed first and then the four experimental units for treatment
2 are exposed. This is not a likely occurrence, but one that can take place. Some solutions have been
suggested for this problem, but none provides a final answer. In practice, the experimenter typically
will discard a randomization sequence that has apparent dangers of systematic effects for the particular
experiment and select another randomization.


Constrained Randomization: Blocking 
Blocking is a technique that can be used to increase precision in any experiment. To provide 
some context and to motivate the concept, we shall again consider the vitamin C experiment 
discussed earlier. 

Recall that half of the children in the vitamin C example were randomly assigned to 
the control group, and half were assigned to the experimental group. At the end of the test 
period, the number of colds Y contracted by each child was recorded. A linear statistical 
model for the ith child’s response is: 


$$Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i \tag{15.4}$$


where:


$$X_{i1} = \begin{cases} 1 & \text{if $i$th child receives vitamin C} \\ 0 & \text{if $i$th child receives placebo} \end{cases}$$


With $X_{i1}$ defined in this fashion, $\beta_0$ is the population mean response for children in the control
group (i.e., those receiving the placebo), and $\beta_0 + \beta_1$ is the population mean response for
children in the experimental group (i.e., those receiving vitamin C). The treatment effect
parameter, $\beta_1$, represents the increase or decrease in the average number of colds per child
due to the vitamin C regimen. Finally, the experimental error $\varepsilon_i$ is the deviation of the number
of colds for the ith child from the true mean of the child's treatment group, sometimes
called the specific effect associated with the ith experimental unit. The variance of the
experimental error is $\sigma^2 = \sigma^2\{\varepsilon_i\}$.




We shall assume that the goal of the study is precise estimation of (or inference about)
the treatment effect, $\beta_1$. Then a key quantity of interest is the variance of the least squares
estimator, $b_1$, of this effect. From (2.3b), we have:

$$\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_{i1} - \bar{X}_1)^2} \tag{15.5a}$$

It is easy to show, when the number of children in the two treatment groups is the same,
that the variance of $b_1$ is:

$$\sigma^2\{b_1\} = \frac{4\sigma^2}{n} \tag{15.5b}$$

Thus for a given sample size (here n = 868), increased precision can only come about
through reductions in the experimental error variance, $\sigma^2$.
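
The step from the general variance formula to the 4σ²/n form can be checked numerically. In the sketch below the helper name is hypothetical and the value used for the error variance is an arbitrary placeholder. With equal group sizes the indicator has mean 1/2, so each squared deviation equals 1/4 and the sum of squares in the denominator is n/4.

```python
def sum_sq_indicator(n_treated, n_control):
    """Sum of squared deviations of a 0/1 indicator about its mean."""
    x = [1] * n_treated + [0] * n_control
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)

n = 868
sxx = sum_sq_indicator(n // 2, n // 2)
print(sxx, n / 4)          # both equal 217.0

sigma_sq = 1.0             # placeholder value for the error variance
var_b1 = sigma_sq / sxx    # general form: sigma^2 / sum of squares
print(var_b1, 4 * sigma_sq / n)
```

Because the denominator is fixed at n/4 once the sample size and the equal split are fixed, the only remaining route to a smaller variance is a smaller error variance.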

One way to reduce $\sigma^2\{\varepsilon_i\}$ is to identify and control factors that contribute to variation in
the $\varepsilon_i$. In the vitamin C example, some factors (other than vitamin C) that might affect the
numbers of colds contracted by the ith child might include: the gender of the child, the age of 
the child, the general health status of the child, the nutritional habits of the child, and so on, 
These factors, which affect the response but are not of primary interest to the investigator, 
are referred to as nuisance or confounding factors. For simplicity, we will assume that there 
is just one nuisance factor in the experiment other than the treatment effect, namely, gender, 
This source of variation could be removed from experimental error by using only males or 
only females. 

For example, if only females are used as subjects, the model for our response is now: 


$$Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i' \tag{15.6}$$


where $\varepsilon_i'$ is the experimental error when subjects are exclusively female. If females tend
to have fewer (or more) colds than males, then the female experimental units are more 
homogeneous and the experimental error variance will be reduced. 

Of course there are disadvantages to limiting the experiment to one gender. First the 
sample size n is reduced, which increases the variance of our estimated treatment effect in
(15.5b), and second, we would not be able to generalize the results of the experiment to
the gender that was omitted. These disadvantages are overcome by a technique known as 
blocking. 

In a blocked experiment, the heterogeneous experimental units are divided into homogeneous
subgroups called blocks, and separate experiments are conducted in each block. For
example, blocking on gender in the vitamin C example would be accomplished by con- 
ducting separate experiments on males and females. Because gender does not vary within 
blocks, the effect of vitamin C is more efficiently estimated within each block. The overall 
effect of the experimental factor is obtained by combining the estimated effects from each 
of the blocks. 

Note that because blocking requires that separate experiments be conducted in each 
block, it follows that separate randomizations of treatments to experimental units (or vice
versa) must be carried out within each block. The within-block randomization is sometimes 
referred to as a restricted randomization because assignments of treatments can only be 
made to experimental units within the given block. 
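
A restricted randomization can be sketched as a separate shuffle inside each block. The function name, the block labels, and the seed below are hypothetical; the block sizes (434 females and 434 males) follow the vitamin C example under the assumption of an even gender split.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign treatments at random separately within each block.

    `blocks` maps a block label to its list of unit labels. Within each
    block every treatment is used equally often, so the block size must
    be a multiple of the number of treatments.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError("block size must be a multiple of the number of treatments")
        labels = list(treatments) * (len(units) // len(treatments))
        rng.shuffle(labels)  # one restricted randomization per block
        for unit, label in zip(units, labels):
            assignment[(block, unit)] = label
    return assignment

blocks = {
    "female": list(range(1, 435)),
    "male": list(range(1, 435)),
}
assignment = randomize_within_blocks(blocks, ["Vitamin C", "Placebo"], seed=1)
print(assignment[("female", 1)], assignment[("male", 1)])
```

Because the shuffle never crosses block boundaries, each gender receives vitamin C and placebo in equal numbers by construction, which a single unrestricted randomization over all 868 children would not guarantee.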


FIGURE 15.5 Randomized Complete Block Design: Vitamin C Example.

Restricted Randomizations

Treatment:   Vitamin C   Vitamin C   Placebo   Vitamin C   ...   Vitamin C   Placebo
Female:          1           2          3          4       ...      433        434

Treatment:   Placebo     Vitamin C   Placebo   Vitamin C   ...   Placebo     Vitamin C
Male:            1           2          3          4       ...      433        434


An example of a blocked layout for the vitamin C example is given in Figure 15.5. Notice 
that each block consists of 434 subjects (assuming half of the 868 subjects are male and 
half are female), and that the control and experimental treatments are each assigned to half 
of the subjects in each block. This is accomplished with two restricted randomizations. 

The advantages of a blocked experiment over a completely randomized design should 
be evident in this example. Randomization alone cannot guarantee that the same number of 
males and females will receive each treatment. Thus if one gender tends to have fewer colds, 
differences in the treatment groups may be observed even when the experimental treatment 
has no effect. Another benefit of blocking is that it can increase the range of validity for 
the conclusions from the experiment. Blocking of experimental units according to their 
characteristics (e.g., by age) can be employed to provide sufficient variability between 
groups of experimental units in different blocks for a wide range of generalizability and yet 
achieve high precision because of small experimental errors within blocks. 

As a general principle, an experimenter should always try to remove any known or 
potential sources of variability, either by holding the nuisance factors constant throughout 
the experiment or by blocking. Randomization within blocks provides additional protection 
against any unknown sources of variability that may be present. 


Comments 


1. The amount of variance reduction achieved by blocking can be seen from a regression context. 
Suppose in the vitamin C example that the model for the response of the jth subject having gender i 
(i = 1 if female; i = 0 if male) is: 


$$Y_{ij} = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \varepsilon_{ij} \tag{15.7}$$ 

where: 

$$X_{ij1} = \begin{cases} 1 & \text{if the $ij$th child receives vitamin C} \\ 0 & \text{if the $ij$th child receives placebo} \end{cases}$$ 

$$X_{ij2} = \begin{cases} 1 & \text{if the $ij$th child is female} \\ 0 & \text{if the $ij$th child is male} \end{cases}$$ 


Here $\beta_1$ can again be interpreted as the change in mean response due to receiving vitamin C (relative 
to receiving the placebo) and $\beta_2$ is the change in mean response for females (relative to males). We 
will consider this new model, which takes into account the potential effects of gender, to be the "full" 
model. If gender is ignored in the design of the study, the appropriate "reduced" model is (15.4). 
Let SSE(F) denote the error sum of squares for the full model, corresponding to the blocked design, 
and let SSE(R) denote the error sum of squares for the reduced model, corresponding to the completely 
randomized design. Then we have: 


$$SSE(F) = SSTO - SSR(X_1, X_2) = SSTO - [SSR(X_1) + SSR(X_2 \mid X_1)] \tag{15.8}$$ 


If the number of observations in each block is the same, it can be shown that $X_1$ and $X_2$ are uncorrelated 
(i.e., orthogonal), hence $SSR(X_2 \mid X_1) = SSR(X_2)$. Thus: 

$$SSE(F) = SSTO - [SSR(X_1) + SSR(X_2)] \tag{15.9}$$ 

From reduced model (15.4), 


$$SSE(R) = SSTO - SSR(X_1) \tag{15.10}$$ 


and it follows from (15.9) and (15.10) that $SSE(F) = SSE(R) - SSR(X_2)$. Therefore $SSR(X_2)$ 
represents the reduction in the error sum of squares achieved with blocking. 

2. When blocking on a nuisance factor is not possible at the design stage, variance reductions can 
sometimes be achieved at the analysis stage by including the nuisance factor as an additional predictor 
in the linear model for the response. Returning to the vitamin C example, suppose that blocking by 
gender was not possible. Nevertheless, model (15.7), which considers gender effects, could be 
employed at the analysis stage if the gender of each subject is recorded. By adding gender ($X_2$) as an 
additional predictor to model (15.4), we may realize variance reductions similar to those described in 
Comment 1 for blocking. This approach, called the analysis of covariance, is discussed in Chapter 22. 
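
The identity in Comment 1 can be checked numerically. The sketch below uses a small set of made-up responses (three subjects in each gender-by-treatment cell, so that the treatment and gender indicators are orthogonal) and a bare-bones least squares routine written only for this illustration; the helper name `sse` and the data are hypothetical.

```python
def sse(columns, y):
    """Error sum of squares from least squares of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical responses, 3 subjects per cell.
# x1: 1 = vitamin C, 0 = placebo; x2: 1 = female, 0 = male.
x1 = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
x2 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [2, 3, 1, 4, 3, 5, 4, 5, 3, 6, 7, 5]

ssto = sse([], y)              # intercept-only model gives SSTO
sse_reduced = sse([x1], y)     # reduced model: treatment only
sse_full = sse([x1, x2], y)    # full model: treatment and gender
ssr_x2 = ssto - sse([x2], y)   # SSR(X2)

print("SSE(R) =", round(sse_reduced, 6))
print("SSE(F) =", round(sse_full, 6))
print("SSE(R) - SSR(X2) =", round(sse_reduced - ssr_x2, 6))
```

With these balanced data the last two printed values agree, as the comment asserts. If the cell sizes were unequal, the two indicators would generally be correlated and $SSR(X_2 \mid X_1)$ would no longer equal $SSR(X_2)$.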


Measurements 


The measurement process is another important element of experimental designs. Ideally, 
the measurement process should produce measurements that are unbiased and precise. 
Measurement bias can cause serious difficulties in the analysis of a study. An important 
source of measurement bias is due to unrecognized differences in the evaluation process. 
For example, a group of plants randomly assigned to a new fungicide treatment might 
unintentionally be evaluated by the investigators to be responding better to the treatment 
than actually is the case because of a desire to show the new treatment to be effective. 
When the experimental unit is a person, knowledge of the treatment by the person may 
also influence the measurement obtained. For instance, a person who knows that the food 
additive is salt may respond differently in the evaluation of the tastiness of a vegetable 
than if the additive were unknown. This source of measurement bias can be minimized by 
concealing the treatment assignment from both the experimental subject and the evaluator. A 
study using this kind of concealment is called a double-blind study. When knowledge of 
the assignment is withheld only from the experimental subject or the evaluator, the study is 
called a single-blind study. 


15.3 An Overview of Standard Experimental Designs 


In this section, we give an overview of the best-known and most frequently used experimental 
designs. In addition, we provide linear statistical models associated with the most basic of 
these designs. Each of the designs introduced here will be treated in greater detail in the 
chapters that follow. 


Completely Randomized Design 


FIGURE 15.6 Plot: Quick Bread Volume Example. 


The simplest form of designed experiment is the completely randomized design. With this 
design, treatments are randomly assigned to the experimental units. This design is most 
useful when the experimental units are relatively homogeneous. Completely randomized 
designs are quite flexible; they can be used with any number of treatments and permit 
different sample sizes for different treatments. 

The quick bread experiment is an example of a completely randomized design. This 
design was based on four treatments (low, medium, high, and very high temperatures) and 
eight experimental units (package mixes) leading to two replicates of each treatment. The 
results of the experiment have been summarized using a scatter plot in Figure 15.6. This 
scatter plot suggests that temperature does affect bread volume and that the largest volume 
is obtained by baking the bread at the high oven temperature. 

A linear statistical model for the response is: 


$$Y = \text{Overall Constant} + \text{Treatment Effect} + \text{Experimental Error} \tag{15.11}$$ 


We shall model the treatment effect as a qualitative factor having four levels. Thus, as 
described in Section 8.3, we can employ three indicator variables: 


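$$X_{ij1} = \begin{cases} 1 & \text{if treatment 1} \\ 0 & \text{otherwise} \end{cases}$$ 

$$X_{ij2} = \begin{cases} 1 & \text{if treatment 2} \\ 0 & \text{otherwise} \end{cases}$$ 

$$X_{ij3} = \begin{cases} 1 & \text{if treatment 3} \\ 0 & \text{otherwise} \end{cases}$$ 


and we obtain for the jth replicate of treatment i: 

$$Y_{ij} = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \varepsilon_{ij} \tag{15.12}$$ 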


Notice that all of the predictors are indicator variables. For this reason, as we shall see in 
Chapter 16, the model in (15.12) is sometimes referred to as an analysis of variance model. 


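(Figure 15.6 plots the response against Oven Temperature at levels Low, Medium, High, and 
Very High; the surviving vertical-axis tick labels are 500 and 1,500.) 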


Assuming that the errors are independent $N(0, \sigma^2)$, testing for the presence of treatment 
effects is accomplished using the overall F* test statistic (6.39b) for the presence of a 
regression relation: 


$$\begin{aligned} H_0 &: \beta_1 = \beta_2 = \beta_3 = 0 \\ H_a &: \text{not all } \beta_k \ (k = 1, 2, 3) \text{ equal zero} \end{aligned} \tag{15.13}$$ 


If $H_0$ is rejected, the investigator may want to determine which levels of temperature lead 
to different volumes, which lead to similar volumes, and, perhaps, which temperature 
maximizes bread volume. These and other issues concerning the analysis of completely 
randomized designs are taken up in Chapters 16-18. 
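
As a rough illustration of the indicator coding and the overall F* statistic, consider the sketch below. The bread volumes are invented for the example (the actual data are not reproduced here), and the function names are our own. It relies on the fact that, when the only predictors are treatment indicators, the least squares fitted value for each observation is its treatment mean.

```python
# Hypothetical bread volumes: two package mixes per oven temperature.
volumes = {
    "low": [950.0, 1010.0],
    "medium": [1120.0, 1160.0],
    "high": [1380.0, 1420.0],
    "very high": [1210.0, 1250.0],
}
levels = list(volumes)


def indicators(level, levels):
    """Code one treatment as r - 1 indicators; the last level is the reference."""
    return [1 if level == ref else 0 for ref in levels[:-1]]


def overall_f(groups):
    """Return F* = MSR / MSE and its degrees of freedom for a one-factor layout."""
    all_y = [y for ys in groups.values() for y in ys]
    n, r = len(all_y), len(groups)
    grand = sum(all_y) / n
    ssto = sum((y - grand) ** 2 for y in all_y)
    # Fitted values of the indicator regression are the treatment means.
    sse = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)
    ssr = ssto - sse
    return (ssr / (r - 1)) / (sse / (n - r)), r - 1, n - r


for level in levels:
    print(level, "->", indicators(level, levels))

f_star, df_num, df_den = overall_f(volumes)
print("F* =", round(f_star, 2), "with", df_num, "and", df_den, "degrees of freedom")
```

The statistic would then be compared with the appropriate percentile of the F distribution with 3 and 4 degrees of freedom.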


Factorial Experiments 


Completely randomized designs can be used in single-factor studies or crossed, multifactor 
studies. Recall that in a crossed multifactor study the treatments correspond to the set of all 
possible combinations of the factor levels. Such designs are also referred to as completely 
randomized factorial designs. 

The chemical yield experiment—whose treatment combinations are displayed in the 
two-way table in Figure 15.2a—is an example of a 2 x 3 factorial design. In another 
example, a sheet-aluminum manufacturer was interested in characterizing the effects of 
three coolant factors on the quality of the finish of the aluminum produced. During the 
manufacturing process, a molten aluminum strip is cooled using a mixture of water and oil 
at three different points during production. Factors and associated levels of interest were: 
coolant temperature (low, high), coolant oil percentage (low, high), and coolant volume (low, 
high). The $2^3 = 8$ treatment combinations are displayed in the cube plot in Figure 15.7a. This 
design is sometimes referred to as a 2 x 2 x 2 or a completely randomized $2^3$ factorial design. 

Analysis of completely randomized factorial designs again involves the use of model 
(15.11) for completely randomized designs. However, when the treatments have factorial 
structure, it is often of interest to determine whether or not there are interaction effects among 
the individual factors. A linear statistical model that incorporates the factorial treatment 
structure has the following general form: 


$$Y = \text{Overall Constant} + \text{First-Order Treatment Effects} + \text{Interaction Treatment Effects} + \text{Experimental Error} \tag{15.14}$$ 


FIGURE 15.7 Full Factorial and Fractional Factorial Designs: Aluminum Rolling Mill Example. 

(Two cube plots with axes Temperature (Low, High), Oil Content (Low, High), and Volume (Low, High): 
(a) $2^3$ Full Factorial Design; (b) $2^{3-1}$ Fractional Factorial Design.) 


In the sheet aluminum example, let $X_1$, $X_2$, and $X_3$ be the indicator variables that denote 
the presence ($X_i = 1$) or absence ($X_i = 0$) of each treatment. These are the predictors that 
correspond to “first-order” treatment effects in (15.14). The interactive treatment effects 
will be modeled using cross products, just as we did in regression. Here there are four cross 
product terms to be considered: $X_1 X_2$, $X_1 X_3$, $X_2 X_3$, and $X_1 X_2 X_3$. Models for factorial 
experiments are taken up in detail in Chapters 19 and 24. 
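
The eight treatment combinations and the four cross-product terms can be enumerated mechanically. The following sketch assumes, for illustration only, that each indicator is coded 1 for the high level of its factor and 0 for the low level; the factor order and the helper name `model_row` are our own.

```python
from itertools import combinations, product

factors = ["temperature", "oil percentage", "volume"]
runs = list(product([0, 1], repeat=len(factors)))  # 2^3 = 8 treatment combinations


def model_row(levels):
    """Constant, first-order terms, and all cross products for one combination."""
    row = {"const": 1}
    for size in range(1, len(levels) + 1):
        for idx in combinations(range(len(levels)), size):
            name = "".join(f"X{i + 1}" for i in idx)
            value = 1
            for i in idx:
                value *= levels[i]
            row[name] = value
    return row


terms = list(model_row(runs[0]))
print(terms)
for run in runs:
    print(run, [model_row(run)[t] for t in terms])
```

The list of terms contains the constant, the three first-order terms, and exactly the four cross products named above, for eight columns in all.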


Randomized Complete Block Designs 


As discussed in Section 15.2, in a blocked design, heterogeneous experimental units are 
divided into homogeneous blocks, and then separate randomizations of treatments to ex- 
perimental units are carried out within each block. These designs can increase the precision 
of the inferences concerning treatment effects. An example of a blocked experiment was 
displayed in Figure 15.5 for the vitamin C example. As a second example, we shall again 
consider the quick bread volume experiment. Suppose now, however, that the company owns 
two manufacturing plants—plant A and plant B—and of the eight package mixes available, 
four were produced in plant A and four were produced in plant B. Investigators expressed 
concern that the bread volumes might be affected by different processes and raw materials 
used at the two plants. However, it was felt that the four package mixes produced by each 
plant would be relatively homogeneous. For this reason, the investigators decided to run 
the experiment in two blocks of size four. The layout for this randomized complete-block 
design is shown in Figure 15.8. 

Randomized complete block designs are often summarized graphically by producing a 
simple scatter plot of the results (as in Figure 15.6 for a completely randomized design), 
where the four responses within each block are connected by lines. Data for the blocked 
quick bread volume example are displayed in this fashion in Figure 15.9. Notice that there 
does appear to be a possible block effect: package mixes from plant B lead to consistently 
higher volumes than those from plant A. 

A linear statistical model for the response must reflect both the treatment (oven temper- 
ature) effect and the block (manufacturing plant) effect. The response model is: 


$$Y = \text{Overall Constant} + \text{Treatment Effect} + \text{Block Effect} + \text{Experimental Error} \tag{15.15}$$ 


FIGURE 15.8 Randomized Complete Block Design: Quick Bread Volume Optimization Example. 

(Layout table: rows are the plants (blocks), columns are the four experimental units (package 
mixes) within each block, and each cell shows the oven temperature randomly assigned to that unit.) 


FIGURE 15.9 Summary Plot: Blocked Quick Bread Volume Optimization Example. 

(Responses on a vertical scale running from 800 to 1,500, plotted against oven temperature 
(Low, Medium, High, Very High), with one connected line for Plant A and one for Plant B.) 


In the quick bread volume experiment, the four treatment levels are captured using three 
indicator variables, $X_1$, $X_2$, and $X_3$, as described above for a completely randomized design, 
and the two block levels can be modeled using a single indicator variable $X_4$: 

$$X_{ij4} = \begin{cases} 1 & \text{if package mix is from block 1} \\ 0 & \text{if package mix is from block 2} \end{cases}$$ 

and we obtain: 


$$Y_{ij} = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \beta_4 X_{ij4} + \varepsilon_{ij} \tag{15.16}$$ 


where $\beta_1$, $\beta_2$, and $\beta_3$ are the treatment effects, and $\beta_4$ is the block effect. Assuming that the 
errors are independent $N(0, \sigma^2)$, testing for the presence of treatment effects is accomplished 
using the F* test statistic (2.70) for the alternatives: 


$$\begin{aligned} H_0 &: \beta_1 = \beta_2 = \beta_3 = 0 \\ H_a &: \text{not all } \beta_k = 0 \end{aligned} \tag{15.17}$$ 


The block effect, $\beta_4$, can be tested in similar fashion. The design and analysis of 
randomized complete block designs are discussed in greater detail in Chapter 21. 
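
To make the roles of the treatment and block terms concrete, the sketch below partitions the total sum of squares for a blocked layout with one package mix per plant-by-temperature cell. The volumes are invented for illustration. The code uses the fact that, for such a balanced layout, the least squares fitted value under the additive model is the block mean plus the treatment mean minus the overall mean.

```python
# Hypothetical volumes: one package mix per (plant, oven temperature) cell.
volume = {
    "plant A": {"low": 900.0, "medium": 1050.0, "high": 1300.0, "very high": 1150.0},
    "plant B": {"low": 1000.0, "medium": 1180.0, "high": 1420.0, "very high": 1230.0},
}
blocks = list(volume)
treatments = list(volume[blocks[0]])
nb, nt = len(blocks), len(treatments)

grand = sum(volume[b][t] for b in blocks for t in treatments) / (nb * nt)
block_mean = {b: sum(volume[b].values()) / nt for b in blocks}
trt_mean = {t: sum(volume[b][t] for b in blocks) / nb for t in treatments}

# Sums of squares for treatments, blocks, and error.
sstr = nb * sum((m - grand) ** 2 for m in trt_mean.values())
ssbl = nt * sum((m - grand) ** 2 for m in block_mean.values())
sse = sum((volume[b][t] - block_mean[b] - trt_mean[t] + grand) ** 2
          for b in blocks for t in treatments)
ssto = sum((volume[b][t] - grand) ** 2 for b in blocks for t in treatments)

df_trt, df_err = nt - 1, (nb - 1) * (nt - 1)
f_trt = (sstr / df_trt) / (sse / df_err)

print("SSTO =", ssto, " SSTR + SSBL + SSE =", sstr + ssbl + sse)
print("F* for treatments =", round(f_trt, 2), "on", df_trt, "and", df_err, "df")
```

Removing the block sum of squares from the error term is what sharpens the treatment comparison; with these made-up numbers most of the variation not due to temperature is attributable to the plant.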


Nested Designs 


Experiments involving purely nested factors are called nested designs. We discussed in 
Section 15.1 the use of nested factors in a study of the effects of operators on production 
yield in three manufacturing plants. Recall that three operators were selected in each of 
the three plants, and their production yields were recorded for five batches of product. The 
diagram of this experiment, shown in Figure 15.2b, indicates the nesting of operators within 
production plants: the first three operators are employed only in plant 1, the next three are 
employed only in plant 2, and the last three are employed only in plant 3. 

Multifactor experiments can involve both crossed and nested factors. In the production 
yield example, suppose that management was considering the use of control charts for 
monitoring of the production line. Then a new factor, statistical process control (SPC), is 
to be incorporated having two levels (SPC, No SPC). This factor can easily be crossed 
with manufacturing plant and operator, as shown in Figure 15.10. This is an example of 
a crossed-nested design. 




The design and analysis of experiments involving nested factors are discussed in 
Chapter 26. 


Repeated Measures Designs 


In one type of repeated measures design, the same subject (person, store, plant, etc.) receives 
all of the treatment combinations under study. For example, a repeated measures design was 
used to evaluate the effectiveness of a set of anti-inflammatory drugs, where the same patient 
was treated with each of the alternative drugs. Repeated measures designs are frequently 
used in product rating experiments, where the same consumer evaluates a set of products. 
We now consider one such example in detail. 

Consider a taste-testing experiment to be conducted by a food manufacturer in which 
consumer acceptance of three breakfast cereal formulations is to be assessed. The three 
cereal formulations are identical except for the three levels (low, medium, and high) of 
sweetener to be used in the formulation. Each formulation is to be rated on a 10-point 
hedonic (likability) scale, and 12 consumers are available to rate the products. 

With 12 consumers, a completely randomized experiment could be used, allowing for 
four complete replicates, as shown in Figure 15.11a. However, consumers differ consid- 
erably in their sensory perception of food products (e.g., children prefer higher levels of 
sweetness, adults prefer lower levels of sweetness) and so our experimental units would 
not be particularly homogeneous. One could consider blocking on age, but an even more 
effective approach is to have each consumer rate all three products. With this setup, each con- 
sumer becomes a block, and the experimental units are the separate evaluations conducted 
by each consumer. The layout for this repeated measures design is given in Figure 15.11b. 
This study involves repeated measures, because multiple responses are obtained from the 
same subject. 

Suppose now that management is also interested in determining if the perceived level of 
wholesomeness has any effect on the ratings of the product by the consumers. Two levels of 
the perceived wholesomeness factor are to be employed, and half of the subjects are to be 
assigned to each level. Consumers in the control group are told only that the product they 
are about to test is a new breakfast cereal product. Consumers in the experimental group 
are told that the product is a new health cereal, manufactured from organic whole grains. 
As before, each consumer then tastes and evaluates three versions of the cereal based on 
low, medium, and high levels of sweetener. The layout for this experiment is shown in 
Figure 15.11c. 


FIGURE 15.11 Alternative Designs: Food Product Taste-Testing Example. 

(Three panels, each listing consumers 1 through 12: (a) Completely Randomized Design, with one 
formulation per consumer; (b) Repeated Measures Design, with all three formulations per consumer; 
(c) Split-Plot Repeated Measures Design, with consumers grouped as Wholesome or Not Wholesome 
and all three formulations per consumer.) 


This is an example of a second type of repeated measures design, in which randomizations 
at two distinct levels are being conducted. The three levels of sweetener are randomly applied 
to the three individual tastings by a given consumer; thus, for comparisons involving levels 
of sweetness in the product formulation, the individual tastings are the experimental units. 
Similarly, perceived wholesomeness is applied directly to consumers. Thus, for comparisons 
involving the levels of perceived wholesomeness, consumers are the experimental units. 
When the subject serves as an experimental unit for another treatment, the repeated measures 
design is sometimes referred to as a split-plot design. 

The design and analysis of repeated measures and split-plot designs are taken up in 
Chapter 27. 
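
The two levels of randomization in the taste-testing example can be sketched as follows. The seed, the variable names, and the group labels are illustrative assumptions; the sketch only shows where each randomization takes place.

```python
import random

rng = random.Random(27)  # arbitrary seed, for reproducibility only
consumers = list(range(1, 13))
sweetener_levels = ["low", "medium", "high"]

# First randomization: perceived wholesomeness is assigned to whole consumers.
groups = ["wholesome"] * 6 + ["not wholesome"] * 6
rng.shuffle(groups)

# Second randomization: the order of the three sweetener levels is
# randomized separately within each consumer's three tastings.
plan = {}
for consumer, group in zip(consumers, groups):
    order = sweetener_levels[:]
    rng.shuffle(order)
    plan[consumer] = (group, order)

for consumer in consumers:
    print(consumer, plan[consumer])
```

Consumers are the experimental units for the first shuffle, whereas the individual tastings are the experimental units for the shuffles inside the loop.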


Incomplete Block Designs 


Until now, we have only discussed the use of blocking where each block contains one or 
more replicates of the treatment combinations. Can blocking be used when block sizes are 
smaller than the number of treatments? The answer to this question is “yes,” although such 
designs are slightly more difficult to analyze. 

Consider again the breakfast cereal formulation example, only now we shall assume that 
five alternative product formulations, instead of just three, are to be evaluated by consumers. 
It is well known that a consumer's ability to discriminate among similar products in taste 
testing diminishes rapidly with the number of samples tested. Generally, no more than three 
taste evaluations are permitted. With this restriction, we see that it will not be possible for any 
given consumer to evaluate all five product formulations in a single session. Since only three 
of the five alternatives can be rated, each consumer represents a single, incomplete block. 

An effective experimental arrangement can still be achieved, however, through the use of 
a balanced incomplete block design, or BIBD. In a balanced incomplete block design, every 
treatment appears with every other treatment in the same block the same number of times. 
In this way, comparisons between pairs of treatments can be carried out on a within-block 
basis, thus eliminating block-to-block heterogeneity. 

(Figure 15.12 layout: rows are consumers (blocks) 1 through 10, columns are product formulations 
1 through 5, and an X marks each formulation evaluated by a consumer.) 

A BIBD with five treatments and block size three is shown in Figure 15.12. Note that 
every treatment occurs together with every other treatment exactly three times. For example, 
formulations 1 and 2 appear together in blocks 1, 2, and 3. Formulations 1 and 3 appear 
together in blocks 1, 4, and 5—and so on. Note also that this BIBD requires 10 blocks or 
subjects. In the breakfast cereal formulation example, 12 subjects were available; however, 
no BIBD exists for five treatments in 12 blocks of size three. Thus, in order to use this 
particular BIBD for the breakfast cereal formulation example, only 10 subjects would need 
to be available. 
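
The balance property is easy to verify by enumeration. The sketch below assumes that the ten blocks are the ten distinct three-formulation subsets of the five formulations, listed in lexicographic order; this assumption is consistent with the block memberships quoted above, but the block numbering in Figure 15.12 may differ.

```python
from collections import Counter
from itertools import combinations

treatments = range(1, 6)
blocks = list(combinations(treatments, 3))  # assumed design: all 3-subsets

# How often each pair of formulations shares a block, and how often
# each formulation is evaluated overall.
pair_counts = Counter(pair for block in blocks for pair in combinations(block, 2))
replications = Counter(t for block in blocks for t in block)

for number, block in enumerate(blocks, start=1):
    print("block", number, "->", block)
print("pair counts:", sorted(set(pair_counts.values())))
print("replications per formulation:", sorted(set(replications.values())))

blocks_with_1_and_2 = [i for i, blk in enumerate(blocks, start=1) if 1 in blk and 2 in blk]
blocks_with_1_and_3 = [i for i, blk in enumerate(blocks, start=1) if 1 in blk and 3 in blk]
print(blocks_with_1_and_2, blocks_with_1_and_3)
```

Every pair count equals three and every formulation is rated by six consumers, which is the balance that allows all pairwise comparisons to be made within blocks with equal precision.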

Another form of incomplete block design, with block size equal to one, is called a latin 
square design. We take up the construction and analysis of BIBDs and latin square designs 
in Chapter 28. 


Two-Level Factorial and Fractional Factorial Experiments 
Factorial designs are effective tools for characterizing the joint effects of multiple factors. 
However, the number of treatments, which is a product of the numbers of factor levels for 
each factor, grows rapidly with the number of factors. For example, a crossed three-factor 
experiment, where each factor has three levels, will involve $3^3 = 27$ treatment combinations. 
One way of economizing will be to limit each factor to two levels, which reduces the number 
of treatment combinations to $2^3 = 8$. The sheet aluminum production example 
discussed earlier and displayed in Figure 15.7a is an example of a $2^3$ factorial 
design. Two-level designs are extremely useful in exploratory or screening studies where 
the objective is to identify the most important factors from a larger set of potential factors. 
When the factors are quantitative, screening experiments are usually followed up with a more 
exacting experiment, such as a response surface experiment, discussed on the next page. 
If there are a large number of factors to be screened, it may be impractical to run 
a single complete replicate. For example, a complete replicate of a six-factor, two-level 
experiment requires $2^6 = 64$ treatment combinations. In such cases, a subset of the treatment 
combinations can be chosen so that little or no information is lost concerning important 
main effects and low-order interactions. This chosen subset of treatment combinations is 
referred to as a fractional factorial design. 

Consider again the aluminum production example. An alternative fractional factorial 
design is shown in Figure 15.7b. The half-fraction displayed is based on four (carefully 
chosen) treatment combinations from the full factorial in Figure 15.7a that will permit 
estimation of the three factor effects, but with no information about the interactive effects 
of the factors. 


Two-level factorial and fractional factorial designs are discussed in Chapter 29. 
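
One common way to choose such a half-fraction can be sketched as follows. We assume here, purely for illustration, that the low and high levels of each factor are coded -1 and +1 and that the retained runs are those whose three coded levels multiply to +1; the particular half shown in Figure 15.7b may be this one or its complement.

```python
from itertools import product

full = list(product([-1, 1], repeat=3))   # the 2^3 full factorial
half = [run for run in full if run[0] * run[1] * run[2] == 1]

for x1, x2, x3 in half:
    print(f"X1={x1:+d}  X2={x2:+d}  X3={x3:+d}  X1*X2={x1 * x2:+d}")

# Main-effect columns are mutually orthogonal within the half-fraction ...
orthogonal = all(
    sum(run[a] * run[b] for run in half) == 0
    for a, b in [(0, 1), (0, 2), (1, 2)]
)
# ... but the X1*X2 cross product is identical to the X3 column.
aliased = all(run[0] * run[1] == run[2] for run in half)
print("main effects orthogonal:", orthogonal, " X1X2 aliased with X3:", aliased)
```

Because each two-factor cross product coincides with the remaining main-effect column, the four runs support estimation of the three factor effects only if the interactions can be assumed negligible, which is the sense in which no information about interactions is available.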


Response Surface Experiments 


When all factors are quantitative, two-level experiments often provide good information 
on linear trends in each factor. If there is concern that the response will be substantially 
convex (bowl-shaped) or concave (mound-shaped), or if the objective of the experiment 
is to determine precisely the factor levels that lead to an optimum response, use of just 
two levels will not be adequate. Response surface designs were developed for use in these 
situations. These designs are applicable when all experimental factors are quantitative, and 
the true response function can be well approximated by a second-order polynomial. Once 
the second-order response model has been estimated, a detailed mapping of the regression 
surface can be obtained using three-dimensional response surface plots, contour plots, and 
conditional effects plots, such as those shown in Figures 8.8 and 8.9 on pages 310-311. 


Methods for design and analysis of response surface experiments are taken up in 
Chapter 30. 
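
Once a second-order model has been fitted, locating a candidate optimum is a matter of setting the partial derivatives to zero. The sketch below does this for two quantitative factors; the coefficients are hypothetical and the function name `stationary_point` is our own.

```python
def stationary_point(b1, b2, b11, b22, b12):
    """Stationary point of b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2.

    Setting both partial derivatives to zero gives the linear system
        2*b11*x1 +   b12*x2 = -b1
          b12*x1 + 2*b22*x2 = -b2
    which is solved here by Cramer's rule.
    """
    det = 4.0 * b11 * b22 - b12 ** 2      # determinant of the Hessian
    if det == 0:
        raise ValueError("no unique stationary point")
    x1 = (-2.0 * b1 * b22 + b12 * b2) / det
    x2 = (-2.0 * b11 * b2 + b12 * b1) / det
    if det > 0 and b11 < 0:
        kind = "maximum"        # concave (mound-shaped) surface
    elif det > 0 and b11 > 0:
        kind = "minimum"        # convex (bowl-shaped) surface
    else:
        kind = "saddle point"
    return x1, x2, kind


# Hypothetical fitted coefficients.
b1, b2, b11, b22, b12 = 4.0, 6.0, -2.0, -3.0, 1.0
x1, x2, kind = stationary_point(b1, b2, b11, b22, b12)
print("stationary point:", round(x1, 4), round(x2, 4), "type:", kind)
```

A design with only two levels per factor cannot estimate the squared-term coefficients, which is why more than two levels are needed before a calculation of this kind is possible.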


15.4 Design of Observational Studies 
Observational studies are distinct from experimental studies in that random assignments 
of factor levels to the experimental units do not occur. Therefore, designed observational 
studies do not directly demonstrate cause-and-effect relationships between the explanatory 
factors and the response. They can establish association between explanatory factors and a 
response, and provide the basis for further study of potential cause-and-effect relationships. 
To infer causality, potential confounding variables would need to be identified, and subgroup 
analysis performed to try to rule out possible alternative causal factors. Some observational 
studies are conducted for descriptive purposes only, such as when various characteristics 
of a group are summarized. These studies, which are sometimes referred to as analytical 
surveys or case studies, will not be considered further. 
Observational studies have been classified in many ways, but we will consider three 
commonly used categories, namely, cross-sectional studies, prospective studies, and retro- 
spective studies. Prospective and retrospective observational studies are often designed to 
study potential causal relationships, and are closer in spirit to experimental studies. We turn 
now to a discussion of cross-sectional observational studies. 


Cross-Sectional Studies 


A cross-sectional observational study involves measurements taken from one or more 
populations or subpopulations at a single point in time or a single time interval. Exposure to 
a potential causal factor and the response are determined simultaneously. Cross-sectional 
studies are sometimes said to provide a "snapshot" of the factors and outcome variable. For 
example, a cross-sectional study of household incomes by geographic location in a major 
metropolitan area was conducted by a marketing research department of a luxury SUV 
manufacturer. The subpopulations consisted of the postal zip-code areas within the city. 
The response variable was household income, and the explanatory factor was geographical 
area. Random samples of households were selected within each geographic zip-code 
area. The objective of the study was to carry out comparisons of household income among 
subpopulations. 


The Minnesota Department of Transportation road use study, discussed in Chapter 11, 
page 464, is another example of a cross-sectional observational study. Here, data on the 
average annual daily traffic for a variety of road sections were obtained for a single time 
interval along with various characteristics of the road sections. Multiple regression 
techniques were then used to identify important predictors of the outcome variable, namely, the 
average annual daily traffic for the various road sections. 

Cross-sectional studies may be prestratified or poststratified to form subpopulations. In 
a prestratified cross-sectional study, potential explanatory factors are used to stratify the 
population into subpopulations, and random samples are obtained within each of the sub- 
populations. Alternatively, cross-sectional study data can be poststratified by the explanatory 
factors. Comparisons of outcome measurements among the poststratified subpopulations 
are then obtained. 
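
The difference between the two approaches lies in when the explanatory factor is used. The sketch below contrasts them on a simulated population of households; the zip-code labels, incomes, sample sizes, and function names are all invented for illustration.

```python
import random
from collections import defaultdict


def prestratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratify first, then draw a random sample within each subpopulation."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    return {s: rng.sample(units, n_per_stratum) for s, units in strata.items()}


def poststratify(sample, stratum_of):
    """Group an already-drawn sample by the explanatory factor."""
    strata = defaultdict(list)
    for unit in sample:
        strata[stratum_of(unit)].append(unit)
    return dict(strata)


rng = random.Random(15)
# Hypothetical households: 200 per zip-code area, incomes in $1,000s.
population = [
    {"zip": z, "income": rng.gauss(mu, 10.0)}
    for z, mu in (("zip A", 60.0), ("zip B", 90.0), ("zip C", 75.0))
    for _ in range(200)
]

pre = prestratified_sample(population, lambda u: u["zip"], 25, rng)
simple = rng.sample(population, 75)
post = poststratify(simple, lambda u: u["zip"])

for z in sorted(pre):
    print(z, "prestratified n =", len(pre[z]), " poststratified n =", len(post.get(z, [])))
```

Prestratification fixes the sample size in every subpopulation in advance, whereas with poststratification the subpopulation sample sizes are themselves random.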




Prospective Studies 

In a prospective observational study, one or more groups are formed in a nonrandom manner 
according to the levels of a hypothesized causal factor, and then these groups are observed 
over time with respect to an outcome variable of interest. Prospective studies answer the 
question: “What is going to happen?” The teaching effectiveness example, discussed in 
Section 15.1, is an example of a prospective observational study. Faculty either attended 
or did not attend a teaching workshop on a voluntary basis. Here the groups were self- 
selected. At the end of the following academic year, teaching effectiveness scores were 
obtained for all faculty, and it was found that the average effectiveness of faculty who 
attended the seminar was greater than that for the group of faculty who elected not to attend 
the seminar. The fact that the "treatment" preceded the response in time is suggestive of a 
potential cause-and-effect relationship, but, as noted earlier, an experiment is required for 
"proof." Prospective studies are also known as cohort studies and can often be analyzed 
using regression models or analysis of variance techniques. 

Prospective observational studies may be conducted utilizing historical records. For 
example, from the medical histories obtained from a health maintenance organization, 
researchers were able to identify women who received estrogen supplements over long 
periods of time, and women who did not. A prospective study was then carried out to 
explore potential links between estrogen therapy and heart disease. 


Retrospective Studies 
In a retrospective observational study, groups are defined on the basis of an observed 
outcome, and the differences among the groups at an earlier point in time are identified as 
potential causal effects. Retrospective studies answer the question: "What has happened?" 
A famous retrospective study carried out in the 1950s compared the lifestyles of individuals 
with lung cancer to those of individuals who did not have lung cancer. These studies led to 
hypotheses about the causal effects of cigarette smoking. Notice that in comparison with 
a prospective study, the roles of the response and explanatory variables are reversed. In a 
prospective study, the response is the effect (e.g., increased teaching effectiveness) and the 
explanatory factor is the hypothesized cause (e.g., workshop attendance). In a retrospective 
study, the response variable is the hypothesized cause (e.g., smoking), and the predictor or 
explanatory factor is the potential effect (e.g., presence or absence of lung cancer). 

Retrospective studies are sometimes used in manufacturing process monitoring. For 
example, a manufacturer may suddenly receive reports of a cluster of failures of a particular 
product part while in use in the field. From records, it may be possible to obtain 
characteristics of the manufacturing process at the times that the failed parts were produced, and to 
compare these characteristics to those corresponding to other parts that have not failed. This may 
suggest manufacturing operating conditions that led to the production of the defective parts. 

The surgical unit example discussed in Chapter 9 on page 350 is a retrospective 
observational study. Patients who had a particular type of liver operation and died were selected 
for study. Preoperative factors were then used to try to predict survival times following the 
operation using multiple regression techniques. 

Retrospective studies have an advantage over comparable prospective studies in terms 
of efficiency when an outcome of interest occurs infrequently. Epidemiologists frequently 
use retrospective designs to study rare-event diseases. For example, a prospective study of 
the effects of a diet on the incidence of stomach cancer may well require a lengthy period 
of time and many more subjects than would be required by a retrospective study. The 
retrospective study would identify persons who have stomach cancer (referred to as cases) 
and persons who do not have stomach cancer (referred to as controls) and look back in 
time to assess differences in eating habits. Retrospective studies that require subjects or 
investigators to construct case histories from memory are susceptible to recall bias, and 
should be used with caution. The process-monitoring study just discussed is an example of 
an archival retrospective study, where the necessary historical data exist. Archival studies 
do not suffer the same susceptibility to recall bias. 

Retrospective studies are also known as case-control and ex post facto studies. 


Matching 


In our discussion of designed experiments, we noted that if the experimental units were 
heterogeneous, the experimental error can be reduced and the precision of the comparisons 
among treatments can be improved through the use of blocking techniques. In an observa- 
tional study, treatments are not assigned at random to experimental units, so blocking is not 
technically possible. However, matching, a procedure that is analogous to blocking, can be 
employed to achieve similar reductions in variance. 

Returning to the observational study of teaching effectiveness, recall that the treatments 
(attend workshop, do not attend workshop) were not randomly assigned to the faculty 
members. Rather, about half of the faculty volunteered to attend the workshop. As teachers, 
faculty in business schools are relatively heterogeneous. They vary in terms of such factors 
as age, gender, field or department, quantitative orientation, prior teaching effectiveness, 
and so on. In a matched study, each faculty member who attended the workshop is matched, 
on the basis of nuisance factors such as those just noted, to another faculty member who 
did not attend the workshop. Faculty who are not matched are not included in the study. In 


Chapter 15 Introduction to the Design of Experimental and Observational Studies 669 


effect, each match leads to a "block" size of two. Any observed differences in teaching
effectiveness between the matched faculty members are due either to the treatment factor
(here, workshop attendance) or to other unidentified or uncontrolled nuisance factors.

There are a number of approaches used for identifying matches. If the nuisance factor
is categorical, taking on just a few distinct values (e.g., male, female), a match occurs if
two cases fall into the same category or class. This is called within-class matching. If more
than one categorical nuisance factor is present, for example, grade and gender, a match
occurs if two cases fall into the same category for both of the confounding factors. When
the confounding factor is discrete or continuous, for example, pretest score on a 0-100
basis, it is common to change the factor into a categorical factor (for example, by creating
three pretest categories) and then again to declare a match if two cases fall into the same
category.

A more precise method of matching discrete or continuous confounding factors is called
caliper matching or interval matching. In caliper matching, two values of a confounding 
factor are considered to have matched if their absolute difference is less than some pre- 
specified value. For example, two faculty may be considered a match on the age dimension 
if the absolute difference in their ages is not greater than five years. A disadvantage of 
caliper matching is that if the specified maximum difference is too small, it may be difficult 
to find a sufficient number of matches to perform the study. 
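
To make the caliper rule concrete, the following Python sketch pairs faculty members on a single continuous factor, age. It is an illustration only: the ages, the labels, the function name `caliper_match`, and the greedy rule of taking the closest unused control are assumptions made here, not a procedure prescribed in the text.

```python
def caliper_match(treated, controls, caliper):
    """Greedy one-to-one caliper matching on a single numeric factor.

    treated, controls: dicts mapping a unit label to the factor value (e.g., age).
    Returns (treated_label, control_label) pairs whose absolute difference
    does not exceed `caliper`. Units without a match are left out.
    """
    available = dict(controls)  # controls that have not been used yet
    pairs = []
    # Process treated units in order of their factor value (an arbitrary choice).
    for t_label, t_value in sorted(treated.items(), key=lambda kv: kv[1]):
        candidates = [
            (abs(c_value - t_value), c_label)
            for c_label, c_value in available.items()
            if abs(c_value - t_value) <= caliper
        ]
        if candidates:
            _, best = min(candidates)  # closest control within the caliper
            pairs.append((t_label, best))
            del available[best]
    return pairs


# Hypothetical ages of faculty who attended / did not attend the workshop.
attended = {"A": 34, "B": 47, "C": 58, "D": 29}
not_attended = {"P": 36, "Q": 45, "R": 70, "S": 51}

pairs = caliper_match(attended, not_attended, caliper=5)
tight_pairs = caliper_match(attended, not_attended, caliper=1)
print(pairs)
print(tight_pairs)
```

With a five-year caliper only two of the four attendees find a match, and with a one-year caliper none do, which illustrates the disadvantage noted above.
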
Other methods of matching continuous confounding factors include mean matching or 
balancing, and nearest available matching. Reference 15.2 gives a complete discussion of 
matching methods. 


Comment 

An alternative to matching at the design stage is the use of covariance analysis. A brief introduction 
to this approach was given in a comment in Section 15.2. The same adjustment techniques can be 
used in the analysis of observational studies for known confounding factors that are not held constant. 
Again, these techniques are discussed in Chapter 22.


15.5 Case Study: Paired-Comparison Experiment 


In this section we consider the design and analysis of the paired-comparison or matched- 
pairs design. This is the most basic form of a randomized complete block design, involving 
just two treatments arranged in blocks of size two. Because the example uses subjects as 
blocks, the experimental layout also represents the simplest instance of a repeated measures 
design. The example will also serve to illustrate the analysis techniques used in a matched 
observational study. 

The objective of a product-improvement project at a major pharmaceutical company was 
to reduce the sensitivity of skin to the injection of an allergen. A new experimental allergen 
was developed and dermatologists were interested in comparing the new formulation to the 
existing product. Reactions to allergen injections vary greatly from person to person, and it 
was decided that all comparisons of the new treatment and standard control treatment should 
be conducted on a within-subject basis. Thus a randomized complete block experiment was 
utilized, where blocks correspond to subjects, and each subject was injected with both the 
experimental and control allergens, once in each arm. Here, the experimental units are 


670 PartFour Design and Analysis of Single-Factor Studies 


the subjects' arms, and each block consists of two experimental units. Randomization was
accomplished by randomly assigning the treatments to the right or left arms for each subject.

Twenty subjects were randomly chosen from a pool of available subjects for testing. The
experimental layout, randomization, and results of the 40 tests are shown in Table 15.1. The
response, skin sensitivity, is obtained by measuring the diameter of the red area surrounding
the injection in centimeters. The results are plotted, with plot symbols from the same block
connected, in Figure 15.13. The preponderance of negative slopes in the plot suggests that
the experimental formulation leads to reduced skin sensitivity.

From (15.15) a linear statistical model for the experiment is:

$$Y_{ij} = \beta_0 + \beta_1 X_{ij1} + \sum_{k=2}^{20} \beta_k X_{ijk} + \varepsilon_{ij} \qquad i = 1, 2 \tag{15.18}$$

where:

$$X_{ij1} = \begin{cases} 1 & \text{if experimental treatment} \\ 0 & \text{if control treatment} \end{cases}$$

$$X_{ijk} = \begin{cases} 1 & \text{if response is from subject } k - 1, \text{ for } k = 2, \ldots, 20 \\ 0 & \text{otherwise} \end{cases}$$

TABLE 15.1 Data and Descriptive Statistics: Skin Sensitivity Experiment.

                   Control      Experimental   Within-Subject
Subject            Treatment    Treatment      Difference
1                  0.59         0.43           -0.16
2                  0.69         0.53           -0.16
3                  0.82         0.58           -0.24
...                ...          ...            ...
18                 0.85         0.60           -0.25
19                 0.85         0.65           -0.20
20                 0.74         0.58           -0.16
Sample Mean:       .7315        .5400          -.1915
Sample Std Dev:    .0758        .0807           .0501

FIGURE 15.13 Summary Plot: Allergen Sensitivity Example. [Plot not reproduced: response (vertical axis, ticks from 0.3 to 0.9) against Treatment (Standard Allergen, Experimental Allergen), with the two points for each subject connected.]
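
The randomization described above, in which the two treatments are assigned at random to the two arms of each subject, can be sketched in Python as follows. The function name `randomize_arms`, the dictionary layout, and the seed value are hypothetical choices made for illustration; the text does not say how the investigators actually generated their assignments.

```python
import random


def randomize_arms(n_subjects, seed=None):
    """Randomly assign the experimental and control allergens to the two arms
    of each subject (each subject is a block of size two)."""
    rng = random.Random(seed)  # a private generator so the layout is reproducible
    layout = []
    for subject in range(1, n_subjects + 1):
        arms = ["left", "right"]
        rng.shuffle(arms)  # random order of the two arms within this block
        layout.append(
            {"subject": subject, "experimental": arms[0], "control": arms[1]}
        )
    return layout


layout = randomize_arms(20, seed=2005)  # seed chosen arbitrarily
for row in layout[:3]:
    print(row)
```

Because the shuffle is carried out separately within each subject, every subject receives both treatments, which is what makes the design a randomized complete block design with blocks of size two.
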




The dermatologists were primarily interested in determining whether the experimental
allergen formulation led to reduced skin sensitivity, but they allowed for the possibility that
it might increase skin sensitivity. They thus tested the alternatives:

$$\begin{aligned} H_0 &\colon \beta_1 = 0 \\ H_a &\colon \beta_1 \neq 0 \end{aligned} \tag{15.19}$$

MINITAB regression results for this model are shown in Figure 15.14. We see that
the estimated treatment effect is $b_1 = -.1915$, and the 19 estimated block effects are
$b_2 = -.1500$, $b_3 = -.0500$, and so on. The test statistic corresponding to the estimated
treatment effect is $t^* = -17.10$. To carry out the test indicated in (15.19) at the $\alpha = .05$
level, we require $t(.975; 19) = 2.093$. Since $|t^*| = 17.10 > 2.093$, we conclude $H_a$, that
$\beta_1 \neq 0$. Since $b_1$ was negative, the dermatologists concluded that the new formulation
significantly reduces skin irritation.

FIGURE 15.14 Regression Results: Allergen Skin Sensitivity Example.

Predictor   Coef        SE Coef    T         P
Constant     0.75575    0.02566     29.45    0.000
X1          -0.19150    0.01120    -17.10    0.000
X2          -0.15000    0.03541     -4.24    0.000
X3          -0.05000    0.03541     -1.41    0.174
X4           0.04000    0.03541      1.13    0.273
X5           0.06500    0.03541      1.84    0.082
X6          -0.14000    0.03541     -3.95    0.001
X7          -0.08500    0.03541     -2.40    0.027
X8           0.03000    0.03541      0.85    0.407
X9           0.04000    0.03541      1.13    0.273
X10         -0.08000    0.03541     -2.26    0.036
X11          0.08000    0.03541      2.26    0.036
X12          0.01000    0.03541      0.28    0.781
X13         -0.02500    0.03541     -0.71    0.489
X14         -0.12000    0.03541     -3.39    0.003
X15         -0.05000    0.03541     -1.41    0.174
X16         -0.08500    0.03541     -2.40    0.027
X17         -0.07500    0.03541     -2.12    0.048
X18         -0.04500    0.03541     -1.27    0.219
X19          0.06500    0.03541      1.84    0.082
X20          0.09000    0.03541      2.54    0.020

S = 0.03541    R-Sq = 96.0%    R-Sq(adj) = 91.9%

Analysis of Variance

Source           DF   SS         MS         F       P
Regression       20   0.578750   0.028937   23.07   0.000
Residual Error   19   0.023828   0.001254
Total            39   0.602577

Note that the investigators were not primarily interested in determining whether or not
subject (block) effects were present. Blocking was used here to increase the precision of
the comparisons between the experimental and control treatments and it was fully expected
that significant subject-to-subject differences would be present. Nevertheless, a test for the
effect of blocking can be carried out using (2.70). The alternatives here are:

$$\begin{aligned} H_0 &\colon \beta_2 = \cdots = \beta_{20} = 0 \\ H_a &\colon \text{not all } \beta_k \ (k = 2, 3, \ldots, 20) \text{ equal zero} \end{aligned} \tag{15.20}$$

For these data, it can be shown that blocking was effective in significantly reducing the
error variance.
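
The two tests in this case study can be checked approximately from the published summaries alone. The Python sketch below takes the descriptive statistics in Table 15.1 and the sums of squares in Figure 15.14 as given. It assumes that the treatment sum of squares can be recovered from the two rounded treatment means, which relies on the balanced layout. The variable names are illustrative, and because the inputs are rounded, the results are approximate.

```python
import math

n = 20            # number of subjects (blocks)
d_bar = -0.1915   # mean within-subject difference, Table 15.1
s_d = 0.0501      # standard deviation of the differences, Table 15.1

# Paired-comparison t statistic computed from the differences.
se_d = s_d / math.sqrt(n)
t_star = d_bar / se_d          # close to the regression value t* = -17.10

# The same standard error recovered from the regression error mean square.
mse = 0.001254                 # Residual Error MS, Figure 15.14
se_b1 = math.sqrt(mse * (1 / n + 1 / n))

# General linear test for block effects, full versus reduced model.
ssto = 0.602577                # Total SS, Figure 15.14
sse_full = 0.023828            # Residual Error SS with blocks in the model
ybar_control, ybar_exper = 0.7315, 0.5400
grand_mean = (ybar_control + ybar_exper) / 2
ss_treatment = n * ((ybar_control - grand_mean) ** 2
                    + (ybar_exper - grand_mean) ** 2)
sse_reduced = ssto - ss_treatment   # error SS when block terms are dropped
df_blocks = n - 1                   # 19 block coefficients under test
df_error_full = 2 * n - 21          # 40 observations minus 21 parameters
f_star = ((sse_reduced - sse_full) / df_blocks) / (sse_full / df_error_full)

print(round(t_star, 2), round(se_b1, 5), round(f_star, 2))
```

The statistic `t_star` is compared with $t(.975; 19) = 2.093$ as in the text, and `f_star` would be compared with the appropriate percentile of the F distribution with 19 and 19 degrees of freedom.
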


15.6 Concluding Remarks


In this chapter, we have outlined the basic differences between observational and experimental
studies, and we have described how experimental studies lead to a much firmer
basis for making inferences concerning cause and effect. We have also previewed the main
types of designed observational and experimental studies. In doing so, we have shown that 
the statistical models studied in Chapters 1—14 provide the bases for statistical analysis of 
well-designed studies. 

In the chapters to follow, we will consider the design and analysis of experimental and 
observational studies in greater detail. Design issues not yet discussed, such as sample 
size planning and power considerations, will be taken up for each design type. There will 
also be an increased emphasis on the analysis of categorical factors. The linear model for 
that case is called the analysis of variance (ANOVA) model. While standard regression 
approaches can always be used, we will see that when the study design is balanced, the 
use of ANOVA greatly simplifies the analysis. If the study is not balanced, we will simply 
return to the regression approach. Finally, when all factors are treated as categorical, the 
analysis frequently focuses on comparisons among treatments or factor-level combinations.
A discussion of such multiple comparison procedures will accompany nearly every class 
of study design. 


Cited References


15.1. Cochran, W. G., and G. M. Cox. Experimental Designs. 2nd ed. New York: John Wiley & Sons,
1992. 

15.2. Cochran, W. G. Planning and Analysis of Observational Studies. New York: John Wiley & 
Sons, 1983. 


Problems 


15.1. In an experiment to study the effect of the location of a product display in drugstores of a
chain, the manager of one of the drugstores rearranged the displays of other products so as to
increase the traffic flow at the experimental display. Does this action potentially lead to either
selection bias or measurement bias? Discuss.

15.2. In a study of the effect of size of team on the volume of communications within the team, can
a double-blind procedure be utilized? A single-blind procedure? Discuss.

15.3. Four treatments ($T_1$, $T_2$, $T_3$, $T_4$) are to be studied in an experiment with a completely randomized
design using three replicates. Obtain the randomized assignments of treatments to experimental
units.

15.4. Three treatments ($T_1$, $T_2$, $T_3$) are to be studied in an experiment with a completely randomized
design using five replicates. Obtain the randomized assignments of treatments to experimental
units.




15.5. Give an example of an experiment where a control group would not be necessary.

15.6. Five treatments ($T_1$, $T_2$, $T_3$, $T_4$, $T_5$) are to be studied in a randomized complete block design
with four blocks. Obtain the randomized assignments of treatments to experimental units.

15.7. In a study to evaluate the quality of three alternative recipes for salsa, six containers of salsa
(two from each of the three recipes) were randomly assigned to six taste panels. Each taste
panel consisted of a team of four trained taste-testers. Each panel reached a consensus score
for the assigned recipe. What is the experimental unit in this study? Why?

15.8. Three high schools participated in a study to evaluate the effectiveness of a new computer-based
mathematics curriculum. In each school, four 24-student sections of freshman algebra
were available for the study. The two types of instruction (standard curriculum, computer-based
curriculum) were randomly assigned to the four sections in each of the three schools.
At the end of the term, a standard mathematics achievement test was given to each of the 24
students in each section.

a. Is this study experimental, observational, or mixed experimental and observational? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

*15.9. An economist compiled data on productivity improvements last year for a sample of firms
producing electronic computing equipment. The firms were classified according to the level of
their average expenditures for research and development in the past three years (low, moderate,
high).

a. Is this study experimental, observational, or mixed experimental and observational? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

15.10. In a study to investigate the effect of color of paper (blue, green, orange) on response rates
for questionnaires distributed by the "windshield method" in supermarket parking lots, four
supermarket parking lots were chosen in a metropolitan area and 10 questionnaires of each
color were assigned at random to cars in the parking lots.

a. Is this study experimental, observational, or mixed? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

15.11. A rehabilitation center researcher was interested in examining the relationship between physical
fitness prior to surgery of persons undergoing corrective knee surgery and the time required
in physical therapy until successful rehabilitation. Data on the number of days required for
successful completion of physical therapy and the prior physical fitness status (below average,
average, above average) were collected.

a. Is this study experimental, observational, or mixed? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

15.12. In a study of the effect of applicant's eye contact (yes, no) and personnel officer's gender
(male, female) on the personnel officer's assessment of likely job success of an applicant,
personnel officers were shown a front view photograph of an applicant's face and were asked

to give the person in the photograph a success rating score. Half of the officers in each gender
group were chosen at random to receive a version of the photograph in which the applicant
made eye contact with the counselors. The other half received a version in which there was
no eye contact. Data were collected on success ratings.

a. Is this study experimental, observational, or mixed? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

15.13. An automotive engineer was interested in the effect of four alternative rubber compounds on
the life of automobile tires. To carry out the study, five tires were manufactured from each of
the four compounds and five automobiles were obtained for testing. With each automobile
the four tire types were assigned at random to the four wheels. Each automobile was driven
for 40,000 miles and the amount of wear on each of the four tires was recorded.

a. What type of study is this, experimental, observational, or mixed? Why?
b. What is the basic unit of study?
c. What factors and factor levels are being studied here?
d. What type of study design is being implemented here?
e. Suppose that six compounds were under study instead of four. What type of study design
is suggested?

*15.14. A research laboratory was developing a new compound for the relief of severe cases of hay
fever. The amounts of two active ingredients (low, medium, high) in the compound were varied
at three levels each using 18 volunteers. Randomization was used in assigning volunteers to
each of the treatment combinations. Data were collected on hours of relief.

a. Is this study experimental, observational, or mixed? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. Describe how randomization would be performed in this study.
d. What type of study design is being implemented here?
e. What is the basic unit of study?

15.15. Kidney failure patients are commonly treated on dialysis machines that filter toxic substances
from the blood. The approximate dose for effective treatment depends on, among other things,
duration of treatment and weight gains between treatments as a result of fluid buildup. To study
the effects on the number of days hospitalized (attributable to the disease) during a year, a
random sample of patients who had undergone dialysis treatment at a large dialysis facility was
obtained. Treatment duration was categorized into two groups (short duration, long duration).
Average weight gain between treatments during the year was categorized in three groups
(slight, moderate, substantial).

a. Is this study purely experimental or observational or mixture of both? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

15.16. In a study of recall memory, three different questionnaires (A, B, C) were administered to nine
subjects at three different times three months apart about the number of trips to a shopping
center during the preceding three months. Each time a different questionnaire was used and
the order of the assignments of questionnaires for each subject was randomized.


a. Is this study purely experimental or observational or mixture of both? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

15.17. A chemical company wished to study the consistency of the strength of one of its liquid
chemical products. The product is made in batches in large vats and then is barreled. The barrels
are subsequently stored for a period of time in a warehouse. To examine the consistency of the
strength of the chemical, an analyst randomly selected five different batches of the product
from the warehouse and then selected four barrels per batch at random. Three determinations
per barrel were made.

a. Is this study purely experimental or observational or mixture of both? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

15.18. A study was undertaken in an effort to reduce the occurrence of dents in a windshield molding
manufacturing process. The dents are caused by pieces of metal or plastic that are carried into
the dies during stamping and forming operations. Four factors were identified for use in an
eight-run experiment: poly-film thickness, used to protect the metal strip during manufacturing
to reduce surface blemishes (low, high), oil mixture ratio for surface lubrication (low,
high), operator glove type (cotton, nylon), underside oil coating (no coating, coating). During
each run of the experiment, 1,000 moldings were fabricated in a batch; the response (Y) is the
number of defect-free moldings produced.

a. Is this study purely experimental or observational or mixture of both? Why?
b. Identify all factors, factor levels, and factor-level combinations.
c. What type of study design is being implemented here?
d. What is the basic unit of study?

15.19. Assemblers in an electronics firm attach components to a newly developed "board" to be used
in automatic-control equipment in manufacturing plants. A study was conducted to determine
the effect of sequence of assembling the components (sequence 1, sequence 2, sequence 3)
on the mean time to assemble a board. Potential nuisance factors are gender of the assembler
(male, female) and amount of the assembler's prior experience (under 18 months, 18 months or
more). Assume that the following assemblers are available for the study: four males with under
18 months experience, three females with under 18 months experience, five male assemblers
with 18 months or more experience, and four females with 18 months or more experience.

a. Suggest an experimental design that accounts for the two nuisance factors. What type of
study design did you recommend?
b. Show how the randomization is to be carried out for your study design in part (a).
c. What is the experimental unit in your study design?

*15.20. An experiment involving the case hardening of lightweight shafts machined from bars of an
alloy was run to study the effects of the amount of chemical agent added to the alloy in a
molten state (low, high), the temperature of the hardening process (low, high), and the time
duration of the hardening process (low, high). Outcome data measured the hardness of the
rods tested. It will be possible to machine 16 bars in the study.

a. Suggest an experimental plan for the study. What type of study design did you recommend?
b. Show how the randomization is to be carried out for your study design in part (a).
c. What is the experimental unit in your study design?


15.21. An experiment is to be conducted to compare the effectiveness of four household detergents.
The response is to be the degree of stain removal from a section of clothing on a 10-point
scale (1 = no stain removed, 10 = stain completely removed).

a. Identify the experimental unit.
b. Identify the experimental factor(s), levels, and any factor-level combinations if present.
c. Name two potential blocking factors.
d. Propose an experiment to accomplish the objectives of the study. How would you carry
out the randomization?

15.22. An experiment is to be carried out to determine the optimal combination of microwave oven
settings for microwave popcorn. Cooking time has three possible settings (3, 4, and 5 minutes)
and cooking power has two settings (low power, high power). The response (to be minimized)
is the number of burned plus the number of unpopped kernels.

a. Identify the experimental unit.
b. Identify the experimental factor(s), levels, and any factor-level combinations if present.
c. Name two potential blocking factors.
d. Propose an experiment to accomplish the objectives of the study. How would you carry
out the randomization?

*15.23. Refer to the skin sensitivity example data in Table 15.1.

a. Test the hypothesis that the mean within-subject difference is zero using the $t$ test for paired
observations in (A.69) using $\alpha = .05$. State the alternatives, decision rule, and conclusion.
What is the P-value of your test? Do your results agree with those obtained on page 671?
Should they agree?
b. Conduct the test for block effects using $\alpha = .05$. State the alternatives, decision rule, and
conclusion. What is the P-value of your test? Is your conclusion of primary interest in this
study? Why or why not?


Exercise


15.24. Show that (15.5b) follows from (15.5a) for model (15.4).


Single-Factor Studies


In the last chapter, we presented a general introduction to the design of experimental and 
observational studies. In this and the next two chapters, we shall focus on the design and 
analysis of single-factor studies. This includes the development of the single-factor analysis of
variance (ANOVA) model, the analysis and interpretation of factor level means, assessment
of model adequacy, and the use of remedial measures when necessary. 

In this chapter, we briefly review the design of single-factor studies and the associated 
linear models, then discuss the relation between regression and analysis of variance. In the 
next few sections we introduce in detail the single-factor ANOVA model and the associated 
F test for equality of factor level means. We then consider alternative formulations of the 
ANOVA model, followed by a regression approach to the single-factor ANOVA model. In 
the last few sections, we consider a nonparametric randomization test as an alternative to 
the ANOVA test, and, finally, we present two methods for the planning of sample sizes in 
single-factor studies. 


16.1 Single-Factor Experimental and Observational Studies 


Single-factor experimental and observational studies are the most basic form of comparative
studies used in practice. In a single-factor experimental study, the treatments correspond to
the levels of the factor, and randomization is used to assign the treatments to the experimental
units. In the following we present three examples of single-factor studies. The first two
examples are experimental studies, and the third is a cross-sectional observational study.
We then briefly review the approach described in Chapter 15 for modeling a single-factor
study.


Example 1


A hospital research staff wished to determine the best dosage level for a standard type of drug
therapy to treat a medical condition. In order to compare the effectiveness of three dosage
levels, 30 patients with the medical problem were recruited to participate in a pilot study.
Each patient was randomly assigned to one of the three drug dosage levels. Randomization
was performed in such a way that an equal number of patients ended up being evaluated
for each drug dosage level, i.e., with exactly 10 patients studied in each drug dosage level
group. This is an example of a completely randomized design, based on a single, three-level
quantitative factor. This particular design is said to be balanced, because each treatment is
replicated the same number of times.


Example 2


In an experiment to investigate absorptive properties of four different formulations of a
paper towel, five sheets of paper towel were randomly selected from each of the four types
(formulation 1, formulation 2, formulation 3, and formulation 4) of paper towel. Twenty
6-ounce beakers of water were prepared, and the twenty paper towel sheets were randomly
assigned to the beakers. Paper towels were then fully submerged in the beaker water for
10 seconds, withdrawn, and the amount of water absorbed by each paper towel sheet was
determined. This is an example of a completely randomized design, based on a single
four-level qualitative factor.


Example 3


Four machines in a plant were studied with respect to the diameters of ball bearings they
produced. The purpose of the study was to determine whether substantial differences in
the diameters of ball bearings existed between the machines. If so, the machines would
need to be calibrated. This is an example of an observational study, as no randomization of
treatments to experimental units occurred.

As we noted in Chapter 15, although the first two examples are experimental studies and 
the third is an observational study, the methods used for statistical analysis are generally the 
same. If the single factor has r levels, one approach to constructing a linear statistical model 
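employs $r - 1$ indicator variables as predictors. Then the response for the jth replicate of
the ith treatment or factor level is modeled:


$$Y_{ij} = \beta_0 + \beta_1 X_{ij1} + \cdots + \beta_{r-1} X_{ij,r-1} + \varepsilon_{ij}$$


where:

$$X_{ij1} = \begin{cases} 1 & \text{if treatment 1} \\ 0 & \text{otherwise} \end{cases}$$

$$X_{ij2} = \begin{cases} 1 & \text{if treatment 2} \\ 0 & \text{otherwise} \end{cases}$$

$$\vdots$$

$$X_{ij,r-1} = \begin{cases} 1 & \text{if treatment } r - 1 \\ 0 & \text{otherwise} \end{cases}$$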

Recall that because all of the predictors are indicator variables, this model is sometimes 
referred to as an analysis of variance model. 
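
As a small numerical check of the indicator-variable formulation, the Python sketch below builds the design rows for a hypothetical study with r = 3 treatments and verifies that taking the intercept to be the mean of treatment r, and each remaining coefficient to be a treatment mean minus that reference mean, satisfies the least squares normal equations. The data values and variable names are invented for illustration.

```python
from statistics import mean

# Hypothetical responses for r = 3 treatments, three replicates each.
data = {1: [12.0, 14.0, 13.0], 2: [20.0, 19.0, 21.0], 3: [16.0, 15.0, 17.0]}
r = len(data)


def indicator_row(treatment):
    """Row of the design matrix: intercept plus r - 1 indicator variables."""
    return [1.0] + [1.0 if treatment == k else 0.0 for k in range(1, r)]


X = [indicator_row(i) for i, values in data.items() for _ in values]
y = [v for values in data.values() for v in values]

# Candidate coefficients built from the treatment means:
# the intercept is the mean of treatment r, the others are differences from it.
means = {i: mean(values) for i, values in data.items()}
beta = [means[r]] + [means[k] - means[r] for k in range(1, r)]

fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Normal equations: the residuals must be orthogonal to every column of X.
normal_eq = [sum(row[col] * e for row, e in zip(X, residuals)) for col in range(r)]

print(beta)
print(normal_eq)
```

Since every entry of `normal_eq` is zero, these coefficients are the least squares estimates, and the fitted value for each observation is simply its treatment mean.
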

For the first example, we have an alternative. Because the factor—dosage level—is 
quantitative with three levels, we could also model its effect using a second-order (or lower- 
order) polynomial regression model, as described in Section 8.1. Specifically, two choices 
for the first example are: 


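$$Y_{ij} = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \varepsilon_{ij} \qquad \text{ANOVA Model}$$

where:

$$X_{ij1} = \begin{cases} 1 & \text{if treatment 1} \\ 0 & \text{otherwise} \end{cases}$$

$$X_{ij2} = \begin{cases} 1 & \text{if treatment 2} \\ 0 & \text{otherwise} \end{cases}$$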


Chapter 16  Single-Factor Studies 679 
or, employing second-order polynomial model (8.1):

$$Y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_{11} x_{ij}^2 + \varepsilon_{ij} \qquad \text{Regression Model}$$

where:

$x_{ij}$ = centered dosage level amount for the $ij$th case


In the next section, we discuss the choice between the two types of models. 


16.2 Relation between Regression and Analysis of Variance


Illustrations


Regression analysis, as we have seen, is concerned with the statistical relation between 
one or more predictor variables and a response variable. Both the predictor and response 
variables in ordinary regression models are quantitative. The regression function describes 
the nature of the statistical relation between the mean response and the levels of the predictor 
variable(s). 

We encountered the use of analysis of variance in our consideration of regression. It 
was used there for a variety of tests concerning the regression coefficients, the fit of the 
regression model, and the like. The analysis of variance is actually much more general than 
its use with regression models indicated. Analysis of variance models are a basic type of 
statistical model. They are concerned, like regression models, with the statistical relation 
between one or more predictor variables and a response variable. Like regression models, 
analysis of variance models are appropriate for both observational data and data based on 
formal experiments. Further, as in the usual regression models, the response variable for 
analysis of variance models is a quantitative variable. Analysis of variance models differ 
from ordinary regression models in two key respects: 


1. The explanatory or predictor variables in analysis of variance models may be qualitative 
(gender, geographic location, plant shift, etc.). 

2. If the predictor variables are quantitative, no assumption is made in analysis of variance 
models about the nature of the statistical relation between them and the response variable. 
Thus, the need to specify the nature of the regression function encountered in ordinary 
regression analysis does not arise in analysis of variance models. 


Figure 16.1 illustrates the essential differences between regression and analysis of variance 
models for the case where the predictor variable is quantitative. Shown in Figure 16.1a is 
the regression model for a pricing study involving three different price levels, X = $50,
$60, $70. Note that the XY plane has been rotated from its usual position so that the Y axis 
faces the viewer. For each level of the predictor variable, there is a probability distribution 
of sales volumes. The means of these probability distributions fall on the regression curve, 
which describes the statistical relation between price and mean sales volume. 

The analysis of variance model for the same study is illustrated in Figure 16.1b. The three 
price levels are treated as separate populations, each leading to a probability distribution 
of sales volumes. The quantitative differences in the three price levels and their statistical 
relation to expected sales volume are not considered by the analysis of variance model. 




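FIGURE 16.1 Relation between Regression and Analysis of Variance Models.
(a) Regression Model (b) Analysis of Variance Model
[Figure not reproduced. Both panels show probability distributions of sales volume Y at the
price levels $50, $60, and $70; panel (a) also shows the regression curve.]

FIGURE 16.2 Analysis of Variance Model Representation: Incentive Pay Example.
[Figure not reproduced. The horizontal axis is employee productivity Y. The distributions for
incentive pay types 2, 1, 4, and 3 are centered at 58, 70, 84, and 90, and the observations
at 51 and 78 are marked with error terms of -7 and 8.]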


Figure 16.2 illustrates the analysis of variance model for a study of the effects of four 
different types of incentive pay systems on employee productivity. Here, each type of 
incentive pay system corresponds to a different population, and there is associated with 
each a probability distribution of employee productivities (Y). Since type of incentive pay 
system is a qualitative variable, Figure 16.2 does not contain a corresponding regression 
model representation. 


Choice between Two Types of Models 
As we have seen in Chapter 8, regression analysis can handle qualitative predictor variables 
by means of indicator variables. When indicator variables are so used with regression 
models, the regression results will be identical to those obtained with analysis of variance 
models. The reason why analysis of variance exists as a distinct statistical methodology is
that the structure of the predictor indicator variables permits computational simplifications 
that are explicitly recognized in the statistical procedures for the analysis of variance. 




Hence, there is no fundamental choice between regression and analysis of variance models 
when the predictor variables are qualitative. 

On the other hand, there is a choice in modeling when the predictor variables are quan- 
titative. One possibility is to recognize the quantitative nature of the predictor variables 
explicitly; this can only be done by a regression model. The other possibility is to set up 
classes for each quantitative variable and then employ either indicator variables in a regres- 
sion model or an analysis of variance model. As we mentioned in Chapter 8, the strategy of 
setting up classes for quantitative variables is sometimes followed in large-scale studies as 
a means of obtaining a nonparametric regression fit when there is substantial doubt about 
the nature of the statistical relation. Here again, analysis of variance models and regression 
models with indicator variables will lead to identical results. 
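
The equivalence between a regression on indicator variables and an analysis of variance
model can be checked numerically. The following is a minimal sketch in Python with NumPy.
The data values and the names `y`, `group`, and `X` are made up for illustration and are
not taken from the text. The fitted values from the indicator-variable regression coincide
with the class sample means.

```python
import numpy as np

# Hypothetical illustrative data: a response y observed in three classes.
y = np.array([10.0, 12.0, 11.0, 20.0, 22.0, 30.0, 33.0, 36.0])
group = np.array([0, 0, 0, 1, 1, 2, 2, 2])

# Regression with an intercept and indicator variables for classes 1 and 2
# (class 0 serves as the reference class).
X = np.column_stack([
    np.ones_like(y),
    (group == 1).astype(float),
    (group == 2).astype(float),
])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted_regression = X @ b

# Analysis of variance view: the fitted value is the class sample mean.
class_means = np.array([y[group == g].mean() for g in range(3)])
fitted_anova = class_means[group]

print(np.allclose(fitted_regression, fitted_anova))
```

In this coding the intercept estimates the mean of the reference class and each indicator
coefficient estimates a difference from that mean, so the two sets of fitted values agree.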


16.3 Single-Factor ANOVA Model


Basic Ideas


The basic elements of the ANOVA model for a single-factor study are quite simple. Corre- 
sponding to each factor level, there is a probability distribution of responses. For example, 
in a study of the effects of four types of incentive pay on employee productivity, there is 
a probability distribution of employee productivities for each type of incentive pay. The 
ANOVA model assumes that: 


1. Each probability distribution is normal.

2. Each probability distribution has the same variance. 

3. The responses for each factor level are random selections from the corresponding prob- 
ability distribution and are independent of the responses for any other factor level. 


Figure 16.2 illustrates these conditions. Note the normality of the probability distributions 
and the constant variability. The probability distributions differ only with respect to their 
means. Differences in the means therefore reflect the essential factor level effects, and it is
for this reason that the analysis of variance focuses on the mean responses for the different 
factor levels. 

The analysis of the sample data from the factor level probability distributions usually 
proceeds in two steps: 


1. Determine whether or not the factor level means are the same. 
2. If the factor level means differ, examine how they differ and what the implications of 
the differences are. 


In this chapter, we consider step 1, the testing procedure for determining whether or not the 
factor level means are the same. In the next chapter, we take up the analysis of the factor 
level means when the means differ. 


Cell Means Model 


Before stating the ANOVA model for single-factor studies, we need to develop some notation.
We shall denote by $r$ the number of levels of the factor under study (e.g., $r = 4$ types
of incentive pay), and we shall denote any one of these levels by the index $i$
($i = 1, \ldots, r$). The number of cases for the $i$th factor level is denoted by $n_i$,
and the total number of cases in the study is denoted by $n_T$, where:

\[ n_T = \sum_{i=1}^{r} n_i \tag{16.1} \]

This notation differs from that used earlier for regression models, where the subscript $i$
identifies the case or trial.

For analysis of variance models we shall always use the last subscript to represent the
case or trial for a given factor level or treatment. Here, the index $j$ will be used to
identify the given case or trial for a particular factor level. We shall let $Y_{ij}$ denote
the value of the response variable in the $j$th trial for the $i$th factor level. For
instance, $Y_{ij}$ is the productivity of the $j$th employee in the $i$th incentive plan, or
the sales volume of the $j$th store featuring the $i$th type of shelf display. Since the
number of cases or trials for the $i$th factor level is denoted by $n_i$, we have
$j = 1, \ldots, n_i$.

The ANOVA model can now be stated as follows:

\[ Y_{ij} = \mu_i + \varepsilon_{ij} \tag{16.2} \]

where:

$Y_{ij}$ is the value of the response variable in the $j$th trial for the $i$th factor level
or treatment
$\mu_i$ are parameters
$\varepsilon_{ij}$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, r$; $j = 1, \ldots, n_i$


This model is called the cell means model for reasons to be explained shortly. This model 
may be used for data from observational studies or for data from experimental studies based 
on a completely randomized design. 


Important Features of Model 


1. The observed value of $Y$ in the $j$th trial for the $i$th factor level or treatment is
the sum of two components: (a) a constant term $\mu_i$ and (b) a random error term
$\varepsilon_{ij}$.

2. Since $E\{\varepsilon_{ij}\} = 0$, it follows that:

\[ E\{Y_{ij}\} = \mu_i \tag{16.3} \]

Thus, all responses or observations $Y_{ij}$ for the $i$th factor level have the same
expectation $\mu_i$, and this parameter is the mean response for the $i$th factor level or
treatment.

3. Since $\mu_i$ is a constant, it follows from (A.16a) that:

\[ \sigma^2\{Y_{ij}\} = \sigma^2\{\varepsilon_{ij}\} = \sigma^2 \tag{16.4} \]

Thus, all observations have the same variance, regardless of factor level.

4. Since each $\varepsilon_{ij}$ is normally distributed, so is each $Y_{ij}$. This follows
from (A.36) because $Y_{ij}$ is a linear function of $\varepsilon_{ij}$.

5. The error terms are assumed to be independent. Hence, the error term for the outcome
on any one trial has no effect on the error term for the outcome of any other trial for the
same factor level or for a different factor level. Since the $\varepsilon_{ij}$ are
independent, so are the responses $Y_{ij}$.

6. In view of these features, ANOVA model (16.2) can be restated as follows:

\[ Y_{ij} \text{ are independent } N(\mu_i, \sigma^2) \tag{16.5} \]

Example

Suppose that ANOVA model (16.2) is applicable to the earlier incentive pay study illustration
and that the parameters are as follows:

\[ \mu_1 = 70 \qquad \mu_2 = 58 \qquad \mu_3 = 90 \qquad \mu_4 = 84 \qquad \sigma = 4 \]

Figure 16.2 contains a representation of this model. Note that employee productivities for
incentive pay type 1 according to this model are normally distributed with mean $\mu_1 = 70$
and standard deviation $\sigma = 4$.

Suppose that in the $j$th trial of incentive pay type 1, the observed productivity is
$Y_{1j} = 78$. In that case, the error term value is $\varepsilon_{1j} = 8$, for we have:

\[ \varepsilon_{1j} = Y_{1j} - \mu_1 = 78 - 70 = 8 \]

Figure 16.2 shows this observation $Y_{1j}$. Note that the deviation of $Y_{1j}$ from the
mean $\mu_1$ represents the error term $\varepsilon_{1j}$. This figure also shows the
observation $Y_{2j} = 51$, for which the error term value is $\varepsilon_{2j} = -7$.
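
The error-term arithmetic in this illustration can be reproduced with a short Python sketch.
The means and standard deviation are those of the incentive pay illustration. The helper
name `error_term`, the random seed, and the sample size of five per level are assumptions
introduced only for this example.

```python
import numpy as np

# Parameters of the incentive pay illustration.
mu = {1: 70.0, 2: 58.0, 3: 90.0, 4: 84.0}
sigma = 4.0

def error_term(y_obs, level):
    """Error term for an observation: observed value minus its factor level mean."""
    return y_obs - mu[level]

print(error_term(78.0, 1))   # 8.0
print(error_term(51.0, 2))   # -7.0

# Simulated responses under the cell means model: mean plus independent normal error.
# The seed and the number of cases per level are arbitrary choices.
rng = np.random.default_rng(0)
n_per_level = 5
sample = {i: mu[i] + rng.normal(0.0, sigma, size=n_per_level) for i in mu}
```

Each simulated response is a factor level mean plus an independent normal error with the
same standard deviation, which is what the model assumes for every factor level.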


The ANOVA Model Is a Linear Model 
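ANOVA model (16.2) is a linear model because it can be expressed in matrix terms in the
form (6.19), i.e., as $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$.
We illustrate this for a study involving $r = 3$ treatments, and for which
$n_1 = n_2 = n_3 = 2$. $\mathbf{Y}$, $\mathbf{X}$, $\boldsymbol{\beta}$, and
$\boldsymbol{\varepsilon}$ are then defined as follows:

\[
\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
\qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}
\tag{16.6}
\]

Note the simple structure of the $\mathbf{X}$ matrix and that the $\boldsymbol{\beta}$ vector
consists of the means $\mu_i$. To see that these matrices yield ANOVA model (16.2), recall
from (6.20) that the vector of expected values $E\{Y_{ij}\}$ is given by
$E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta}$. We thus obtain:

\[
E\{\mathbf{Y}\} =
\begin{bmatrix} E\{Y_{11}\} \\ E\{Y_{12}\} \\ E\{Y_{21}\} \\ E\{Y_{22}\} \\ E\{Y_{31}\} \\ E\{Y_{32}\} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
=
\begin{bmatrix} \mu_1 \\ \mu_1 \\ \mu_2 \\ \mu_2 \\ \mu_3 \\ \mu_3 \end{bmatrix}
\tag{16.7}
\]

This indicates properly that $E\{Y_{ij}\} = \mu_i$. Hence, ANOVA model (16.2),
$Y_{ij} = \mu_i + \varepsilon_{ij}$, in matrix form is given by
$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$:

\[
\mathbf{Y} =
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix}
= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} =
\begin{bmatrix} \mu_1 \\ \mu_1 \\ \mu_2 \\ \mu_2 \\ \mu_3 \\ \mu_3 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}
\tag{16.8}
\]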


Since the error terms in the model have the same structure as those in general linear
regression model (6.19), namely, independence and constant variance, the variance-covariance
matrix of the error terms in the ANOVA model is the same as in (6.19):

\[
\sigma^2\{\boldsymbol{\varepsilon}\} =
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \sigma^2 \mathbf{I}
\tag{16.9}
\]

In addition, as for general linear regression model (6.19), the variance-covariance matrix
of the $\mathbf{Y}$ responses is the same as that of the error terms:

\[ \sigma^2\{\mathbf{Y}\} = \sigma^2 \mathbf{I} \tag{16.10} \]


When ANOVA model (16.2) is expressed as a linear model, as in (16.8), it can be seen
why it is called the cell means model, because the $\boldsymbol{\beta}$ vector contains the
means of the "cells" (here, factor levels). In Section 16.7 we discuss an equivalent ANOVA
model called the factor effects model, where the $\boldsymbol{\beta}$ vector contains
components of the factor level means.
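
The matrix form of the cell means model can be sketched in Python with NumPy. The design
matrix below follows the layout of (16.6) for $r = 3$ and two cases per level. The numerical
values placed in `beta` are hypothetical and serve only to show that multiplying the design
matrix by the vector of means repeats each mean once per case.

```python
import numpy as np

# Design matrix for r = 3 treatments with two cases per treatment.
n = [2, 2, 2]
r = len(n)
X = np.zeros((sum(n), r))
row = 0
for i, n_i in enumerate(n):
    X[row:row + n_i, i] = 1.0   # indicator column for factor level i
    row += n_i

# Hypothetical factor level means, chosen only for illustration.
beta = np.array([10.0, 20.0, 30.0])

# Vector of expected responses: each mean appears once per case at its level.
expected_Y = X @ beta
print(expected_Y)
```

Because each row of the design matrix contains a single 1, the product simply selects the
mean of the factor level to which that case belongs.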


Interpretation of Factor Level Means 


Observational Data. In an observational study, the factor level means $\mu_i$ correspond to
the means for the different factor level populations. For instance, in a study of the
productivity of employees in each of three shifts operated in a plant, the populations consist
of the employee productivities for each of the three shifts. The population mean $\mu_1$ is
the mean productivity for employees in shift 1, and $\mu_2$ and $\mu_3$ are interpreted
similarly. The variance $\sigma^2$ refers to the variability of employee productivities within
a shift.


Experimental Data. In an experimental study, the factor level mean $\mu_i$ stands for the
mean response that would be obtained if the $i$th treatment were applied to all units in
the population of experimental units about which inferences are to be drawn. Similarly,
the variance $\sigma^2$ refers to the variability of responses if any given experimental
treatment were applied to the entire population of experimental units. For instance, in a
completely randomized design to study the effects of three different training programs on
employee productivity, in which 90 employees participate, a third of these employees is
assigned at random to each of the three programs. The mean $\mu_1$ here denotes the mean
productivity if training program 1 were given to each employee in the population of
experimental units; the means $\mu_2$ and $\mu_3$ are interpreted correspondingly. The variance
$\sigma^2$ denotes the variability in productivities if any one training program were given to
each employee in the population of experimental units.


Distinction between ANOVA Models I and II


We shall consider two single-factor analysis of variance models. For brevity, we shall refer
to these as ANOVA models I and II. ANOVA model I, which was stated in (16.2), applies to
such cases as a comparison of five different advertisements or a comparison of four different
rust inhibitors, where the conclusions pertain to just those factor levels included in the study.
ANOVA model II, to be discussed in Chapter 25, applies to a different type of situation,
namely, where the conclusions extend to a population of factor levels of which the levels in
the study are a sample. Consider, for instance, a company that owns several hundred retail
stores throughout the country. Seven of these stores are selected at random, and a sample
of employees from each store is then chosen and asked in a confidential interview for an
evaluation of the management of the store. The seven stores in the study constitute the seven
levels of the factor under study, namely, retail store. In this case, however, management is
not just interested in the seven stores included in the study but wishes to generalize the study
results to all of the retail stores it owns. Another example where ANOVA model II is
applicable is when three machines out of 75 in a plant are selected at random and their daily
output is studied for a period of 10 days. The three machines constitute the three factor levels
in this study, but interest is not just in the three machines in the study but in all machines in
the plant.

Thus, the essential difference between situations where ANOVA models I and II are
applicable is that model I is relevant when the factor levels are chosen because of intrinsic
interest in them (e.g., five different advertisements) and they are not considered to be a
sample from a larger population. ANOVA model II is appropriate when the factor levels
constitute a sample from a larger population (e.g., three machines out of 75) and interest is
in this larger population. Thus, ANOVA model I is also referred to as the fixed effects model,
and ANOVA model II is called the random effects model. In this and the next two chapters,
we focus on ANOVA model I. For brevity, we omit the word "fixed" or "model I" and
simply refer to the model as the ANOVA model.


Comment


The ANOVA model (16.2) for single-factor studies, like any other statistical model, is not likely to
be met exactly by any real-world situation. However, it will be met approximately in many cases. As
we shall note later, the statistical procedures based on ANOVA model (16.2) are quite robust, so that
even if the actual conditions differ substantially from those of the model, the statistical analysis may
still be an appropriate approximation.


16.4 Fitting of ANOVA Model


The parameters of ANOVA model (16.2) are ordinarily unknown and must be estimated
from sample data. As with normal error regression models, the method of least squares and
the method of maximum likelihood lead to the same estimators of the model parameters $\mu_i$
in normal error ANOVA model (16.2). Before turning to these estimators, we shall describe
an example to be used in this chapter and the next, and we shall develop needed additional
notation.


Example

The Kenton Food Company wished to test four different package designs for a new breakfast
cereal. Twenty stores, with approximately equal sales volumes, were selected as the
experimental units. Each store was randomly assigned one of the package designs, with each
package design assigned to five stores. A fire occurred in one store during the study period,
so this store had to be dropped from the study. Hence, one of the designs was tested in only
four stores. The stores were chosen to be comparable in location and sales volume. Other
relevant conditions that could affect sales, such as price, amount and location of shelf space,
and special promotional efforts, were kept the same for all of the stores in the experiment.
Sales, in number of cases, were observed for the study period, and the results are recorded
in Table 16.1. This study is a completely randomized design with package design as the
single, four-level factor.


TABLE 16.1 Number of Cases Sold by Stores for Each of Four Package Designs: Kenton Food
Company Example.

Package      Store (j), entries $Y_{ij}$      Total           Mean                   Number of
Design i      1     2     3     4     5       $Y_{i\cdot}$    $\bar{Y}_{i\cdot}$     Stores $n_i$
   1         11    17    16    14    15        73             14.6                    5
   2         12    10    15    19    11        67             13.4                    5
   3         23    20    18    17              78             19.5                    4
   4         27    33    22    26    28       136             27.2                    5
All designs: $Y_{\cdot\cdot} = 354$, $\bar{Y}_{\cdot\cdot} = 18.63$, $n_T = 19$


FIGURE 16.3 JMP Scatter Plot of Number of Cases Sold by Package Design: Kenton Food
Company Example.
[Figure not reproduced. Horizontal axis: package design (1 to 4); vertical axis: number of
cases sold.]


Figure 16.3 contains a JMP scatter plot of the number of cases sold versus package
design number. We readily see that designs 3 and 4 led to the largest sales, and that designs
1 and 2 led to smaller sales. We also see that the variability in store sales appears to be about
the same for the four designs, consistent with ANOVA model (16.2). To make more formal
inferences, we first need to develop some additional notation.


Notation


As explained earlier, $Y_{ij}$ represents the observation or response for the $j$th sample
unit for the $i$th factor level. For the Kenton Food Company example, $Y_{ij}$ denotes the
number of cases sold by the $j$th store assigned to the $i$th package design. For instance,
$Y_{11}$ represents the sales of the first store assigned package design 1. For our example,
$Y_{11} = 11$ cases. Similarly, sales of the second store assigned package design 3 are
$Y_{32} = 20$ cases.

The total of the observations for the $i$th factor level is denoted by $Y_{i\cdot}$:

\[ Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij} \tag{16.11} \]

Note that the dot in $Y_{i\cdot}$ indicates an aggregation over the $j$ index; in our example,
the aggregation is over all stores assigned to the $i$th package design. For instance, the
total sales for all stores assigned package design 1 are, according to Table 16.1,
$Y_{1\cdot} = 73$ cases. Similarly, total sales for all stores assigned package design 4 are
$Y_{4\cdot} = 136$ cases.

The sample mean for the $i$th factor level is denoted by $\bar{Y}_{i\cdot}$:

\[ \bar{Y}_{i\cdot} = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i} = \frac{Y_{i\cdot}}{n_i} \tag{16.12} \]

In our example, the mean number of cases sold by stores assigned package design 1 is
$\bar{Y}_{1\cdot} = 73/5 = 14.6$. Note that the dot in the subscript of $\bar{Y}_{1\cdot}$
indicates that the averaging is done over $j$ (stores).

The total of all observations in the study is denoted by $Y_{\cdot\cdot}$:

\[ Y_{\cdot\cdot} = \sum_{i=1}^{r} \sum_{j=1}^{n_i} Y_{ij} \tag{16.13} \]

where the two dots indicate aggregation over both the $j$ and $i$ indexes (in our example, over
all stores for any one package design and then over all package designs). In our example,
the total sales for all stores for all designs are $Y_{\cdot\cdot} = 354$.

Finally, the overall mean for all responses is denoted by $\bar{Y}_{\cdot\cdot}$:

\[ \bar{Y}_{\cdot\cdot} = \frac{\sum_i \sum_j Y_{ij}}{n_T} = \frac{Y_{\cdot\cdot}}{n_T} \tag{16.14} \]

The two dots here indicate that the averaging is done over both $i$ and $j$. For our example,
we have from Table 16.1 that $\bar{Y}_{\cdot\cdot} = 354/19 = 18.63$. Note that the overall
mean (16.14) can be written as a weighted average of the factor level means in (16.12):

\[ \bar{Y}_{\cdot\cdot} = \sum_{i=1}^{r} \frac{n_i}{n_T} \bar{Y}_{i\cdot} \tag{16.14a} \]
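
The dot notation can be made concrete with a small Python sketch that computes the totals
and means for the Kenton Food Company data in Table 16.1. The variable names (`sales`,
`totals`, `means`, and so on) are illustrative choices, not notation from the text.

```python
# Cases sold, by package design (Table 16.1).
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

n = {i: len(y) for i, y in sales.items()}        # number of stores per design
totals = {i: sum(y) for i, y in sales.items()}   # factor level totals
means = {i: totals[i] / n[i] for i in sales}     # factor level sample means

n_T = sum(n.values())                            # total number of cases
grand_total = sum(totals.values())               # total of all observations
grand_mean = grand_total / n_T                   # overall mean

# Overall mean written as a weighted average of the factor level means.
weighted = sum(n[i] / n_T * means[i] for i in sales)

print(totals, n_T, grand_total, round(grand_mean, 2), round(weighted, 2))
```

The last two printed values agree, which is the weighted-average identity in (16.14a): each
factor level mean is weighted by its share of the total number of cases.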


Least Squares and Maximum Likelihood Estimators 


According to the least squares criterion, the sum of the squared deviations of the
observations around their expected values must be minimized with respect to the parameters.
For ANOVA model (16.2), we know from (16.3) that the expected value of observation $Y_{ij}$
is $E\{Y_{ij}\} = \mu_i$. Hence, the quantity to be minimized is:

\[ Q = \sum_i \sum_j (Y_{ij} - \mu_i)^2 \tag{16.15} \]

Now (16.15) can be written as follows:

\[ Q = \sum_j (Y_{1j} - \mu_1)^2 + \sum_j (Y_{2j} - \mu_2)^2 + \cdots + \sum_j (Y_{rj} - \mu_r)^2 \tag{16.15a} \]

Note that each of the parameters appears in only one of the component sums in (16.15a).
Hence, $Q$ can be minimized by minimizing each of the component sums separately. It is
well known that the sample mean minimizes a sum of squared deviations. Hence, the least
squares estimator of $\mu_i$, denoted by $\hat{\mu}_i$, is:

\[ \hat{\mu}_i = \bar{Y}_{i\cdot} \tag{16.16} \]

Thus, the fitted value for observation $Y_{ij}$, denoted by $\hat{Y}_{ij}$ in analogy to
$\hat{Y}_i$ for regression models, is simply the corresponding factor level sample mean here:

\[ \hat{Y}_{ij} = \bar{Y}_{i\cdot} \tag{16.17} \]

The same estimators are obtained by the method of maximum likelihood. The likelihood
function here corresponds to that in (1.26) for the normal error simple linear regression
model, except that the regression model expected value $\beta_0 + \beta_1 X_i$ is replaced
here by $\mu_i$:

\[
L(\mu_1, \ldots, \mu_r, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n_T/2}}
\exp\left[ -\frac{1}{2\sigma^2} \sum_i \sum_j (Y_{ij} - \mu_i)^2 \right]
\tag{16.18}
\]

Maximizing this likelihood function with respect to the parameters $\mu_i$ is equivalent to
minimizing the sum $\sum\sum (Y_{ij} - \mu_i)^2$ in the exponent, which is the least squares
criterion in (16.15).


Example

For the Kenton Food Company example, the least squares and maximum likelihood estimates
of the model parameters are as follows according to Table 16.1:

Parameter    Estimate
$\mu_1$      $\hat{\mu}_1 = \bar{Y}_{1\cdot} = 14.6$
$\mu_2$      $\hat{\mu}_2 = \bar{Y}_{2\cdot} = 13.4$
$\mu_3$      $\hat{\mu}_3 = \bar{Y}_{3\cdot} = 19.5$
$\mu_4$      $\hat{\mu}_4 = \bar{Y}_{4\cdot} = 27.2$

Thus, the mean sales per store with package design 1 are estimated to be 14.6 cases for
the population of stores under study, and the fitted value for each of the observations for
package design 1 is $\hat{Y}_{1j} = \bar{Y}_{1\cdot} = 14.6$. Similarly, the mean sales for
package design 2 are estimated to be 13.4 cases per store, and the fitted value for each
response for this package design is $\hat{Y}_{2j} = \bar{Y}_{2\cdot} = 13.4$.
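
As a numerical check on these estimates, the sketch below fits the cell means model to the
Table 16.1 data by least squares using NumPy. It builds a design matrix with one indicator
column per package design, in the spirit of (16.6), and solves with `numpy.linalg.lstsq`.
The variable names are illustrative assumptions. The solution should match the factor level
sample means.

```python
import numpy as np

# Cases sold, by package design (Table 16.1).
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

y = np.concatenate([np.asarray(v, dtype=float) for v in sales.values()])
levels = np.concatenate([np.full(len(v), i) for i, v in sales.items()])

# Cell means design matrix: one indicator column per package design.
design_ids = np.array(list(sales))
X = (levels[:, None] == design_ids[None, :]).astype(float)

# Least squares solution of the linear model with this design matrix.
mu_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(mu_hat, 1))
```

Because the columns of this design matrix are orthogonal, the least squares problem
separates by factor level, which mirrors the argument based on (16.15a).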


Comments 


1. The least squares and maximum likelihood estimators in (16.16) have all of the desirable
properties mentioned in Chapter 1 for the regression estimators. For example, they are minimum
variance unbiased estimators.

2. To derive the least squares estimator of $\mu_i$, we need to minimize, with respect to
$\mu_i$, the $i$th component sum of squares in (16.15a):

\[ Q_i = \sum_j (Y_{ij} - \mu_i)^2 \tag{16.19} \]

Differentiating with respect to $\mu_i$, we obtain:

\[ \frac{dQ_i}{d\mu_i} = \sum_j -2(Y_{ij} - \mu_i) \]

When we set this derivative equal to zero and replace the parameter $\mu_i$ by the least
squares estimator $\hat{\mu}_i$, we obtain the result in (16.16):

\[ -2 \sum_{j=1}^{n_i} (Y_{ij} - \hat{\mu}_i) = 0 \]

\[ \sum_j Y_{ij} = n_i \hat{\mu}_i \]

\[ \hat{\mu}_i = \bar{Y}_{i\cdot} \]


Residuals


Residuals are highly useful for examining the aptness of ANOVA models. The residual $e_{ij}$
is again defined, as for regression models, as the difference between the observed and fitted
values:

\[ e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot} \tag{16.20} \]

Thus, a residual here represents the deviation of an observation from its estimated factor
level mean.

An important property of the residuals for ANOVA model (16.2) is that they sum to zero
for each factor level $i$:

\[ \sum_j e_{ij} = 0 \qquad i = 1, \ldots, r \tag{16.21} \]

As for regression analysis, residuals for ANOVA models are useful for examining the
appropriateness of the ANOVA model. We shall discuss this use of residuals in Chapter 18.


Example

Table 16.2 contains the residuals for the Kenton Food Company example. For instance,
from Table 16.1, we find:

\[ e_{11} = Y_{11} - \bar{Y}_{1\cdot} = 11 - 14.6 = -3.6 \]
\[ e_{21} = Y_{21} - \bar{Y}_{2\cdot} = 12 - 13.4 = -1.4 \]

Note from Table 16.2 that the residuals sum to zero for each factor level, as expected.


TABLE 16.2 Residuals: Kenton Food Company Example.

Package                 Store (j)
Design i       1      2      3      4      5     Total
   1         -3.6    2.4    1.4    -.6     .4      0
   2         -1.4   -3.4    1.6    5.6   -2.4      0
   3          3.5     .5   -1.5   -2.5             0
   4          -.2    5.8   -5.2   -1.2     .8      0
All designs                                        0
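
The residuals in Table 16.2 can be recomputed from the Table 16.1 data with a few lines of
Python. This is a minimal sketch, and the rounding to one decimal place is a presentation
choice made here to match the table.

```python
# Cases sold, by package design (Table 16.1).
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

residuals = {}
for i, y in sales.items():
    mean_i = sum(y) / len(y)                      # estimated factor level mean
    residuals[i] = [round(y_ij - mean_i, 1) for y_ij in y]

# Residual totals by factor level (zero up to floating-point rounding).
residual_sums = {i: sum(e) for i, e in residuals.items()}

for i in sales:
    print(i, residuals[i])
```

The residual totals are zero for every package design, in line with property (16.21).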




16.5 Analysis of Variance 


Just as the analysis of variance for a regression model partitions the total sum of squares into
the regression sum of squares and the error sum of squares, so a corresponding partitioning
exists for ANOVA model (16.2).


Partitioning of SSTO 


The total variability of the $Y_{ij}$ observations, not using any information about factor
levels, is measured in terms of the total deviation of each observation, i.e., the deviation
of $Y_{ij}$ around the overall mean $\bar{Y}_{\cdot\cdot}$:

\[ Y_{ij} - \bar{Y}_{\cdot\cdot} \tag{16.22} \]

When we utilize information about the factor levels, the deviations reflecting the uncertainty
remaining in the data are those of each observation $Y_{ij}$ around its respective estimated
factor level mean $\bar{Y}_{i\cdot}$:

\[ Y_{ij} - \bar{Y}_{i\cdot} \tag{16.23} \]

The difference between the deviations (16.22) and (16.23) reflects the difference between
the estimated factor level mean and the overall mean:

\[ (Y_{ij} - \bar{Y}_{\cdot\cdot}) - (Y_{ij} - \bar{Y}_{i\cdot}) = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot} \tag{16.24} \]

Note from (16.24) that we can decompose the total deviation $Y_{ij} - \bar{Y}_{\cdot\cdot}$
into two components:

\[
\underbrace{Y_{ij} - \bar{Y}_{\cdot\cdot}}_{\text{Total deviation}}
=
\underbrace{\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}}_{\substack{\text{Deviation of estimated factor} \\ \text{level mean around overall mean}}}
+
\underbrace{Y_{ij} - \bar{Y}_{i\cdot}}_{\substack{\text{Deviation around estimated} \\ \text{factor level mean}}}
\tag{16.25}
\]

Thus, the total deviation $Y_{ij} - \bar{Y}_{\cdot\cdot}$ can be viewed as the sum of two
components:

1. The deviation of the estimated factor level mean around the overall mean.
2. The deviation of $Y_{ij}$ around its estimated factor level mean, which is simply the
residual $e_{ij}$ according to (16.20).

Figure 16.4 illustrates this decomposition for the Kenton Food Company example for two
of the observations, $Y_{11}$ and $Y_{45}$.


FIGURE 16.4 Illustration of Partitioning of Total Deviations $Y_{ij} - \bar{Y}_{\cdot\cdot}$:
Kenton Food Company Example (not plotted to scale; only observations $Y_{11}$ and $Y_{45}$
are shown).
(a) Total Deviations $Y_{ij} - \bar{Y}_{\cdot\cdot}$ (b) Deviations $Y_{ij} - \bar{Y}_{i\cdot}$
(c) Deviations $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$
[Figure not reproduced.]


When we square both sides in (16.25) and then sum, the cross products on the right drop
out and we obtain:

\[
\sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2
= \sum_i n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2
+ \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2
\tag{16.26}
\]

The term on the left measures the total variability of the $Y_{ij}$ observations and is
denoted, as for regression, by SSTO for total sum of squares:

\[ SSTO = \sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 \tag{16.27} \]

The first term on the right in (16.26) will be denoted by SSTR, standing for treatment
sum of squares:

\[ SSTR = \sum_i n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \tag{16.28} \]

The second term on the right in (16.26) will be denoted by SSE, standing for error sum of
squares:

\[ SSE = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2 = \sum_i \sum_j e_{ij}^2 \tag{16.29} \]

Thus, (16.26) can be written equivalently:

\[ SSTO = SSTR + SSE \tag{16.30} \]

The correspondence to the regression decomposition in (2.50) is readily apparent.
The total sum of squares for the analysis of variance model is therefore made up of these
two components:

1. SSE: A measure of the random variation of the observations around the respective
estimated factor level means. The less variation among the observations for each factor
level, the smaller is SSE. If SSE = 0, the observations for any given factor level are all the
same, and this holds for all factor levels. The more the observations for each factor level
differ among themselves, the larger will be SSE.

2. SSTR: A measure of the extent of differences between the estimated factor level means,
based on the deviations of the estimated factor level means $\bar{Y}_{i\cdot}$ around the
overall mean $\bar{Y}_{\cdot\cdot}$. If all estimated factor level means $\bar{Y}_{i\cdot}$
are the same, then SSTR = 0. The more the estimated factor level means differ, the larger
will be SSTR.




Example

The analysis of variance breakdown of the total sum of squares for the Kenton Food
Company example in Table 16.1 is obtained as follows, using (16.27), (16.28), and (16.29):

\[ SSTO = (11 - 18.63)^2 + (17 - 18.63)^2 + (16 - 18.63)^2 + \cdots + (28 - 18.63)^2 = 746.42 \]

\[ SSTR = 5(14.6 - 18.63)^2 + 5(13.4 - 18.63)^2 + 4(19.5 - 18.63)^2 + 5(27.2 - 18.63)^2 = 588.22 \]

\[ SSE = (11 - 14.6)^2 + (17 - 14.6)^2 + (16 - 14.6)^2 + \cdots + (28 - 27.2)^2 = 158.20 \]

Thus, the decomposition of SSTO is:

\[ 746.42 = 588.22 + 158.20 \]
\[ SSTO = SSTR + SSE \]

Note that much of the total variation in the observations is associated with variation between
the estimated factor level means.
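
The three sums of squares can be recomputed directly from the Table 16.1 data. The
following Python sketch applies (16.27), (16.28), and (16.29) without intermediate rounding.
The variable names are illustrative.

```python
# Cases sold, by package design (Table 16.1).
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

all_obs = [y for ys in sales.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)
means = {i: sum(ys) / len(ys) for i, ys in sales.items()}

# Total, treatment, and error sums of squares.
ssto = sum((y - grand_mean) ** 2 for y in all_obs)
sstr = sum(len(ys) * (means[i] - grand_mean) ** 2 for i, ys in sales.items())
sse = sum((y - means[i]) ** 2 for i, ys in sales.items() for y in ys)

print(round(ssto, 2), round(sstr, 2), round(sse, 2))   # 746.42 588.22 158.2
```

Carrying the overall mean at full precision, rather than rounding it to 18.63 first, is what
reproduces the reported values to two decimal places.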


Comments 
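1. To prove (16.26), we begin by considering (16.25):

\[ Y_{ij} - \bar{Y}_{\cdot\cdot} = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot}) \]

Squaring both sides we obtain:

\[
(Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + (Y_{ij} - \bar{Y}_{i\cdot})^2
+ 2(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot})
\]

When we sum over all sample observations in the study (i.e., over both $i$ and $j$), we obtain:

\[
\sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2
= \sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2
+ \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2
+ \sum_i \sum_j 2(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot})
\tag{16.31}
\]

The first term on the right in (16.31) equals:

\[ \sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = \sum_i n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \tag{16.32} \]

since $(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$ is constant when summed over $j$; hence,
$n_i$ such terms are picked up for the summation over $j$.

The third term on the right in (16.31) equals zero:

\[
\sum_i \sum_j 2(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot})
= 2 \sum_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \sum_j (Y_{ij} - \bar{Y}_{i\cdot}) = 0
\tag{16.33}
\]

This follows because $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$ is constant for the summation
over $j$; hence, it can be brought in front of the summation sign over $j$. Further,
$\sum_j (Y_{ij} - \bar{Y}_{i\cdot}) = 0$ for all $i$, since the sum of the deviations
around the arithmetic mean is always zero.

Thus, (16.31) reduces to (16.26).

2. The squared estimated factor level mean deviations
$(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$ in SSTR in (16.28) are weighted by the number of
cases $n_i$ for that factor level. The reason is that for each observation $Y_{ij}$ at factor
level $i$, the deviation component $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$ is the same.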


Breakdown of Degrees of Freedom

Corresponding to the decomposition of the total sum of squares, we can also obtain a
breakdown of the associated degrees of freedom.

SSTO has $n_T - 1$ degrees of freedom associated with it. There are altogether $n_T$
deviations $Y_{ij} - \bar{Y}_{\cdot\cdot}$, but one degree of freedom is lost because the
deviations are not independent in that they must sum to zero; i.e.,
$\sum\sum (Y_{ij} - \bar{Y}_{\cdot\cdot}) = 0$.

SSTR has $r - 1$ degrees of freedom associated with it. There are $r$ estimated factor level
mean deviations $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$, but one degree of freedom is lost
because the deviations are not independent in that the weighted sum must equal zero; i.e.,
$\sum n_i(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) = 0$.

SSE has $n_T - r$ degrees of freedom associated with it. This can be readily seen by
considering the component of SSE for the $i$th factor level:

\[ \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 \tag{16.34} \]

The expression in (16.34) is the equivalent of a total sum of squares considering only the
$i$th factor level. Hence, there are $n_i - 1$ degrees of freedom associated with this sum of
squares. Since SSE is a sum of component sums of squares such as the one in (16.34), the
degrees of freedom associated with SSE are the sum of the component degrees of freedom:

\[ (n_1 - 1) + (n_2 - 1) + \cdots + (n_r - 1) = n_T - r \tag{16.35} \]

For the Kenton Food Company example, for which $n_T = 19$ and $r = 4$, the degrees of
freedom associated with the three sums of squares are as follows:

SS       df
SSTO     19 - 1 = 18
SSTR      4 - 1 = 3
SSE      19 - 4 = 15

Note that degrees of freedom, like sums of squares, are additive:

\[ 18 = 3 + 15 \]


Mean Squares 
The mean squares, as usual, are obtained by dividing each sum of squares by its associated
degrees of freedom. We therefore have:

\[ MSTR = \frac{SSTR}{r - 1} \tag{16.36a} \]

\[ MSE = \frac{SSE}{n_T - r} \tag{16.36b} \]

Here, MSTR stands for treatment mean square and MSE, as before, stands for error mean
square.


Example

For the Kenton Food Company example, we obtain from earlier results:

\[ MSTR = \frac{588.22}{3} = 196.07 \]

\[ MSE = \frac{158.20}{15} = 10.55 \]

Note that the two mean squares do not add to $SSTO/(n_T - 1) = 746.42/18 = 41.47$.
Thus, the mean squares here, as in regression, are not additive.
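
The degrees of freedom and mean squares for the example can be verified with a short
Python sketch. It starts from the rounded sums of squares quoted in the text, and the
variable names are illustrative.

```python
# Sums of squares and sample sizes from the Kenton Food Company example.
ssto, sstr, sse = 746.42, 588.22, 158.20
n_T, r = 19, 4

# Degrees of freedom for total, treatments, and error.
df_total, df_treat, df_error = n_T - 1, r - 1, n_T - r

# Mean squares: each sum of squares divided by its degrees of freedom.
mstr = sstr / df_treat
mse = sse / df_error

print(df_total == df_treat + df_error)                   # degrees of freedom are additive
print(round(mstr, 2), round(mse, 2))                     # 196.07 10.55
print(round(ssto / df_total, 2), round(mstr + mse, 2))   # 41.47 206.62
```

The last line contrasts the total sum of squares divided by its degrees of freedom with the
sum of the two mean squares, which shows concretely that mean squares are not additive.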


Analysis of Variance Table 

The breakdowns of the total sum of squares and degrees of freedom, together with the
resulting mean squares, are presented in an ANOVA table such as Table 16.3. The ANOVA
table for the Kenton Food Company example is presented in Figure 16.5, which contains the
JMP output for single-factor analysis of variance. Note that the output contains the overall
mean response ($\bar{Y}_{\cdot\cdot} = 18.63158$), the number of observations, the ANOVA
table, and the estimated factor level means $\bar{Y}_{i\cdot}$. In this table, the line for the
treatments source of variation is labeled "Package Design." The results in the JMP output are
shown to more decimal places than we have shown, but are consistent with our calculations.
Note also that the JMP ANOVA table shows the degrees of freedom column before the sum of
squares column. The columns labeled "Std Error," "Lower 95%," and "Upper 95%" will be
discussed in Chapter 17.


Expected Mean Squares 


The expected values of MSE and MSTR can be shown to be as follows:

\[ E\{MSE\} = \sigma^2 \tag{16.37a} \]

\[ E\{MSTR\} = \sigma^2 + \frac{\sum n_i (\mu_i - \mu_{\cdot})^2}{r - 1} \tag{16.37b} \]

where:

\[ \mu_{\cdot} = \frac{\sum n_i \mu_i}{n_T} \tag{16.37c} \]

is referred to as the weighted mean. These expected values are shown in the E{MS} column
of Table 16.3.

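TABLE 16.3 ANOVA Table for Single-Factor Study.

Source of Variation          SS                                                  df          MS                              E{MS}
Between treatments           $SSTR = \sum n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$   $r - 1$     $MSTR = \dfrac{SSTR}{r - 1}$    $\sigma^2 + \dfrac{\sum n_i (\mu_i - \mu_{.})^2}{r - 1}$
Error (within treatments)    $SSE = \sum\sum (Y_{ij} - \bar{Y}_{i.})^2$          $n_T - r$   $MSE = \dfrac{SSE}{n_T - r}$    $\sigma^2$
Total                        $SSTO = \sum\sum (Y_{ij} - \bar{Y}_{..})^2$         $n_T - 1$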


FIGURE 16.5 JMP Output for Single-Factor Analysis of Variance.


Chapter 16 Single-Factor Studies 695 


Oneway Anova 


Summary of Fit 
Rsquare 0.788055 
Adj Rsquare 0.745666 
Root Mean Square Error 3.247563 
Mean of Response 18.63158 
Observations (or Sum Wgts) 19 


Analysis of Variance 


Source            DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Package Design     3        588.22105       196.074   18.5911     <.0001
Error             15        158.20000        10.547
C. Total          18        746.42105


Means for Oneway Anova


Level   Number      Mean   Std Error   Lower 95%   Upper 95%
1            5   14.6000      1.4524      11.504      17.696
2            5   13.4000      1.4524      10.304      16.496
3            4   19.5000      1.6238      16.039      22.961
4            5   27.2000      1.4524      24.104      30.296


Std Error uses a pooled estimate of error variance 


[FIGURE 16.6: sampling distributions of the $\bar{Y}_{i.}$ for four treatments. (a) $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_c$; (b) $\mu_i$ not equal.]


Two important features of the expected mean squares deserve attention: 


1. MSE is an unbiased estimator of $\sigma^2$, the variance of the error terms $\varepsilon_{ij}$, whether or
not the factor level means $\mu_i$ are equal. This is intuitively reasonable since the variability of
the observations within each factor level is not affected by the magnitudes of the estimated
factor level means for normal populations.


2. When all factor level means $\mu_i$ are equal and hence equal to the weighted mean $\mu_{.}$, then
$E\{MSTR\} = \sigma^2$ since the second term on the right in (16.37b) becomes zero. Hence, MSTR
and MSE both estimate the error variance $\sigma^2$ when all factor level means $\mu_i$ are equal. When,
however, the factor level means are not equal, MSTR tends on the average to be larger than
MSE, since the second term in (16.37b) will then be positive. This is intuitively reasonable,
as illustrated in Figure 16.6 for four treatments. The situation portrayed there assumes
that all sample sizes are equal, i.e., $n_i \equiv n$. When all $\mu_i$ are equal, then all $\bar{Y}_{i.}$ follow the
same sampling distribution, with common mean $\mu_c$ and variance $\sigma^2/n$; this is portrayed in




Figure 16.6a. When the $\mu_i$ are not equal, on the other hand, the $\bar{Y}_{i.}$ follow different sampling
distributions, each with the same variability $\sigma^2/n$ but centered on different means $\mu_i$. One
such possibility is shown in Figure 16.6b. Hence, the $\bar{Y}_{i.}$ will tend to differ more from each
other when the $\mu_i$ differ than when the $\mu_i$ are equal, and consequently MSTR will tend
to be larger when the factor level means are not the same than when they are equal. This
property of MSTR is utilized in constructing the statistical test discussed in the next section
to determine whether or not the factor level means $\mu_i$ are the same. If MSTR and MSE
are of the same order of magnitude, this is taken to suggest that the factor level means $\mu_i$
are equal. If MSTR is substantially larger than MSE, this is taken to suggest that the $\mu_i$ are
not equal.
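
To see how (16.37b) behaves, the sketch below evaluates E{MSTR} for one set of equal and one set of unequal factor level means. The error variance of 4.0, the sample sizes of 5, and both sets of means are hypothetical values chosen only for illustration; `expected_mstr` is an illustrative helper name.

```python
def expected_mstr(sigma2, means, sizes):
    """E{MSTR} from (16.37b), with the weighted mean mu. from (16.37c)."""
    n_total = sum(sizes)
    r = len(means)
    mu_dot = sum(n * m for n, m in zip(sizes, means)) / n_total
    spread = sum(n * (m - mu_dot) ** 2 for n, m in zip(sizes, means))
    return sigma2 + spread / (r - 1)

sigma2 = 4.0          # hypothetical error variance
sizes = [5, 5, 5, 5]  # hypothetical equal sample sizes

e_equal = expected_mstr(sigma2, [80, 80, 80, 80], sizes)
e_unequal = expected_mstr(sigma2, [70, 58, 90, 84], sizes)
print(e_equal, e_unequal)
```

With equal means the second term vanishes and E{MSTR} reduces to the error variance; with unequal means it exceeds the error variance, which is the behavior the F test exploits.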


Comments 


1. To find the expected value of MSE, we first note that MSE can be expressed as follows:

        $MSE = \dfrac{1}{n_T - r} \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.})^2 = \dfrac{1}{n_T - r} \sum_i \left[ \sum_j (Y_{ij} - \bar{Y}_{i.})^2 \right]$        (16.38)

Now let us denote the ordinary sample variance of the observations for the ith factor level by $s_i^2$:

        $s_i^2 = \dfrac{\sum_j (Y_{ij} - \bar{Y}_{i.})^2}{n_i - 1}$        (16.39)

Hence, (16.38) can be expressed as follows:

        $MSE = \dfrac{1}{n_T - r} \sum_i (n_i - 1) s_i^2$        (16.40)

Since it is well known that the sample variance (16.39) is an unbiased estimator of the population
variance, which in our case is $\sigma^2$ for all factor levels, we obtain:

        $E\{MSE\} = \dfrac{1}{n_T - r} \sum_i (n_i - 1) E\{s_i^2\} = \dfrac{1}{n_T - r} \sum_i (n_i - 1) \sigma^2 = \sigma^2$


2. We shall derive the expected value of MSTR for the special case when all sample sizes $n_i$ are
the same, namely, when $n_i \equiv n$. The general result in (16.37b) becomes for this special case:

        $E\{MSTR\} = \sigma^2 + \dfrac{n \sum (\mu_i - \mu_{.})^2}{r - 1}$   when $n_i \equiv n$        (16.41)

Further, when all factor level sample sizes are n, MSTR as defined in (16.28) and (16.36a) becomes:

        $MSTR = \dfrac{n \sum (\bar{Y}_{i.} - \bar{Y}_{..})^2}{r - 1}$   when $n_i \equiv n$        (16.42)




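To derive (16.41), consider the model formulation for $Y_{ij}$ in (16.2):

        $Y_{ij} = \mu_i + \varepsilon_{ij}$

Averaging the $Y_{ij}$ for the ith factor level, we obtain:

        $\bar{Y}_{i.} = \mu_i + \bar{\varepsilon}_{i.}$        (16.43)

where $\bar{\varepsilon}_{i.}$ is the average of the $\varepsilon_{ij}$ for the ith factor level:

        $\bar{\varepsilon}_{i.} = \dfrac{\sum_j \varepsilon_{ij}}{n}$        (16.44)

Averaging the $Y_{ij}$ over all factor levels, we obtain:

        $\bar{Y}_{..} = \mu_{.} + \bar{\varepsilon}_{..}$        (16.45)

where $\mu_{.}$, which is defined in (16.37c), becomes for $n_i \equiv n$:

        $\mu_{.} = \dfrac{n \sum \mu_i}{nr} = \dfrac{\sum \mu_i}{r}$   when $n_i \equiv n$        (16.46)

and $\bar{\varepsilon}_{..}$ is the average of all $\varepsilon_{ij}$:

        $\bar{\varepsilon}_{..} = \dfrac{\sum_i \sum_j \varepsilon_{ij}}{nr}$        (16.47)

Since the sample sizes are equal, we also have:

        $\bar{Y}_{..} = \dfrac{\sum \bar{Y}_{i.}}{r}$        $\bar{\varepsilon}_{..} = \dfrac{\sum \bar{\varepsilon}_{i.}}{r}$        (16.48)

Using (16.43) and (16.45), we obtain:

        $\bar{Y}_{i.} - \bar{Y}_{..} = (\mu_i + \bar{\varepsilon}_{i.}) - (\mu_{.} + \bar{\varepsilon}_{..}) = (\mu_i - \mu_{.}) + (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})$        (16.49)

When we square $\bar{Y}_{i.} - \bar{Y}_{..}$ and sum over the factor levels, we obtain:

        $\sum (\bar{Y}_{i.} - \bar{Y}_{..})^2 = \sum (\mu_i - \mu_{.})^2 + \sum (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})^2 + 2 \sum (\mu_i - \mu_{.})(\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})$        (16.50)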


We now wish to find $E\{\sum (\bar{Y}_{i.} - \bar{Y}_{..})^2\}$, and therefore need to find the expected value of each term
on the right in (16.50):

a. Since $\sum (\mu_i - \mu_{.})^2$ is a constant, its expectation is:

        $E\{\sum (\mu_i - \mu_{.})^2\} = \sum (\mu_i - \mu_{.})^2$        (16.51)

b. Before finding the expectation of the second term on the right, consider first the expression:

        $\dfrac{\sum (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})^2}{r - 1}$

This is an ordinary sample variance, since $\bar{\varepsilon}_{..}$ is the sample mean of the r terms $\bar{\varepsilon}_{i.}$ per (16.48).
We further know that the sample variance is an unbiased estimator of the variance of the
variable, in this case the variable being $\bar{\varepsilon}_{i.}$. But $\bar{\varepsilon}_{i.}$ is just the mean of n independent error terms
$\varepsilon_{ij}$ by (16.44). Hence:

        $\sigma^2\{\bar{\varepsilon}_{i.}\} = \dfrac{\sigma^2\{\varepsilon_{ij}\}}{n} = \dfrac{\sigma^2}{n}$




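Therefore:

        $E\left\{ \dfrac{\sum (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})^2}{r - 1} \right\} = \dfrac{\sigma^2}{n}$

so that:

        $E\{\sum (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})^2\} = \dfrac{(r - 1)\sigma^2}{n}$        (16.52)

c. Since both $\bar{\varepsilon}_{i.}$ and $\bar{\varepsilon}_{..}$ are means of $\varepsilon_{ij}$ terms, all of which have expectation 0, it follows that:

        $E\{\bar{\varepsilon}_{i.}\} = 0$        $E\{\bar{\varepsilon}_{..}\} = 0$

Hence:

        $E\{2 \sum (\mu_i - \mu_{.})(\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})\} = 2 \sum (\mu_i - \mu_{.}) E\{\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..}\} = 0$        (16.53)

We have thus shown, by (16.51), (16.52), and (16.53), that:

        $E\{\sum (\bar{Y}_{i.} - \bar{Y}_{..})^2\} = \sum (\mu_i - \mu_{.})^2 + \dfrac{(r - 1)\sigma^2}{n}$

But then (16.41) follows at once:

        $E\{MSTR\} = E\left\{ \dfrac{n \sum (\bar{Y}_{i.} - \bar{Y}_{..})^2}{r - 1} \right\} = \dfrac{n}{r - 1} \left[ \sum (\mu_i - \mu_{.})^2 + \dfrac{(r - 1)\sigma^2}{n} \right] = \sigma^2 + \dfrac{n \sum (\mu_i - \mu_{.})^2}{r - 1}$   when $n_i \equiv n$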


16.6 F Test for Equality of Factor Level Means


It is customary to begin the analysis of a single-factor study by determining whether or not
the factor level means $\mu_i$ are equal. If, for instance, the four package designs in the Kenton
Food Company example lead to the same mean sales volumes, there is no need for further
analysis, such as to determine which design is best or how two particular designs compare
in stimulating sales.
Thus, the alternative conclusions we wish to consider are:

        $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_r$
        $H_a$: not all $\mu_i$ are equal        (16.54)

Test Statistic
The test statistic to be used for choosing between the alternatives in (16.54) is:

        $F^* = \dfrac{MSTR}{MSE}$        (16.55)

Note that MSTR here plays the role corresponding to MSR for a regression model.

Large values of F* support $H_a$, since MSTR will tend to exceed MSE when $H_a$ holds, as
we saw from (16.37). Values of F* near 1 support $H_0$, since both MSTR and MSE have the
same expected value when $H_0$ holds. Hence, the appropriate test is an upper-tail one.


Distribution of F*

When all treatment means $\mu_i$ are equal, each response $Y_{ij}$ has the same expected value. In
view of the additivity of sums of squares and degrees of freedom, Cochran's theorem (2.61)
then implies:

        When $H_0$ holds, $\dfrac{SSE}{\sigma^2}$ and $\dfrac{SSTR}{\sigma^2}$ are independent $\chi^2$ variables

It follows in the same fashion as for regression:

        When $H_0$ holds, $F^*$ is distributed as $F(r - 1, n_T - r)$

When $H_a$ holds, that is, when the $\mu_i$ are not all equal, F* does not follow the F distribution.
Rather, it follows a complex distribution called the noncentral F distribution. We
shall make use of the noncentral F distribution when we discuss the power of the F test in
Section 16.10.


Comment 


SSTR and SSE are independent even if all $\mu_i$ are not equal. SSTR is solely based on the estimated factor
level means $\bar{Y}_{i.}$. On the other hand, SSE reflects the variability within the factor level samples, and
this within-sample variability is not affected by the magnitudes of the estimated factor level means
when the error terms are normally distributed.


Construction of Decision Rule

Usually, the risk of making a Type I error is controlled in constructing the decision rule.
This provides protection against making further, more detailed, analyses of the factor effects
when in fact there are no differences in the factor level means. The Type II error can also
be controlled, as we shall see later in Section 16.10, through sample size determination.

Since we know that F* is distributed as $F(r - 1, n_T - r)$ when $H_0$ holds and that large
values of F* lead to conclusion $H_a$, the appropriate decision rule to control the level of
significance at $\alpha$ is:

        If $F^* \le F(1 - \alpha; r - 1, n_T - r)$, conclude $H_0$
        If $F^* > F(1 - \alpha; r - 1, n_T - r)$, conclude $H_a$        (16.56)

where $F(1 - \alpha; r - 1, n_T - r)$ is the $(1 - \alpha)100$ percentile of the appropriate F distribution.

Example For the Kenton Food Company example, we wish to test whether or not mean sales are the
same for the four package designs:

        $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
        $H_a$: not all $\mu_i$ are equal

Management wishes to control the risk of making a Type I error at $\alpha = .05$. We therefore
require F(.95; 3, 15), where the degrees of freedom are those shown in Figure 16.5. From
Table B.4 in Appendix B, we find F(.95; 3, 15) = 3.29. Hence, the decision rule is:

        If $F^* \le 3.29$, conclude $H_0$
        If $F^* > 3.29$, conclude $H_a$




Using the data in the ANOVA table in Figure 16.5, we obtain the test statistic:

        $F^* = \dfrac{MSTR}{MSE} = \dfrac{196.07}{10.55} = 18.6$

Since F* = 18.6 > 3.29, we conclude $H_a$, that the factor level means $\mu_i$ are not equal, or
that the four different package designs do not lead to the same mean sales volume. Thus,
we conclude that there is a relation between package design and sales volume.

The P-value for the test statistic is the probability P{F(3, 15) > F* = 18.6}, which is
.00003. This P-value again indicates that the data from the experiment are not consistent
with all designs having the same effect on sales volume.
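
The decision rule (16.56) is simple enough to express directly in code. The sketch below applies it to the Kenton Food Company figures quoted above. The critical value 3.29 is taken from the text rather than computed, and `f_test_decision` is an illustrative helper name, not part of any library.

```python
def f_test_decision(mstr, mse, f_critical):
    """Apply decision rule (16.56): upper-tail comparison of F* = MSTR/MSE."""
    f_star = mstr / mse
    conclusion = "Ha" if f_star > f_critical else "H0"
    return f_star, conclusion

# Mean squares from Figure 16.5 and F(.95; 3, 15) = 3.29 as quoted from Table B.4.
f_star, conclusion = f_test_decision(mstr=196.07, mse=10.55, f_critical=3.29)
print(round(f_star, 1), conclusion)
```

Only the upper tail is checked, because values of F* near 1 or below are consistent with equal factor level means.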

The conclusion of a relation between package design and sales volume did not surprise
the sales manager of the Kenton Food Company. The study was conducted in the first place
because the sales manager expected the four package designs to have different effects on
sales volume and was interested in finding out the nature of these differences. In the next
chapter, we discuss the second stage of the analysis, namely, how to study the nature of the
factor level means when differences exist.


Comments 


1. If there are only two factor levels so that r = 2, it can easily be shown that the test employing
F* in (16.55) is the equivalent of the two-population, two-sided t test in Table A.2a. The F test here
has $(1, n_T - 2)$ degrees of freedom, and the t test has $n_1 + n_2 - 2$ or $n_T - 2$ degrees of freedom; thus
both tests lead to equivalent critical regions. For comparing two population means, the t test generally
is to be preferred since it can be used to conduct both two-sided and one-sided tests (Table A.2); the
F test can be used only for two-sided tests.
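
The equivalence for r = 2 rests on the identity F* = (t*)^2, where t* is the pooled-variance two-sample t statistic. A minimal numerical check follows; the two samples are made-up values for illustration only, and the helper names are not from the book.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def anova_f(groups):
    """F* = MSTR / MSE for a single-factor study."""
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    grand = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (sstr / (r - 1)) / (sse / (n_total - r))

def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    sse = sum((y - mean(a)) ** 2 for y in a) + sum((y - mean(b)) ** 2 for y in b)
    s2 = sse / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(s2 * (1 / len(a) + 1 / len(b)))

# Hypothetical samples with unequal sizes.
sample_a = [4.1, 5.0, 6.2, 5.5]
sample_b = [7.3, 6.8, 8.1, 7.9, 6.5]

f_star = anova_f([sample_a, sample_b])
t_star = pooled_t(sample_a, sample_b)
print(f_star, t_star ** 2)
```

The sign of t* is lost when squaring, which is why the F test cannot distinguish the direction of a difference and is limited to two-sided alternatives.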


2. Since the F test for testing the alternatives (16.54) is a test of a linear statistical model, it can
be obtained by the general linear test approach explained in Section 2.8:

a. The full model is ANOVA model (16.2):

        $Y_{ij} = \mu_i + \varepsilon_{ij}$        Full model        (16.57)

Fitting the full model by either the method of least squares or the method of maximum likelihood
leads to the fitted values $\hat{Y}_{ij} = \bar{Y}_{i.}$, per (16.17), and to the resulting error sum of squares:

        $SSE(F) = \sum\sum (Y_{ij} - \hat{Y}_{ij})^2 = \sum\sum (Y_{ij} - \bar{Y}_{i.})^2$

SSE(F) has $df_F = n_T - r$ degrees of freedom associated with it because r parameter values
$(\mu_1, \ldots, \mu_r)$ have to be estimated.

b. The reduced model under $H_0$ is:

        $Y_{ij} = \mu_c + \varepsilon_{ij}$        Reduced model        (16.58)

where $\mu_c$ is the common mean for all factor levels. Fitting the reduced model leads to the
estimator $\hat{\mu}_c = \bar{Y}_{..}$ so that all fitted values are $\hat{Y}_{ij} \equiv \bar{Y}_{..}$, and the resulting error sum of squares
is:

        $SSE(R) = \sum\sum (Y_{ij} - \hat{Y}_{ij})^2 = \sum\sum (Y_{ij} - \bar{Y}_{..})^2$

The degrees of freedom associated with SSE(R) are $df_R = n_T - 1$ because one parameter ($\mu_c$)
had to be estimated.

c. Since, according to (16.27) and (16.29), respectively:

        $SSE(R) = SSTO$
        $SSE(F) = SSE$

and since by (16.30) SSTO - SSE = SSTR, the general linear test statistic (2.70) becomes here:

        $F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div \dfrac{SSE(F)}{df_F} = \dfrac{SSTO - SSE}{(n_T - 1) - (n_T - r)} \div \dfrac{SSE}{n_T - r} = \dfrac{SSTR}{r - 1} \div \dfrac{SSE}{n_T - r} = \dfrac{MSTR}{MSE}$


16.7 Alternative Formulation of Model


Factor Effects Model
At times, an alternative but completely equivalent formulation of the single-factor ANOVA
model in (16.2) is used. This alternative formulation is called the factor effects model. With
this alternative formulation, the treatment means $\mu_i$ are expressed in an equivalent fashion
by means of the identity:

        $\mu_i \equiv \mu_{.} + (\mu_i - \mu_{.})$        (16.59)

where $\mu_{.}$ is a constant that can be defined to fit the purpose of the study. We shall denote
the difference $\mu_i - \mu_{.}$ by $\tau_i$:

        $\tau_i = \mu_i - \mu_{.}$        (16.60)

so that (16.59) can be expressed in equivalent fashion as:

        $\mu_i \equiv \mu_{.} + \tau_i$        (16.61)

The difference $\tau_i = \mu_i - \mu_{.}$ is called the ith factor level effect or the ith treatment effect.
The ANOVA model in (16.2) can now be stated equivalently as follows:

        $Y_{ij} = \mu_{.} + \tau_i + \varepsilon_{ij}$        (16.62)

where:
        $\mu_{.}$ is a constant component common to all observations
        $\tau_i$ is the effect of the ith factor level (a constant for each factor level)
        $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$
        $i = 1, \ldots, r$; $j = 1, \ldots, n_i$

ANOVA model (16.62) is called a factor effects model because it is expressed in terms of
the factor effects $\tau_i$, in distinction to the cell means model (16.2), which is expressed in
terms of the cell (treatment) means $\mu_i$.




Factor effects model (16.62) is a linear model, like the equivalent cell means model (16.2).
We shall demonstrate this in the next section.


Definition of $\mu_{.}$

The splitting up of the factor level mean $\mu_i$ into two components, an overall constant $\mu_{.}$ and
a factor level or treatment effect $\tau_i$, depends on the definition of $\mu_{.}$, which can be defined
in many ways. We now explain two basic ways to define $\mu_{.}$.


Unweighted Mean. Often, a definition of $\mu_{.}$ as the unweighted average of all factor level
means $\mu_i$ is found to be useful:

        $\mu_{.} = \dfrac{\sum_{i=1}^{r} \mu_i}{r}$        (16.63)

This definition implies that:

        $\sum_{i=1}^{r} \tau_i = 0$        (16.64)

because by (16.60) we have:

        $\sum \tau_i = \sum (\mu_i - \mu_{.}) = \sum \mu_i - r\mu_{.}$

and by (16.63) we have:

        $\sum \mu_i = r\mu_{.}$

Thus, the definition of the overall constant $\mu_{.}$ in (16.63) implies a restriction on the $\tau_i$, in
this case that their sum must be zero.


Example For the earlier incentive pay example in Figure 16.2, we have $\mu_1 = 70$, $\mu_2 = 58$, $\mu_3 = 90$,
and $\mu_4 = 84$. When $\mu_{.}$ is defined according to (16.63), we obtain:

        $\mu_{.} = \dfrac{70 + 58 + 90 + 84}{4} = 75.5$

Hence:
        $\tau_1 = 70 - 75.5 = -5.5$
        $\tau_2 = 58 - 75.5 = -17.5$
        $\tau_3 = 90 - 75.5 = 14.5$
        $\tau_4 = 84 - 75.5 = 8.5$

The first treatment effect $\tau_1 = -5.5$, for instance, indicates that the mean employee
productivity for incentive pay type 1 is 5.5 units less than the average productivity for all four
types of incentive pay. Figure 16.7 provides an illustration of these treatment effects.
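
The treatment effects in this example follow mechanically from (16.60) and (16.63). A minimal sketch, using the four factor level means given in the text (the variable names are illustrative):

```python
# Factor level means for the incentive pay example, as given in the text.
mu = [70, 58, 90, 84]

mu_dot = sum(mu) / len(mu)       # unweighted mean, (16.63)
tau = [m - mu_dot for m in mu]   # treatment effects, (16.60)

print(mu_dot, tau)
print(sum(tau))                  # restriction (16.64): the effects sum to zero
```

Because the effects are deviations from the unweighted mean, their sum is zero by construction, which is the restriction (16.64).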


[FIGURE 16.7: illustration of the treatment effects for the incentive pay example, with $\mu_{.} = 75.5$.]


Weighted Mean. The constant $\mu_{.}$ can also be defined as some weighted average of the
factor level means $\mu_i$:

        $\mu_{.} = \sum_{i=1}^{r} w_i \mu_i$   where $\sum_{i=1}^{r} w_i = 1$        (16.65)

Note that the $w_i$ are weights defined so that their sum is 1. The restriction on the $\tau_i$ implied
by definition (16.65) is:

        $\sum_{i=1}^{r} w_i \tau_i = 0$        (16.66)

This follows in the same fashion as (16.64).

The choice of weights $w_i$ should depend on the meaningfulness of the resulting overall
mean $\mu_{.}$. We present now two examples where different weightings are appropriate:
(1) weighting according to a known measure of importance and (2) weighting according to
sample size.


Example 1 A car rental firm wanted to estimate the average fuel consumption (in miles per gallon)
for its large fleet of cars, which consists of 50 percent compacts, 30 percent sedans, and
20 percent station wagons. Here, a meaningful measure of $\mu_{.}$ might be in terms of overall
mean fuel consumption:

        $\mu_{.} = .5\mu_1 + .3\mu_2 + .2\mu_3$        (16.67)

where $\mu_1$, $\mu_2$, and $\mu_3$ are the mean fuel consumptions for the three types of cars in the fleet.
An estimate of $\mu_{.}$ here is:

        $\hat{\mu}_{.} = .5\bar{Y}_{1.} + .3\bar{Y}_{2.} + .2\bar{Y}_{3.}$        (16.68)


Example 2 When exact weights are unknown, the subgroup sample sizes may be useful as weights of
relative importance. For instance, the proportions of households in a city with no children,
one child, and more than one child are not known. A random sample of $n_T$ households was
selected, which contained $n_1$ households with no child, $n_2$ households with one child, and $n_3$
households with more than one child. For testing whether mean entertainment expenditures
are the same for the three types of households, use of the proportions $n_1/n_T$, $n_2/n_T$, and
$n_3/n_T$ as weights might be meaningful. The resulting definition of the overall entertainment
expenditures constant $\mu_{.}$ would then be:

        $\mu_{.} = \dfrac{n_1}{n_T}\mu_1 + \dfrac{n_2}{n_T}\mu_2 + \dfrac{n_3}{n_T}\mu_3$        (16.69)

This quantity would be estimated by $\bar{Y}_{..}$:

        $\hat{\mu}_{.} = \dfrac{n_1}{n_T}\bar{Y}_{1.} + \dfrac{n_2}{n_T}\bar{Y}_{2.} + \dfrac{n_3}{n_T}\bar{Y}_{3.} = \bar{Y}_{..}$        (16.70)

When all sample sizes are equal, $\mu_{.}$ as defined in (16.69) reduces to the unweighted
mean (16.63).
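
The difference between the two definitions shows up numerically whenever the sample sizes are unequal. The sketch below borrows the estimated factor level means and sample sizes from the Kenton Food Company output in Figure 16.5 (a different example from the household study, used here only because its numbers appear in the text); `weighted_mean` is an illustrative helper.

```python
def weighted_mean(values, weights):
    """Weighted average as in (16.65); the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * v for w, v in zip(weights, values))

# Estimated factor level means and sample sizes from Figure 16.5.
sizes = [5, 5, 4, 5]
cell_means = [14.6, 13.4, 19.5, 27.2]
n_total = sum(sizes)

weights = [n / n_total for n in sizes]                 # w_i = n_i / n_T
mu_weighted = weighted_mean(cell_means, weights)       # estimate per (16.70)
mu_unweighted = sum(cell_means) / len(cell_means)      # estimate per (16.63)

tau_weighted = [m - mu_weighted for m in cell_means]
restriction = sum(w * t for w, t in zip(weights, tau_weighted))  # (16.66)

print(round(mu_weighted, 5), round(mu_unweighted, 3), abs(restriction) < 1e-9)
```

With sample-size weights the estimate coincides with the overall mean of all 19 observations, while the unweighted average of the four cell means is slightly different because one design has only four cases.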


Test for Equality of Factor Level Means 
Since the factor effects model (16.62) is equivalent to the cell means model (16.2), the test
for equality of factor level means uses the same test statistic F* in (16.55). The only
difference is in the statement of the alternatives. For the cell means model (16.2), the alternatives
are as specified in (16.54):

        $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_r$
        $H_a$: not all $\mu_i$ are equal

For the factor effects model (16.62), these same alternatives in terms of the factor effects
are:

        $H_0$: $\tau_1 = \tau_2 = \cdots = \tau_r = 0$
        $H_a$: not all $\tau_i$ equal zero        (16.71)

The equivalence of the two forms can be readily established. The equality of the factor
level means $\mu_1 = \mu_2 = \cdots = \mu_r$ implies that all $\tau_i$ are equal. The equalities of the $\tau_i$
follow from (16.61) since the constant term $\mu_{.}$ is common to all factor level effects $\tau_i$. The
equality of the factor level means in turn implies that all $\tau_i = 0$, whether the restriction on
the $\tau_i$ is of the form in (16.64) or (16.66). In either case, the restriction can be satisfied in
only one way given the equality of the $\tau_i$, namely, that $\tau_i \equiv 0$. Thus, it is equivalent to state
that all factor level means $\mu_i$ are equal or that all factor level effects $\tau_i$ equal zero.


16.8 Regression Approach to Single-Factor Analysis of Variance 


We noted earlier that cell means model (16.2) is a linear model, and that we can obtain test
statistic F* for testing the equality of the factor level means $\mu_i$ by means of the general
linear test (2.70). We shall now explain the regression approach to single-factor analysis of
variance for three alternative models: (1) the factor effects model with unweighted mean, 
(2) the factor effects model with weighted mean, and (3) the cell means model. It is important 
to emphasize that the choice of model affects the definition of the model parameters, and 
not the outcome of the test for equality of factor level means. 


Factor Effects Model with Unweighted Mean


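To state ANOVA model (16.62):

        $Y_{ij} = \mu_{.} + \tau_i + \varepsilon_{ij}$

as a regression model, we need to represent the parameters $\mu_{.}, \tau_1, \ldots, \tau_r$ in the model.
However, constraint (16.64) for the case of equal weightings:

        $\sum_{i=1}^{r} \tau_i = 0$

implies that one of the r parameters $\tau_i$ is not needed since it can be expressed in terms of
the other r - 1 parameters. We shall drop the parameter $\tau_r$, which according to constraint
(16.64) can be expressed in terms of the other r - 1 parameters $\tau_i$ as follows:

        $\tau_r = -\tau_1 - \tau_2 - \cdots - \tau_{r-1}$        (16.72)

Thus, we shall use only the parameters $\mu_{.}, \tau_1, \ldots, \tau_{r-1}$ for the linear model.

To illustrate how a linear model is developed with this approach, consider a single-factor
study with r = 3 factor levels when $n_1 = n_2 = n_3 = 2$. The Y, X, $\beta$, and $\varepsilon$ matrices for
this case are as follows:

        $\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix} \quad \mathbf{X} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \end{bmatrix} \quad \boldsymbol{\beta} = \begin{bmatrix} \mu_{.} \\ \tau_1 \\ \tau_2 \end{bmatrix} \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}$        (16.73)

Note that the vector of expected values, $E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta}$, yields the following:

        $E\{\mathbf{Y}\} = \begin{bmatrix} E\{Y_{11}\} \\ E\{Y_{12}\} \\ E\{Y_{21}\} \\ E\{Y_{22}\} \\ E\{Y_{31}\} \\ E\{Y_{32}\} \end{bmatrix} = \mathbf{X}\boldsymbol{\beta} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} \mu_{.} \\ \tau_1 \\ \tau_2 \end{bmatrix} = \begin{bmatrix} \mu_{.} + \tau_1 \\ \mu_{.} + \tau_1 \\ \mu_{.} + \tau_2 \\ \mu_{.} + \tau_2 \\ \mu_{.} - \tau_1 - \tau_2 \\ \mu_{.} - \tau_1 - \tau_2 \end{bmatrix}$        (16.74)

Since $\tau_3 = -\tau_1 - \tau_2$ according to (16.72), we see that $E\{Y_{31}\} = E\{Y_{32}\} = \mu_{.} + \tau_3$. Thus, the
above X matrix and $\beta$ vector representation provides in all cases the appropriate expected
values:

        $E\{Y_{ij}\} = \mu_{.} + \tau_i$

The illustration in (16.73) indicates how we need to define in general the multiple
regression model so that it is the equivalent of the single-factor ANOVA model (16.62).
Note that we require indicator variables that take on values 0, 1, or -1. This coding was
discussed in Section 8.1. While this coding is not as simple as a 0, 1 coding, it is desirable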




here because it leads to regression coefficients in the $\beta$ vector that are the parameters in the
factor effects ANOVA model, i.e., $\mu_{.}, \tau_1, \ldots, \tau_{r-1}$.
We shall let $X_{ij1}$ denote the value of indicator variable $X_1$ for the jth case from the ith
factor level, $X_{ij2}$ the value of indicator variable $X_2$ for this same case, and so on, using
altogether r - 1 indicator variables in the model. The multiple regression model then is as
follows:

        $Y_{ij} = \mu_{.} + \tau_1 X_{ij1} + \tau_2 X_{ij2} + \cdots + \tau_{r-1} X_{ij,r-1} + \varepsilon_{ij}$        Full model        (16.75)

where:

        $X_{ij1} = \begin{cases} 1 & \text{if case from factor level 1} \\ -1 & \text{if case from factor level } r \\ 0 & \text{otherwise} \end{cases}$

        $\vdots$

        $X_{ij,r-1} = \begin{cases} 1 & \text{if case from factor level } r - 1 \\ -1 & \text{if case from factor level } r \\ 0 & \text{otherwise} \end{cases}$

Note how the ANOVA model parameters play the role of regression function parameters
in (16.75); the intercept term is $\mu_{.}$, and the regression coefficients are $\tau_1, \tau_2, \ldots, \tau_{r-1}$.
The least squares estimator of $\mu_{.}$ is the average of the cell sample means:

        $\hat{\mu}_{.} = \dfrac{\sum_{i=1}^{r} \bar{Y}_{i.}}{r}$        (16.75a)

Note that this quantity is generally not the same as the overall mean $\bar{Y}_{..}$ unless the cell
sample sizes are equal. Also, the least squares estimator of the ith factor effect is:

        $\hat{\tau}_i = \bar{Y}_{i.} - \hat{\mu}_{.}$        (16.75b)

To test the equality of the treatment means $\mu_i$ by means of the regression approach, we
state the alternatives in the equivalent formulation (16.71), noting that $\tau_r$ must equal zero
when $\tau_1 = \tau_2 = \cdots = \tau_{r-1} = 0$ according to (16.72):

        $H_0$: $\tau_1 = \tau_2 = \cdots = \tau_{r-1} = 0$
        $H_a$: not all $\tau_i$ equal zero        (16.76)

Note that $H_0$ states that all regression coefficients in regression model (16.75) are zero, and
the reduced model is therefore:

        $Y_{ij} = \mu_{.} + \varepsilon_{ij}$        Reduced model        (16.77)

Thus, we employ the usual test statistic (6.39b) for testing whether or not there is a regression
relation:

        $F^* = \dfrac{MSR}{MSE}$        (16.78)

Example To test the equality of mean sales for the four cereal package designs in the Kenton Food
Company example by means of the regression approach, we shall employ the regression
model:

        $Y_{ij} = \mu_{.} + \tau_1 X_{ij1} + \tau_2 X_{ij2} + \tau_3 X_{ij3} + \varepsilon_{ij}$        (16.79)

where:

        $X_{ij1} = \begin{cases} 1 & \text{if case from factor level 1} \\ -1 & \text{if case from factor level 4} \\ 0 & \text{otherwise} \end{cases}$

        $X_{ij2} = \begin{cases} 1 & \text{if case from factor level 2} \\ -1 & \text{if case from factor level 4} \\ 0 & \text{otherwise} \end{cases}$

        $X_{ij3} = \begin{cases} 1 & \text{if case from factor level 3} \\ -1 & \text{if case from factor level 4} \\ 0 & \text{otherwise} \end{cases}$

A portion of the data in Table 16.1 is repeated in Table 16.4a, together with the coding of
the indicator variables $X_1$, $X_2$, and $X_3$. For observation $Y_{11}$, for instance, note that $X_1 = 1$,
$X_2 = 0$, and $X_3 = 0$; hence, we obtain from (16.79):

        $E\{Y_{11}\} = \mu_{.} + \tau_1$


TABLE 16.4 Regression Approach to the Analysis of Variance, Kenton Food Company Example.

(a) Data for Regression Model (16.79)

 i     j     Y_ij     X_ij1    ...
 1     1     11       1
 ...
 1     4     14       1
 1     5     15       1
 2     1     12       0
 ...

(c) Analysis of Variance

Source of Variation      SS                 df
Regression               SSR = 588.22        3
Error                    SSE = 158.20       15
Total                    SSTO = 746.42      18


Similarly, for observation $Y_{41}$ we have $X_1 = -1$, $X_2 = -1$, and $X_3 = -1$; hence:

        $E\{Y_{41}\} = \mu_{.} - \tau_1 - \tau_2 - \tau_3 = \mu_{.} + \tau_4$

since $\tau_4 = -\tau_1 - \tau_2 - \tau_3$.
Note that we employ the following codings in the indicator variables for cases from each
of the four factor levels:

                        Coding
        Factor Level    X1    X2    X3
        1                1     0     0
        2                0     1     0
        3                0     0     1
        4               -1    -1    -1

A computer run of the multiple regression of Y on $X_1$, $X_2$, and $X_3$ yielded the fitted
regression function and analysis of variance table presented in Tables 16.4b and 16.4c. Test
statistic (16.78) therefore is:

        $F^* = \dfrac{MSR}{MSE} = \dfrac{196.07}{10.55} = 18.6$

This is the same test statistic obtained earlier based on the analysis of variance calculations.
Indeed, the analysis of variance table in Table 16.4c obtained with the regression approach
is the same as the one in Figure 16.5 obtained with the analysis of variance approach
except that the treatment sum of squares and mean square are called the regression sum of
squares and mean square in Table 16.4c. From this point on, the test procedure based on
the regression approach parallels the analysis of variance test procedure explained earlier.
Note that in the fitted regression function in Table 16.4b, the intercept term $\hat{\mu}_{.} = 18.675$
is the unweighted average of the estimated factor level means $\bar{Y}_{i.}$, not the overall mean
$\bar{Y}_{..}$, because $\mu_{.}$ was defined as the unweighted average of the factor level means $\mu_i$. The
regression coefficient $b_1 = \hat{\tau}_1 = \bar{Y}_{1.} - \hat{\mu}_{.} = 14.6 - 18.675 = -4.075$ is simply the
difference between the estimated mean in the first cell and the unweighted overall mean. $b_2$
and $b_3$ represent similar differences between the estimated factor level mean and the overall
unweighted mean.
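
The role of the 1, 0, -1 coding can be checked without running a regression. The sketch below builds the coded row for each factor level, plugs in the estimators (16.75a) and (16.75b) computed from the cell means in Figure 16.5, and confirms that the fitted values equal the estimated factor level means. It verifies the coding only and does not perform a least squares fit; `effect_row` is an illustrative helper name.

```python
def effect_row(level, r):
    """Row (1, X1, ..., X_{r-1}) of the X matrix for a case from `level` (1-based)."""
    if level == r:
        return [1] + [-1] * (r - 1)          # last level is coded -1 on every indicator
    return [1] + [1 if k == level else 0 for k in range(1, r)]

# Estimated factor level means from Figure 16.5.
cell_means = [14.6, 13.4, 19.5, 27.2]
r = len(cell_means)

mu_hat = sum(cell_means) / r                      # (16.75a): unweighted average
tau_hat = [m - mu_hat for m in cell_means[:-1]]   # (16.75b) for levels 1..r-1
beta = [mu_hat] + tau_hat

# Fitted value for each level is the inner product of its coded row with beta.
fitted = [sum(x * b for x, b in zip(effect_row(i, r), beta)) for i in range(1, r + 1)]
print(round(mu_hat, 3), [round(t, 3) for t in tau_hat])
print([round(f, 3) for f in fitted])
```

The fourth fitted value is recovered from the constraint alone: its row subtracts all three estimated effects from the intercept, which reproduces the fourth cell mean.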




Comment 


The regression approach is not utilized generally for ordinary analysis of variance problems. The
reason is that the X matrix for analysis of variance problems usually is of a very simple structure, as
we have seen earlier. This simple structure permits computational simplifications that are explicitly
recognized in the statistical procedures for analysis of variance. We take up the regression approach to
analysis of variance here, and in later chapters, for two principal reasons. First, we see that analysis of
variance models are encompassed by the general linear statistical model (6.19). Second, the regression
approach is very useful for analyzing some multifactor studies when the structure of the X matrix is
not simple.


Factor Effects Model with Weighted Mean


When the factor effects model (16.62) is used with a weighted mean, a modification of
the coding scheme in (16.75) is required. The new coding scheme leads to changes in the
definitions of the regression coefficients. We describe the new coding scheme and summarize
the changes in the context of the proportional sample size weights, $w_i = n_i/n_T$.

When the constant $\mu_{.}$ is the weighted average of the factor level means using proportional
sample size weights, we have, from (16.65):

        $\mu_{.} = \sum_{i=1}^{r} w_i \mu_i = \sum_{i=1}^{r} \dfrac{n_i}{n_T} \mu_i$        (16.80a)

From (16.66), the restriction on the $\tau_i$ is:

        $\sum_{i=1}^{r} \dfrac{n_i}{n_T} \tau_i = 0$

Solving for $\tau_r$, we find:

        $\tau_r = -\dfrac{n_1}{n_r}\tau_1 - \dfrac{n_2}{n_r}\tau_2 - \cdots - \dfrac{n_{r-1}}{n_r}\tau_{r-1}$        (16.80b)

This leads to the weighted model:

        $Y_{ij} = \mu_{.} + \tau_1 X_{ij1} + \tau_2 X_{ij2} + \cdots + \tau_{r-1} X_{ij,r-1} + \varepsilon_{ij}$        Full model        (16.81)

where:

        $X_{ij1} = \begin{cases} 1 & \text{if case from factor level 1} \\ -\dfrac{n_1}{n_r} & \text{if case from factor level } r \\ 0 & \text{otherwise} \end{cases}$

        $\vdots$

        $X_{ij,r-1} = \begin{cases} 1 & \text{if case from factor level } r - 1 \\ -\dfrac{n_{r-1}}{n_r} & \text{if case from factor level } r \\ 0 & \text{otherwise} \end{cases}$

Note that if all cell sample sizes are equal, the mean $\mu_{.}$ is the unweighted mean, and the
coding scheme above is the same as the unweighted coding scheme used in (16.75), since
$-n_i/n_r = -1$ for $i = 1, \ldots, r - 1$.

When the sample sizes are not all equal, as noted in (16.70), the least squares estimate
of the weighted mean $\mu_{.}$ is the overall mean $\bar{Y}_{..}$, and the least squares estimate of the ith
factor effect $\tau_i$ is $\bar{Y}_{i.} - \bar{Y}_{..}$.


In the Kenton Food Company example, weighted mean model (16.81) is:

        $Y_{ij} = \mu_{.} + \tau_1 X_{ij1} + \tau_2 X_{ij2} + \tau_3 X_{ij3} + \varepsilon_{ij}$        (16.82)

where:

        $X_{ij1} = \begin{cases} 1 & \text{if case from factor level 1} \\ -\dfrac{5}{5} & \text{if case from factor level 4} \\ 0 & \text{otherwise} \end{cases}$

        $X_{ij2} = \begin{cases} 1 & \text{if case from factor level 2} \\ -\dfrac{5}{5} & \text{if case from factor level 4} \\ 0 & \text{otherwise} \end{cases}$

        $X_{ij3} = \begin{cases} 1 & \text{if case from factor level 3} \\ -\dfrac{4}{5} & \text{if case from factor level 4} \\ 0 & \text{otherwise} \end{cases}$

The fitted regression function is:

        $\hat{Y} = 18.63 - 4.03X_1 - 5.23X_2 + .87X_3$

and the following relations hold:

        $\hat{\mu}_{.} = b_0 = \bar{Y}_{..} = 18.63$
        $\hat{\tau}_1 = b_1 = \bar{Y}_{1.} - \bar{Y}_{..} = 14.6 - 18.63 = -4.03$
        $\hat{\tau}_2 = b_2 = \bar{Y}_{2.} - \bar{Y}_{..} = 13.4 - 18.63 = -5.23$
        $\hat{\tau}_3 = b_3 = \bar{Y}_{3.} - \bar{Y}_{..} = 19.5 - 18.63 = .87$
        $\hat{\tau}_4 = -\dfrac{n_1}{n_4}\hat{\tau}_1 - \dfrac{n_2}{n_4}\hat{\tau}_2 - \dfrac{n_3}{n_4}\hat{\tau}_3 = 8.56$

A general linear test of the alternatives:

        $H_0$: $\tau_1 = \tau_2 = \tau_3 = 0$
        $H_a$: not all $\tau_i = 0$

is conducted using the full model in (16.82) and forming the reduced model by setting
$\tau_1 = \tau_2 = \tau_3 = 0$ in full model (16.82). The test statistic (16.78) for the presence of a
regression relation again yields:

        $F^* = \dfrac{MSR}{MSE} = \dfrac{196.07}{10.55} = 18.6$

As expected, the results are identical to those obtained earlier for the ANOVA F test.
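
The weighted coding can be checked in the same spirit as the unweighted one. The sketch below builds the coded row for each factor level using the $-n_i/n_r$ entries, plugs in the estimates $\bar{Y}_{..}$ and $\bar{Y}_{i.} - \bar{Y}_{..}$ computed from the cell means and sample sizes in Figure 16.5, and confirms that the fitted values reproduce the cell means. It checks the coding and the relation (16.80b) only; it is not a least squares routine, and `weighted_effect_row` is an illustrative helper name.

```python
def weighted_effect_row(level, sizes):
    """Row (1, X1, ..., X_{r-1}) under the weighted coding of (16.81); level is 1-based."""
    r = len(sizes)
    if level == r:
        return [1.0] + [-sizes[k] / sizes[r - 1] for k in range(r - 1)]
    return [1.0] + [1.0 if k == level else 0.0 for k in range(1, r)]

# Sample sizes and estimated factor level means from Figure 16.5.
sizes = [5, 5, 4, 5]
cell_means = [14.6, 13.4, 19.5, 27.2]
r = len(sizes)

overall_mean = sum(n * m for n, m in zip(sizes, cell_means)) / sum(sizes)
tau_hat = [m - overall_mean for m in cell_means[:-1]]
beta = [overall_mean] + tau_hat

fitted = [
    sum(x * b for x, b in zip(weighted_effect_row(i, sizes), beta))
    for i in range(1, r + 1)
]
# Effect for the last level, recovered from (16.80b).
tau_last = -sum(sizes[k] / sizes[-1] * tau_hat[k] for k in range(r - 1))

print(round(overall_mean, 2), [round(t, 2) for t in tau_hat], round(tau_last, 2))
```

Working with unrounded estimates gives a fourth effect of about 8.57; the value 8.56 in the text comes from applying (16.80b) to coefficients already rounded to two decimals.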




Cell Means Model 


When the analysis of variance test is to be conducted by means of the regression approach
based on the cell means model (16.2):

        $Y_{ij} = \mu_i + \varepsilon_{ij}$

the $\beta$ vector can be defined to contain all r treatment means $\mu_i$:

        $\boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_r \end{bmatrix}$        (16.83)

and r indicator variables $X_1, X_2, \ldots, X_r$ are utilized, each defined as a 0, 1 variable as
illustrated in Chapter 8:

        $X_1 = \begin{cases} 1 & \text{if case from factor level 1} \\ 0 & \text{otherwise} \end{cases}$

        $\vdots$        (16.84)

        $X_r = \begin{cases} 1 & \text{if case from factor level } r \\ 0 & \text{otherwise} \end{cases}$

The regression model therefore is:

        $Y_{ij} = \mu_1 X_{ij1} + \mu_2 X_{ij2} + \cdots + \mu_r X_{ijr} + \varepsilon_{ij}$        Full model        (16.85)

with the $\mu_i$ playing the role of regression coefficients.

The X matrix with this approach contains only 0 and 1 entries. For example, for r = 3
factor levels with $n_1 = n_2 = n_3 = 2$ cases, the X matrix (observations in order $Y_{11}$, $Y_{12}$,
$Y_{21}$, etc.) and $\beta$ vector would be as follows:

        $\mathbf{X} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \quad \boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}$

Note that regression model (16.85) has no intercept term. When a computer regression
package is to be employed for this case, it is important that a fit with no intercept term be
specified.

The ANOVA table obtained with regression model (16.85) is different from the one with
the single-factor ANOVA model in (16.2) because the regression model (16.85) has no
intercept term. Thus, the F test obtained with the regression model cannot be used to test
the equality of factor level means. The test of whether the factor level means are equal, i.e.,
$\mu_1 = \mu_2 = \cdots = \mu_r$, asks only whether or not the regression coefficients in (16.83) are
equal, not whether or not they equal zero. Hence, we need to fit the full model and then the
reduced model to conduct this test. The reduced model when $H_0$: $\mu_1 = \cdots = \mu_r$ holds is:

        $Y_{ij} = \mu_c + \varepsilon_{ij}$        Reduced model        (16.86)

where $\mu_c$ is the common value of all $\mu_i$ under $H_0$. The X matrix here consists simply
of a column of 1s. The X matrix and $\beta$ vector for the reduced model in our example
would be:

        $\mathbf{X} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \quad \boldsymbol{\beta} = [\mu_c]$

After the full and reduced models are fitted and the error sums of squares are obtained
for each fit, the usual general linear test statistic (2.70) is then calculated.


Example   For the Kenton Food Company example, the regression fit for the cell means model in
(16.85) is:

$$\hat{Y} = 14.6X_1 + 13.4X_2 + 19.5X_3 + 27.2X_4$$

It can be readily seen that the coefficient of $X_i$ is equal to the estimated factor level mean
$\bar{Y}_{i\cdot}$ for $i = 1, \ldots, 4$.
A general linear test of the alternatives:

$$H_0\colon \mu_1 = \mu_2 = \mu_3 = \mu_4$$
$$H_a\colon \text{not all } \mu_i \text{ are equal}$$

is conducted using the full and reduced models in (16.85) and (16.86). Here we again find
that SSE(R) = 746.42 and that SSE(F) = 158.2. From (2.70) we have:

$$F^* = \frac{746.42 - 158.2}{4 - 1} \div \frac{158.2}{19 - 4} = 18.6$$

This demonstrates that the test for equality of means using the regression approach is, as
expected, the same as that obtained earlier for the ANOVA F test.
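The arithmetic of the general linear test can be checked with a short Python sketch. The function name and argument names below are ours, not the text's; the error sums of squares and the total sample size of 19 are the values quoted above for the Kenton Food Company example.

```python
def general_linear_test(sse_reduced, df_reduced, sse_full, df_full):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

n_total, r = 19, 4                     # cases and factor levels in the example
f_star = general_linear_test(
    sse_reduced=746.42, df_reduced=n_total - 1,   # reduced model: one common mean
    sse_full=158.2, df_full=n_total - r,          # full model: r cell means
)
print(round(f_star, 1))
```

The difference in degrees of freedom, $(n_T - 1) - (n_T - r) = r - 1 = 3$, is the numerator divisor shown as $4 - 1$ in the text.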


16.9 Randomization Tests 


Randomization can provide the basis for making inferences without requiring assumptions
about the distribution of the error terms $\varepsilon_{ij}$. Consider factor effects model (16.62) for a
single-factor study:

$$Y_{ij} = \mu_{\cdot} + \tau_i + \varepsilon_{ij}$$

Rather than assume that the $\varepsilon_{ij}$ are independent normal random variables with mean zero
and constant variance $\sigma^2$, we shall now consider each $\varepsilon_{ij}$ to be a fixed effect associated
with the experimental unit. In this framework, we view the $n_T$ experimental units to be a
finite population, and associated with each unit is the unit-specific effect $\varepsilon_{ij}$. When
randomization assigns this experimental unit to treatment $i$, the observed response will be
$Y_{ij} = \mu_{\cdot} + \tau_i + \varepsilon_{ij}$. The response $Y_{ij}$ is still a random variable, but under the
randomization view the randomness arises because the treatment effect $\tau_i$ is the result of a random
assignment of the experimental unit to treatment $i$.

If there are no treatment effects, that is, if all $\tau_i = 0$, then the response $Y_{ij} = \mu_{\cdot} + \varepsilon_{ij}$
depends only on the experimental unit. Since with randomization the experimental unit is
equally likely to be assigned to any treatment, the observed response $Y_{ij}$, if there are no
treatment effects, could with equal likelihood have been observed for any of the treatments.
Thus, when there are no treatment effects, randomization will lead to an assignment of the
finite population of $n_T$ observations $Y_{ij}$ to the treatments such that all treatment
combinations of observations are equally likely. This, in turn, leads to an exact sampling distribution
of the test statistic under $H_0$: $\tau_i = 0$, sometimes termed the randomization distribution of
the test statistic. Percentiles of the randomization distribution can then be used to test for
the presence of factor effects. This use of the randomization distribution provides the basis
of a nonparametric test for treatment effects.

To illustrate the concept of a randomization distribution, consider a single-factor
experiment consisting of two treatments and two replications. In this experiment, the alternatives
of interest are:

$$H_0\colon \tau_1 = \tau_2 = 0$$
$$H_a\colon \text{not both } \tau_1 \text{ and } \tau_2 \text{ equal zero}$$

Test statistic F* in (16.55) will be used to conduct the test. The sample results are:

Treatment 1    Treatment 2
 $Y_{1j}$        $Y_{2j}$
    3               8
    7              10

For these data, F* = 3.20.

Since the treatments are assigned to experimental units at random, it would have been
just as likely, if there are no treatment effects, to have observed 3 and 8 for treatment 1 and
7 and 10 for treatment 2. In that event, the test statistic would have been F* = 1.06. In
fact, any division of the four observations into two groups of size two is equally likely with
randomization if there are no treatment effects. Because this experiment is small, we can
easily list all 4!/(2!2!) = 6 possible outcomes of the experiment, assuming no treatment
effects are present:


Randomization Treatment 1 Treatment 2 F* Probability 
1 3,7 8, 10 3.20 1/6 
2 3,8 7, 10 1.06 1/6 
3 3, 10 8, 7 .08 1/6 
4 8,7 3, 10 .08 1/6 
5 7, 10 3,8 1.06 1/6 
6 8, 10 3,7 3.20 1/6 


The last two columns give the randomization distribution of test statistic F* under $H_0$.
Randomization assures us that, when $H_0$ is true, each of the six possible outcomes listed has
probability 1/6. From the randomization distribution, we see that the P-value for the test
is the probability:

$$P\{F^* \ge 3.20\} = \frac{2}{6} = .33$$

This P-value is somewhat different from the usual (normal theory) P-value:

$$P\{F(1, 2) \ge 3.20\} = .22$$


In this instance, because the sample sizes are very small, the F distribution does not
provide a particularly good approximation to the exact sampling distribution of F* under $H_0$.
However, both empirical and theoretical studies have shown that the F distribution is a
good approximation to the exact randomization distribution when the sample sizes are not
small. Thus, randomization alone can justify the F test as a good approximate test, without
requiring any assumption of independent, normal error terms. We shall next demonstrate
the use of the randomization test in a more realistic setting.
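The enumeration for the small two-treatment illustration can be reproduced with a few lines of Python. This is only a sketch: the helper name `f_statistic` is ours, and the small tolerance in the comparison is an implementation detail added to guard against floating-point ties.

```python
from itertools import combinations

def f_statistic(groups):
    """One-way ANOVA statistic F* = MSTR / MSE for a list of samples."""
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return (sstr / (r - 1)) / (sse / (n_total - r))

observations = [3, 7, 8, 10]
observed_f = f_statistic([[3, 7], [8, 10]])

# Every way of choosing which two of the four units receive treatment 1.
randomization_f = []
for idx in combinations(range(4), 2):
    treatment_1 = [observations[i] for i in idx]
    treatment_2 = [observations[i] for i in range(4) if i not in idx]
    randomization_f.append(f_statistic([treatment_1, treatment_2]))

# P-value: share of randomizations with F* at least as large as observed.
p_value = sum(f >= observed_f - 1e-9 for f in randomization_f) / len(randomization_f)
print(sorted(round(f, 2) for f in randomization_f), round(p_value, 2))
```

Two of the six divisions reproduce the observed F* = 3.20, which gives the randomization P-value of 2/6.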


Comments 


1. Because of the discreteness of the randomization distribution, it is conservative to define the
P-value as the probability of equaling or exceeding the observed value of the test statistic when $H_0$
holds. For continuous sampling distributions, it does not matter whether the P-value is defined as the
probability of exceeding the observed value of the test statistic or as the probability of equaling or
exceeding it. For instance, $P\{F(1, 2) > 3.20\} = P\{F(1, 2) \ge 3.20\}$. When more than one treatment
combination yields the value of the test statistic F*, some authors suggest that the P-value be calculated
as $P\{F > F^*\} + P\{F = F^*\}/2$. This leads to a less conservative P-value.

2. The randomization test is sometimes referred to as a permutation test, although permutation
tests are also applied to nonrandomized studies. Because of the conservativeness of permutation (or
randomization) tests for small samples, their virtues continue to be debated in the literature. See
Reference 16.1.
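To make the two P-value definitions in Comment 1 concrete, here is a small Python sketch applied to the six-outcome randomization distribution tabulated earlier for the two-treatment illustration. The variable names are ours; the F* values are the rounded values listed in that table.

```python
from fractions import Fraction

# F* values of the six equally likely randomizations, as tabulated in the text.
f_values = [3.20, 1.06, 0.08, 0.08, 1.06, 3.20]
observed = 3.20

n = len(f_values)
p_greater = Fraction(sum(f > observed for f in f_values), n)   # P{F > F*}
p_equal = Fraction(sum(f == observed for f in f_values), n)    # P{F = F*}

conservative_p = p_greater + p_equal        # P{F >= F*}
adjusted_p = p_greater + p_equal / 2        # P{F > F*} + P{F = F*}/2
print(conservative_p, adjusted_p)
```

For this illustration the conservative definition gives 1/3, while the adjusted definition halves the weight of the tied outcomes and gives 1/6.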


Example   A manufacturer of children's plastic toys considered the introduction of statistical process
control (SPC) and engineering process control (EPC) in order to reduce the volume of scrap
and rework at each of its nine manufacturing plants. To assess the effects of these quality
practices, a single-factor experiment was conducted for a six-month period. The treatments
were:

Treatment
    i        Quality Practice
    1        None (control group)
    2        SPC
    3        Both SPC and EPC


The three treatments were each randomly assigned to three of the nine available plants. The
response of interest was the reduction in the defect rate at the end of the six-month trial
period. The results are given in the first row (randomization 1) in Table 16.5. Management
wishes to test whether or not the mean reduction in the defect rate is the same for the three
treatments:

$$H_0\colon \tau_1 = \tau_2 = \tau_3 = 0$$
$$H_a\colon \text{not all } \tau_i \text{ equal zero}$$

The risk of a Type I error is to be controlled at $\alpha = .10$. We shall now conduct this test by
obtaining the exact randomization distribution.

TABLE 16.5 Randomization Samples and Test Statistics: Quality Control Example.

Randomization   Treatment 1       Treatment 2       Treatment 3       F*      Probability
    1           1.1, .5, -2.1     4.2, 3.7, .8      3.2, 2.8, 6.3     4.39    1/1,680
    2           1.1, .5, -2.1     4.2, 3.7, 3.2     .8, 2.8, 6.3      3.74    1/1,680
    3           1.1, .5, -2.1     4.2, 3.7, 2.8     3.2, .8, 6.3      3.67    1/1,680
   ...          ...               ...               ...               ...     ...
  1,680         3.2, 2.8, 6.3     4.2, 3.7, .8      1.1, .5, -2.1     4.39    1/1,680

FIGURE 16.8 Randomization Distribution of F* and Corresponding F Distribution: Quality Control Example. [Figure not reproduced.]

In this experimental study, there are 9!/(3!3!3!) = 1,680 possible combinations of
assigning the nine experimental units to the three treatments. A computer program was utilized
to enumerate these 1,680 combinations and to calculate the F* statistic for each. A partial
listing of results is presented in Table 16.5.

Of the 1,680 possible values of the test statistic F*, 120 were equal to or greater than
the observed value 4.39. Thus, from the randomization distribution we find:

$$P\text{-value} = P\{F^* \ge 4.39\} = \frac{120}{1{,}680} = .071$$

Since $.071 < \alpha = .10$, we conclude that the mean reduction in the defect rate is not the
same for the three treatments.

Even though the sample sizes are not very large here, the exact randomization distribution
is well approximated by the F distribution. Figure 16.8 shows both the randomization
distribution in the form of a histogram and the density function for the corresponding F
distribution, F(2, 6). Note how well the F distribution approximates the randomization
distribution. The P-value according to the F distribution is $P\{F(2, 6) \ge 4.39\} = .067$.
This is very close to the randomization P-value of .071.
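The text notes that a computer program was used to enumerate the 1,680 assignments. The following Python sketch shows one way such an enumeration could be written; it is not the authors' program. The data are the nine observed reductions from randomization 1 of Table 16.5, the helper `f_statistic` is our own, and the closed-form upper tail used for the F(2, 6) comparison is the standard result $P\{F(2, \nu_2) > x\} = (1 + 2x/\nu_2)^{-\nu_2/2}$.

```python
from itertools import combinations

def f_statistic(groups):
    """One-way ANOVA statistic F* = MSTR / MSE for a list of samples."""
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return (sstr / (r - 1)) / (sse / (n_total - r))

# Observed reductions in defect rate (randomization 1 of Table 16.5).
observed_groups = [[1.1, 0.5, -2.1], [4.2, 3.7, 0.8], [3.2, 2.8, 6.3]]
values = [y for g in observed_groups for y in g]
observed_f = f_statistic(observed_groups)

# Enumerate all 9!/(3!3!3!) assignments of nine plants to three treatments.
units = set(range(9))
f_values = []
for first in combinations(sorted(units), 3):
    remaining = sorted(units - set(first))
    for second in combinations(remaining, 3):
        third = [u for u in remaining if u not in second]
        groups = [[values[u] for u in part] for part in (first, second, third)]
        f_values.append(f_statistic(groups))

# Randomization P-value; the tolerance guards against floating-point ties.
count = sum(f >= observed_f - 1e-9 for f in f_values)
p_randomization = count / len(f_values)

# Normal-theory comparison: upper tail of F(2, 6) in closed form.
p_f = (1 + 2 * observed_f / 6) ** (-6 / 2)

print(len(f_values), round(observed_f, 2), count, round(p_randomization, 3), round(p_f, 3))
```

The text reports that 120 of the 1,680 values equal or exceed the observed F*; the printed count can be compared against that figure.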


16.10 Planning of Sample Sizes with Power Approach 


For analysis of variance studies, as for other statistical studies, it is important to plan the 
sample sizes so that needed protection against both Type I and Type II errors can be obtained
or so that the estimates of interest have sufficient precision to be useful. This planning is 
necessary for both observational and experimental studies to ensure that the sample sizes 
are large enough to detect important differences with high probability. At the same time, 
the sample sizes should not be so large that the cost of the study becomes excessive and that 
unimportant differences become statistically significant with high probability. Planning of 
sample sizes is therefore an integral part of the design of a study. 

We shall generally assume in our discussion of planning sample sizes that all treatments 
are to have equal sample sizes, reflecting that they are about equally important. Indeed, 
when major interest lies in pairwise comparisons of all treatment means, it can be shown 
that equal sample sizes maximize the precision of the comparisons. Another reason for 
equal sample sizes is that certain departures from the assumed ANOVA model are less
troublesome if all factor levels have the same sample size, as noted earlier. 

There will be times, however, when unequal sample sizes are appropriate. For instance, 
when four experimental treatments are each to be compared to a control, it may be reasonable 
to make the sample size for the control larger. We shall comment later on the planning of 
sample sizes for such a case. 

Planning of sample sizes can be approached in terms of (1) controlling the risks of 
making Type I and Type II errors, (2) controlling the widths of desired confidence intervals,
or (3) a combination of these two. The procedures for planning sample sizes that we shall 
discuss here are applicable to both observational studies and to experimental studies based 
on a completely randomized single-factor design. In later chapters, we shall consider the 
planning of sample sizes for other study designs. In this section, we consider planning of 
sample sizes with the power approach, which permits controlling the risks of making Type I
and Type II errors. In Section 16.11 we discuss planning of sample sizes when the best
treatment is to be identified. Later, in Section 17.8, we take up planning of sample sizes 
to control the precision of estimates of important effects. We shall consider planning of 
sample sizes for multifactor studies in Section 24.7. 

Before we can discuss planning of sample sizes with the power approach, we need to 
consider the power of the F test. 


Power of F Test 


By the power of the F test for a single-factor study, we refer to the probability that the
decision rule will lead to conclusion $H_a$, that the treatment means differ, when in fact
$H_a$ holds. Specifically, the power is given by the following expression for the cell means
model (16.2):

$$\text{Power} = P\{F^* > F(1 - \alpha;\; r - 1,\; n_T - r) \mid \phi\} \qquad (16.87)$$


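where $\phi$ is the noncentrality parameter, that is, a measure of how unequal the treatment
means $\mu_i$ are:

$$\phi = \frac{1}{\sigma} \sqrt{\frac{\sum n_i (\mu_i - \mu_{\cdot})^2}{r}} \qquad (16.87a)$$

and:

$$\mu_{\cdot} = \frac{\sum n_i \mu_i}{n_T} \qquad (16.87b)$$

When all factor level samples are of equal size $n$, the parameter $\phi$ becomes:

$$\phi = \frac{1}{\sigma} \sqrt{\frac{n}{r} \sum (\mu_i - \mu_{\cdot})^2} \qquad \text{when } n_i \equiv n \qquad (16.88)$$

where:

$$\mu_{\cdot} = \frac{\sum \mu_i}{r} \qquad (16.88a)$$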


Power probabilities are determined by utilizing the noncentral F distribution since this 
is the sampling distribution of F* when $H_a$ holds. The resulting calculations are quite
complex. We present a series of tables in Appendix Table B.11 that can be used readily to 
look up power probabilities directly. The proper table to use depends on the number of factor 
levels and the level of significance employed in the decision rule. Specifically, Table B.11 
is used as follows: 


1. Each page refers to a different $\nu_1$, the number of degrees of freedom for the numerator
of F*. For ANOVA model (16.2), $\nu_1 = r - 1$, or the number of factor levels minus one.
Table B.11 contains power tables for $\nu_1 = 2, 3, 4, 5$, and 6, as shown at the top of each page.

2. Two levels of significance, denoted by $\alpha$, are presented in Table B.11, namely, $\alpha = .05$
and $\alpha = .01$. The upper table on each page refers to $\alpha = .05$ and the lower table to $\alpha = .01$.

3. Within each table, the rows refer to different values of $\nu_2$, the degrees of freedom
for the denominator of F*. The columns refer to different values of $\phi$, the noncentrality
parameter defined in (16.87a). For ANOVA model (16.2), $\nu_2 = n_T - r$.


Examples

1. Consider the case where $\nu_1 = 2$, $\nu_2 = 10$, $\phi = 3$, and $\alpha = .05$. We then find from
Table B.11 (p. 1337) that the power is $1 - \beta = .98$.

2. Suppose that for the Kenton Food Company example, the analyst wishes to determine
the power of the decision rule in the example on page 699 when there are substantial
differences between the factor level means. Specifically, the analyst wishes to consider the
case when $\mu_1 = 12.5$, $\mu_2 = 13$, $\mu_3 = 18$, and $\mu_4 = 21$. The weighted mean in (16.87b)
therefore is:

$$\mu_{\cdot} = \frac{5(12.5) + 5(13) + 4(18) + 5(21)}{19} = 16.03$$


Thus, the specified value of $\phi$ is:

$$\phi = \frac{1}{\sigma} \left[ \frac{5(-3.53)^2 + 5(-3.03)^2 + 4(1.97)^2 + 5(4.97)^2}{4} \right]^{1/2} = \frac{1}{\sigma}(7.86)$$

Note that we still need to know $\sigma$, the standard deviation of the error terms $\varepsilon_{ij}$ in the
model. Suppose that from past experience it is known that $\sigma = 3.5$ cases approximately.
Then we have:

$$\phi = \frac{1}{3.5}(7.86) = 2.25$$

Further, we have for this example:

$$\nu_1 = r - 1 = 3 \qquad \nu_2 = n_T - r = 15 \qquad \alpha = .05$$


Table B.11 on page 1338 indicates that the power is $1 - \beta = .91$. In other words, there are
91 chances in 100 that the decision rule, based on the sample sizes employed, will lead to
the detection of differences in the mean sales volumes for the four package designs when
the differences are the ones specified earlier.
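The noncentrality parameter for this example can be recomputed with a short Python sketch of (16.87a) and (16.87b). The function name is ours; the sample sizes 5, 5, 4, 5 are those appearing in the weighted mean above. The power itself still has to be read from Table B.11 (or obtained from a noncentral F routine, which is not attempted here).

```python
import math

def noncentrality_phi(means, sizes, sigma):
    """Return (mu_dot, phi) with phi = (1/sigma) * sqrt(sum n_i (mu_i - mu_dot)^2 / r)."""
    n_total = sum(sizes)
    r = len(means)
    # Weighted mean of the factor level means, as in (16.87b).
    mu_dot = sum(n * m for n, m in zip(sizes, means)) / n_total
    weighted_ss = sum(n * (m - mu_dot) ** 2 for n, m in zip(sizes, means))
    return mu_dot, math.sqrt(weighted_ss / r) / sigma

mu_dot, phi = noncentrality_phi(means=[12.5, 13, 18, 21], sizes=[5, 5, 4, 5], sigma=3.5)
print(round(mu_dot, 2), round(phi, 2))
```

Rerunning the function with a different planning value of sigma shows directly how sensitive $\phi$, and therefore the power, is to that assumption.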


Comments 


1. Any given value of $\phi$ encompasses many different combinations of factor level means $\mu_i$. Thus,
in the Kenton Food Company example, the means $\mu_1 = 12.5$, $\mu_2 = 13$, $\mu_3 = 18$, $\mu_4 = 21$ and the
means $\mu_1 = 21$, $\mu_2 = 12.5$, $\mu_3 = 18$, $\mu_4 = 13$ lead to the same value of $\phi = 2.25$ and hence to the
same power.

2. The larger $\phi$, that is, the larger the differences between the factor level means, the greater
the power and hence the smaller the probability of making a Type II error for a given risk $\alpha$ of making
a Type I error. Also, the smaller the specified $\alpha$ risk, the smaller is the power for any given $\phi$, and
hence the larger the risk of a Type II error.

3. Since many single-factor studies are undertaken because of the expectation that the factor level
means differ and it is desired to investigate these differences, the $\alpha$ risk used in constructing the
decision rule for determining whether or not the factor level means are equal is often set relatively
high (e.g., .05 or .10 instead of .01) so as to increase the power of the test.

4. The power table for $\nu_1 = 1$ is not reproduced in Table B.11 since this case corresponds to the
comparison of two population means. As noted previously, the F test is the equivalent of the two-sided
t test for this case, and the power tables for the two-sided t test presented in Table B.5 can then be
used, with noncentrality parameter:

$$\delta = \frac{|\mu_1 - \mu_2|}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad (16.89)$$

and degrees of freedom $n_1 + n_2 - 2$.


Use of Table B.12 for Single-Factor Studies 


The power approach in planning sample sizes can be implemented by use of the power
tables for F tests presented in Table B.11. A trial-and-error process is required, however,
with these tables. Instead, we shall use other tables that furnish the appropriate sample
sizes directly. Table B.12 presents sample size determinations that are applicable when all
treatments are to have equal sample sizes and all effects are fixed.
The planning of sample sizes for single-factor studies with fixed factor levels using
Table B.12 is done in terms of the noncentrality parameter (16.88) for equal sample sizes.
However, instead of requiring a direct specification of the levels of $\mu_i$ for which it is
important to control the risk of making a Type II error, Table B.12 only requires a specification
of the minimum range of factor level means for which it is important to detect differences
between the $\mu_i$ with high probability. This minimum range is denoted by $\Delta$:

$$\Delta = \max(\mu_i) - \min(\mu_i) \qquad (16.90)$$

The following three specifications need to be made in using Table B.12:


1. The level $\alpha$ at which the risk of making a Type I error is to be controlled.

2. The magnitude of the minimum range $\Delta$ of the $\mu_i$ which is important to detect with
high probability. The magnitude of $\sigma$, the standard deviation of the probability distributions
of Y, must also be specified since entry into Table B.12 is in terms of the ratio:

$$\frac{\Delta}{\sigma} \qquad (16.91)$$

3. The level $\beta$ at which the risk of making a Type II error is to be controlled for the
specification given in 2. Entry into Table B.12 is in terms of the power $1 - \beta$.

When using Table B.12, four $\alpha$ levels are available at which the risk of making a Type I
error can be controlled ($\alpha = .2, .1, .05, .01$). The Type II error risk can be controlled at one
of four $\beta$ levels ($\beta = .3, .2, .1, .05$) through the specification of the power $1 - \beta$. Table B.12
provides necessary sample sizes for studies consisting of $r = 2, \ldots, 10$ factor levels or
treatments.


Example   A company owning a large fleet of trucks wishes to determine whether or not four different
brands of snow tires have the same mean tread life (in thousands of miles). It is important
to conclude that the four brands of snow tires have different mean tread lives when the
difference between the means of the best and worst brands is 3 (thousand miles) or more.
Thus, the minimum range specification is $\Delta = 3$. It is known from past experience that the
standard deviation of the tread lives of these tires is $\sigma = 2$ (thousand miles), approximately.
Management would like to control the risks of making incorrect decisions at the following
levels:

$$\alpha = .05$$
$$\beta = .10 \quad \text{or} \quad \text{Power} = 1 - \beta = .90$$

Entering Table B.12 for $\Delta/\sigma = 3/2 = 1.5$, $\alpha = .05$, $1 - \beta = .90$, and $r = 4$, we find
$n = 14$. Hence, 14 snow tires of each brand need to be tested in order to control the risks
of making incorrect decisions at the desired levels.


Specification of $\Delta/\sigma$ Directly. Table B.12 can also be used when the minimum range
is specified directly in units of the standard deviation $\sigma$. Let the specification of $\Delta$ in this
case be $k\sigma$ so that we have by (16.91):

$$\frac{\Delta}{\sigma} = \frac{k\sigma}{\sigma} = k$$

Hence, Table B.12 is entered directly for the specified value $k$ with this approach.


Example   Suppose it is specified in the snow tires example that it is important to detect differences
between the mean tread lives if the range of the mean tread lives is $k = 2$ standard deviations
or more. Suppose also that the other specifications are:

$$\alpha = .10$$
$$\beta = .05 \quad \text{or} \quad \text{Power} = 1 - \beta = .95$$

From Table B.12, we find for $k = 2$ and $r = 4$ that $n = 9$ tires will need to be tested for
each brand in order that the specified risk protection will be achieved.


Comment 


While specifying $\Delta/\sigma$ directly does not require an advance planning value of the standard deviation $\sigma$,
this is not of as much advantage as it might seem because a meaningful specification of $\Delta$ in units of
$\sigma$ will frequently require knowledge of the approximate magnitude of the standard deviation.


Some Further Observations on Use of Table B.12 


1. The exact specification of $\Delta/\sigma$ has great effect on the sample sizes $n$ when $\Delta/\sigma$ is
small, but it has much less effect when $\Delta/\sigma$ is large. For instance, when $r = 3$, $\alpha = .05$,
and $\beta = .10$, we have from Table B.12:

$\Delta/\sigma$     n
  1.0     27
  1.5     13
  2.0      8
  2.5      6

Thus, unless $\Delta/\sigma$ is quite small, one need not be too concerned about some imprecision in
specifying $\Delta/\sigma$.

2. Reducing either the specified $\alpha$ or $\beta$ risks or both increases the required sample sizes.
For instance, when $r = 4$, $\alpha = .10$, and $\Delta/\sigma = 1.25$, we have:

 $\beta$     $1 - \beta$     n
 .20      .80      13
 .10      .90      16
 .05      .95      20

3. A moderate error in the advance planning value of $\sigma$ can cause a substantial
miscalculation of required sample sizes. For instance, when $r = 5$, $\alpha = .05$, $\beta = .10$, and $\Delta = 3$,
we have:

 $\sigma$     $\Delta/\sigma$     n
  1      3.0       5
  2      1.5      15
  3      1.0      32

In view of the usual approximate nature of the advance planning value of $\sigma$, it is generally
desirable to investigate the needed sample sizes for a range of likely values of $\sigma$ before
deciding on the sample sizes to be employed.

4. Table B.12 is based on the noncentrality parameter $\phi$ in (16.88) even though no
specification is made of the individual factor level means $\mu_i$ for which it is important to
conclude that the factor level means differ. To see how Table B.12 utilizes the noncentrality
parameter $\phi$, consider again the snow tires example where $r = 4$ brands are to be tested
and a minimum range of $\Delta = 3$ (thousand miles) of the four mean tread lives $\mu_i$ is to be
detected with high probability. The following are some possible sets of values of the $\mu_i$,
each of which has range $\Delta = 3$:

Case    $\mu_1$    $\mu_2$    $\mu_3$     $\mu_4$     $\sum(\mu_i - \mu_{\cdot})^2$
 1      24     27     25      26      5.00
 2      25     25     26      23      4.75
 3      25     25     25      28      6.75
 4      25     25     26.5    23.5    4.50

The term $\sum(\mu_i - \mu_{\cdot})^2$ of the noncentrality parameter in (16.88) differs for each of these
four possibilities and hence the power differs, even though the range is the same in all cases.
Note that the term $\sum(\mu_i - \mu_{\cdot})^2$ is the smallest for case 4, where two factor level means
are at $\mu_{\cdot}$ and the other two are equally spaced around $\mu_{\cdot}$. It can be shown that for a given
range $\Delta$, the term $\sum(\mu_i - \mu_{\cdot})^2$ is minimized when all but two factor level means are at $\mu_{\cdot}$
and the two remaining factor level means are equally spaced around $\mu_{\cdot}$. Thus, we have:

$$\min \sum_{i=1}^{r} (\mu_i - \mu_{\cdot})^2 = \left(\frac{\Delta}{2}\right)^2 + \left(-\frac{\Delta}{2}\right)^2 + 0 + \cdots + 0 = \frac{\Delta^2}{2} \qquad (16.92)$$

Since the power of the test varies directly with $\sum(\mu_i - \mu_{\cdot})^2$, use of (16.92) in calculating
Table B.12 ensures that the power is at least $1 - \beta$ for any combination of $\mu_i$ values with
range $\Delta$.
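The four cases tabulated above, and the lower bound in (16.92), can be checked numerically. The following Python sketch uses our own helper name `sum_sq_dev`; the sets of means are the ones listed in the table for the snow tires example.

```python
def sum_sq_dev(means):
    """Sum of squared deviations of the factor level means about their unweighted mean."""
    mu_dot = sum(means) / len(means)
    return sum((m - mu_dot) ** 2 for m in means)

# The four sets of means from the table, each with range 3.
cases = {
    1: [24, 27, 25, 26],
    2: [25, 25, 26, 23],
    3: [25, 25, 25, 28],
    4: [25, 25, 26.5, 23.5],
}
sums = {case: sum_sq_dev(means) for case, means in cases.items()}

delta = 3
lower_bound = delta ** 2 / 2      # right-hand side of (16.92)

for case, total in sums.items():
    print(case, total, total >= lower_bound)
```

Case 4 attains the bound exactly, which is why a table built from the bound is conservative for every other configuration with the same range.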


16.11 Planning of Sample Sizes to Find "Best" Treatment


There are occasions when the chief purpose of the study is to ascertain the treatment with
the highest or lowest mean. In the snow tires example, for instance, it may be desired to
determine which of the four brands has the longest mean tread life.

Table B.13, developed by Bechhofer, enables us to determine the necessary sample sizes
so that with probability $1 - \alpha$ the highest (lowest) estimated treatment mean is from the
treatment with the highest (lowest) population mean. We need to specify the probability
$1 - \alpha$, the standard deviation $\sigma$, and the smallest difference $\lambda$ between the highest (lowest)
and second highest (second lowest) treatment means that it is important to recognize.
Table B.13 assumes that equal sample sizes are to be used for all $r$ treatments.


Example   Suppose that in the snow tires example, the chief objective is to identify the brand with the
longest mean tread life. There are $r = 4$ brands. We anticipate, as before, that $\sigma = 2$ (thousand
miles). Further, we are informed that a difference $\lambda = 1$ (thousand miles) between the highest
and second highest brand means is important to recognize, and that the probability is to be
$1 - \alpha = .90$ or greater that we identify correctly the brand with the highest mean tread life
when $\lambda \ge 1$.

The entry in Table B.13 is $\lambda\sqrt{n}/\sigma$. For $r = 4$ and probability $1 - \alpha = .90$, we find from
Table B.13 that $\lambda\sqrt{n}/\sigma = 2.4516$. Hence, since the $\lambda$ specification is $\lambda = 1$, we obtain:

$$\frac{(1)\sqrt{n}}{2} = 2.4516 \qquad \sqrt{n} = 4.9032 \quad \text{or} \quad n = 25$$

Thus, when the mean tread life for the best brand exceeds that of the second best by at least
1 (thousand miles) and when $\sigma = 2$ (thousand miles), sample sizes of 25 tires for each
brand provide an assurance of at least .90 that the brand with the highest estimated mean
$\bar{Y}_{i\cdot}$ is the brand with the highest population mean.
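Once the Table B.13 entry is known, the sample size follows from solving the entry equation for $n$ and rounding up. A minimal Python sketch, with a function name and argument names of our own choosing and the table entry 2.4516 taken from the example above:

```python
import math

def best_treatment_sample_size(table_entry, sigma, smallest_difference):
    """Smallest integer n with smallest_difference * sqrt(n) / sigma >= table_entry."""
    root_n = table_entry * sigma / smallest_difference
    return math.ceil(root_n ** 2)

# Snow tires example: table entry 2.4516, sigma = 2, smallest difference = 1.
n = best_treatment_sample_size(2.4516, sigma=2, smallest_difference=1)
print(n)
```

Because the required $n$ grows with the square of $\sigma$ divided by the smallest difference, halving the difference to be recognized roughly quadruples the sample size for the same table entry.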


Comment 


If the planning value for the standard deviation is not accurate, the probability of identifying the
population with the highest (lowest) mean correctly is, of course, affected. This is no different from
the other approaches, where a misjudgment of the standard deviation affects the risks of making a
Type II error.


Cited 
Reference 


16.1. Berger, V. W. "Pros and Cons of Permutation Tests in Clinical Trials," Statistics in Medicine 19
(2000), pp. 1319-1328.


Problems 


16.1. Refer to Figure 16.1a. Could you determine the mean sales level when the price level is $68 if
you knew the true regression function? Could you make this determination from Figure 16.1b
if you only knew the values of the parameters $\mu_1$, $\mu_2$, and $\mu_3$ of ANOVA model (16.2)? What
distinction between regression models and ANOVA models is demonstrated by your answers?

16.2. A market researcher, having collected data on breakfast cereal expenditures by families with
1, 2, 3, 4, and 5 children living at home, plans to use an ordinary regression model to estimate
the mean expenditures at each of these five family size levels. However, the researcher is
undecided between fitting a linear or a quadratic regression model, and the data do not give
clear evidence in favor of one model or the other. A colleague suggests: "For your purposes
you might simply use an ANOVA model." Is this a useful suggestion? Explain.

16.3. In a study of intentions to get flu-vaccine shots in an area threatened by an epidemic, 90 persons
were classified into three groups of 30 according to the degree of risk of getting flu. Each
group was together when the persons were asked about the likelihood of getting the shots, on
a probability scale ranging from 0 to 1.0. Unavoidably, most persons overheard the answers
of nearby respondents. An analyst wishes to test whether the mean intent scores are the same
for the three risk groups. Consider each assumption for ANOVA model (16.2) and explain
whether this assumption is likely to hold in the present situation.

16.4. A company, studying the relation between job satisfaction and length of service of employees,
classified employees into three length-of-service groups (less than 5 years, 5-10 years, more
than 10 years). Suppose $\mu_1 = 65$, $\mu_2 = 80$, $\mu_3 = 95$, and $\sigma = 3$, and that ANOVA model
(16.2) is applicable.

a. Draw a representation of this model in the format of Figure 16.2.

b. Find E{MSTR} and E{MSE} if 25 employees from each group are selected at random for
intensive interviewing about job satisfaction. Is E{MSTR} substantially larger than E{MSE}
here? What is the implication of this?

16.5. In a study of length of hospital stay (in number of days) of persons in four income groups, the
parameters are as follows: $\mu_1 = 5.1$, $\mu_2 = 6.3$, $\mu_3 = 7.9$, $\mu_4 = 9.5$, $\sigma = 2.8$. Assume that
ANOVA model (16.2) is appropriate.

a. Draw a representation of this model in the format of Figure 16.2.

b. Suppose 100 persons from each income group are randomly selected for the study. Find
E{MSTR} and E{MSE}. Is E{MSTR} substantially larger than E{MSE} here? What is the
implication of this?

c. If $\mu_2 = 5.6$ and $\mu_3 = 9.0$, everything else remaining the same, what would E{MSTR} be?
Why is E{MSTR} substantially larger here than in part (b) even though the range of the
factor level means is the same?

16.6. A student asks: "Why is the F test for equality of factor level means not a two-tail test since
any differences among the factor level means can occur in either direction?" Explain, utilizing
the expressions for the expected mean squares in (16.37).

*16.7. Productivity improvement. An economist compiled data on productivity improvements last
year for a sample of firms producing electronic computing equipment. The firms were
classified according to the level of their average expenditures for research and development in
the past three years (low, moderate, high). The results of the study follow (productivity
improvement is measured on a scale from 0 to 100). Assume that ANOVA model (16.2) is
appropriate.

                                          j
 i               1     2     3     4     5     6     7     8     9    10    11    12
 1  Low         7.6   8.2   6.8   5.8   6.9   6.6   6.3   7.7   6.0
 2  Moderate    6.7   8.1   9.4   8.6   7.8   7.7   8.9   7.9   8.3   8.7   7.1   8.4
 3  High        8.5   9.7  10.1   7.8   9.6   9.5

a. Prepare aligned dot plots of the data. Do the factor level means appear to differ? Does the
variability of the observations within each factor level appear to be approximately the same
for all factor levels?

b. Obtain the fitted values.

c. Obtain the residuals. Do they sum to zero in accord with (16.21)?

d. Obtain the analysis of variance table.

e. Test whether or not the mean productivity improvement differs according to the level of
research and development expenditures. Control the $\alpha$ risk at .05. State the alternatives,
decision rule, and conclusion.

f. What is the P-value of the test in part (e)? How does it support the conclusion reached in
part (e)?

g. What appears to be the nature of the relationship between research and development
expenditures and productivity improvement?

16.8. Questionnaire color. In an experiment to investigate the effect of color of paper (blue,
green, orange) on response rates for questionnaires distributed by the "windshield method"
in supermarket parking lots, 15 representative supermarket parking lots were chosen in a
metropolitan area and each color was assigned at random to five of the lots. The response rates
(in percent) follow. Assume that ANOVA model (16.2) is appropriate.

                      j
 i             1    2    3    4    5
 1  Blue      28   26   31   27   35
 2  Green     34   29   25   31   29
 3  Orange    31   25   27   29   28

a. Prepare aligned dot plots of the data. Do the factor level means appear to differ? Does
the variability of the observations within each factor level appear to be approximately the
same for all factor levels?

b. Obtain the fitted values.

c. Obtain the residuals.

d. Obtain the analysis of variance table.

e. Conduct a test to determine whether or not the mean response rates for the three colors
differ. Use level of significance $\alpha = .10$. State the alternatives, decision rule, and conclusion.
What is the P-value of the test?

f. When informed of the findings, an executive said: "See? I was right all along. We might as
well print the questionnaires on plain white paper, which is cheaper." Does this conclusion
follow from the findings of the study? Discuss.

16.9. Rehabilitation therapy. A rehabilitation center researcher was interested in examining the
relationship between physical fitness prior to surgery of persons undergoing corrective knee
surgery and time required in physical therapy until successful rehabilitation. Patient records
in the rehabilitation center were examined, and 24 male subjects ranging in age from 18
to 30 years who had undergone similar corrective knee surgery during the past year were
selected for the study. The number of days required for successful completion of physical
therapy and the prior physical fitness status (below average, average, above average) for each
patient follow.


j 
i 1 2 3 4 5 6 7 8 9 10 
1 Below Average 29 42 38 40 43 40 30 42* 
2 Average 30 35 39 28 31 31 29 35 29 33 


3  AboveAverage 26 32 21 20 23 22 


Assume that ANOVA model (16.2) is appropriate.

a. Prepare aligned dot plots of the data. Do the factor level means appear to differ? Does the
variability of the observations within each factor level appear to be approximately the same
for all factor levels?

b. Obtain the fitted values.

c. Obtain the residuals. Do they sum to zero in accord with (16.21)?

d. Obtain the analysis of variance table.

e. Test whether or not the mean number of days required for successful rehabilitation is the
same for the three fitness groups. Control the α risk at .01. State the alternatives, decision
rule, and conclusion.

f. Obtain the P-value for the test in part (e). Explain how the same conclusion reached in
part (e) can be obtained by knowing the P-value.

g. What appears to be the nature of the relationship between physical fitness status and duration
of required physical therapy?


*16.10. Cash offers. A consumer organization studied the effect of age of automobile owner on size
of cash offer for a used car by utilizing 12 persons in each of three age groups (young, middle,
elderly) who acted as the owner of a used car. A medium price, six-year-old car was selected
for the experiment, and the “owners” solicited cash offers for this car from 36 dealers selected
at random from the dealers in the region. Randomization was used in assigning the dealers to
the “owners.” The offers (in hundred dollars) follow. Assume that ANOVA model (16.2) is
applicable.


                                         j
i              1    2    3    4    5    6    7    8    9   10   11   12
1  Young      23   25   21   22   21   22   20   23   19   22   19   21
2  Middle     28   27   27   29   26   29   27   30   28   27   26   29
3  Elderly    23   20   25   21   22   23   21   20   19   20   22   21


a. Prepare aligned dot plots of the data. Do the factor level means appear to differ? Does
the variability of the observations within each factor level appear to be approximately the
same for all factor levels?

b. Obtain the fitted values.

c. Obtain the residuals.

d. Obtain the analysis of variance table.

e. Conduct the F test for equality of factor level means; use α = .01. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

f. What appears to be the nature of the relationship between age of owner and mean cash
offer?


*16.11. Filling machines. A company uses six filling machines of the same make and model to place
detergent into cartons that show a label weight of 32 ounces. The production manager has
complained that the six machines do not place the same amount of fill into the cartons. A
consultant requested that 20 filled cartons be selected randomly from each of the six machines
and the content of each carton carefully weighed. The observations (stated for convenience as
deviations from 32.00 ounces) follow. Assume that ANOVA model (16.2) is applicable.


                             j
i       1      2      3    ...     18     19     20
1    −.14    .20    .07    ...    .07   −.01   −.19
2     .46    .11    .12    ...    .02    .11    .12
3     .21    .78    .32    ...    .50    .20    .61
4     .49    .58    .52    ...    .42    .45    .20
5    −.19    .27    .06    ...    .14    .35   −.18
6     .05   −.05    .28    ...    .35   −.09    .05


a. Prepare aligned box plots of the data. Do the factor level means appear to differ? Does the
variability of the observations within each factor level appear to be approximately the same
for all factor levels?

b. Obtain the fitted values.

c. Obtain the residuals. Do they sum to zero in accord with (16.21)?

d. Obtain the analysis of variance table.

e. Test whether or not the mean fill differs among the six machines; control the α risk at .05.
State the alternatives, decision rule, and conclusion. Does your conclusion support the
production manager's complaint?

f. What is the P-value of the test in part (e)? Is this value consistent with your conclusion in
part (e)? Explain.

g. Based on the box plots obtained in part (a), does the variation between the mean fills for
the six machines appear to be large relative to the variability in fills between cartons for
any given machine? Explain.


16.12. Premium distribution. A soft-drink manufacturer uses five agents (1, 2, 3, 4, 5) to handle
premium distributions for its various products. The marketing director desired to study the
timeliness with which the premiums are distributed. Twenty transactions for each agent were
selected at random, and the time lapse (in days) for handling each transaction was determined.
The results follow. Assume that ANOVA model (16.2) is appropriate.


                          j
i      1     2     3    ...    18    19    20
1     24    24    29    ...    27    26    25
2     18    20    20    ...    26    22    21
3     10    11     8    ...     9    11    12
4     15    13    18    ...    17    14    16
5     33    22    28    ...    26    30    29


a. Prepare aligned box plots of the data. Do the factor level means appear to differ? Does
the variability of the observations within each factor level appear to be approximately the
same for all factor levels?

b. Obtain the fitted values.

c. Obtain the residuals. Do they sum to zero in accord with (16.21)?

d. Obtain the analysis of variance table.

e. Test whether or not the mean time lapse differs for the five agents; use α = .10. State the
alternatives, decision rule, and conclusion.

f. What is the P-value of the test in part (e)? Explain how the same conclusion as in part (e)
can be reached by knowing the P-value.

g. Based on the box plots obtained in part (a), does there appear to be much variation in the
mean time lapse for the five agents? Is this variation necessarily the result of differences
in the efficiency of operations of the five agents? Discuss.


16.13. Refer to Questionnaire color Problem 16.8. Explain how you would make the random assign-
ments of supermarket parking lots to colors in this single-factor study. Make all appropriate
randomizations.

16.14. Refer to Cash offers Problem 16.10. Explain how you would make the random assignments
of dealers to “owners” in this single-factor study. Make all appropriate randomizations.


16.15. Refer to Problem 16.4. What are the values of τ_1, τ_2, and τ_3 if the ANOVA model is expressed
in the factor effects formulation (16.62), and μ. is defined by (16.63)?

16.16. Refer to Problem 16.5. What are the values of τ_i if the ANOVA model is expressed in the
factor effects formulation (16.62), and μ. is defined by (16.63)?

16.17. Refer to Premium distribution Problem 16.12. Suppose that 25 percent of all premium
distributions are handled by agent 1, 20 percent by agent 2, 20 percent by agent 3, 20 percent
by agent 4, and 15 percent by agent 5.

a. Obtain a point estimate of μ. when the ANOVA model is expressed in the factor effects
formulation (16.62) and μ. is defined by (16.65), with the weights being the proportions
of premium distribution handled by each agent.

b. State the alternatives for the test of equality of factor level means in terms of factor effects
model (16.62) for the present case. Would this statement be affected if μ. were defined
according to (16.63)? Explain.

*16.18. Refer to Productivity improvement Problem 16.7. Regression model (16.75) is to be
employed for testing the equality of the factor level means.

a. Set up the Y, X, and β matrices.

b. Obtain Xβ. Develop equivalent expressions of the elements of this vector in terms of the
cell means μ_i.

c. Obtain the fitted regression function. What is estimated by the intercept term?

d. Obtain the regression analysis of variance table.

e. Conduct the test for equality of factor level means; use α = .05. State the alternatives,
decision rule, and conclusion.

16.19. Refer to Questionnaire color Problem 16.8. Regression model (16.75) is to be employed for
testing the equality of the factor level means.

a. Set up the Y, X, and β matrices.

b. Obtain Xβ. Develop equivalent expressions of the elements of this vector in terms of the
cell means μ_i.

c. Obtain the fitted regression function. What is estimated by the intercept term?

d. Obtain the regression analysis of variance table.

e. Conduct the test for equality of factor level means; use α = .10. State the alternatives,
decision rule, and conclusion.

16.20. Refer to Rehabilitation therapy Problem 16.9. Regression model (16.81) is to be employed
for testing the equality of the factor level means.

a. Set up the Y, X, and β matrices.

b. Obtain Xβ. Develop equivalent expressions of the elements of this vector in terms of the
cell means μ_i.

c. Obtain the fitted regression function. What is estimated by the intercept term?

d. Obtain the regression analysis of variance table.

e. Conduct the test for equality of factor level means; use α = .01. State the alternatives,
decision rule, and conclusion.

*16.21. Refer to Cash offers Problem 16.10.

a. Fit regression model (16.75) to the data. What is estimated by the intercept term?

b. Obtain the regression analysis of variance table and test whether or not the factor level
means are equal; use α = .01. State the alternatives, decision rule, and conclusion.


16.22. Refer to Rehabilitation therapy Problem 16.9.

a. Fit the full regression model (16.85) to the data. Why would a fitted regression model
containing an intercept term not be proper here?

b. Fit the reduced model (16.86) to the data.

c. Use test statistic (2.70) for testing the equality of the factor level means; employ level of
significance α = .01.

16.23. Refer to Example 1 on page 717. Find the power of the test if α = .01, everything else
remaining unchanged. How does this power compare with that in Example 1?

16.24. Refer to Example 2 on page 717. The analyst is also interested in the power of the test when
μ_1 = μ_2 = 13 and μ_3 = μ_4 = 18. Assume that σ = 3.5.

a. Obtain the power of the test if α = .05.

b. What would be the power of the test if α = .01?

*16.25. Refer to Productivity improvement Problem 16.7. Obtain the power of the test in Prob-
lem 16.7e if μ_1 = 7.0, μ_2 = 8.0, and μ_3 = 9.0. Assume that σ = .9.

16.26. Refer to Rehabilitation therapy Problem 16.9. Obtain the power of the test in Problem 16.9e
if μ_1 = 37, μ_2 = 35, and μ_3 = 28. Assume that σ = 4.5.

*16.27. Refer to Cash offers Problem 16.10. Obtain the power of the test in Problem 16.10e if the
mean cash offers are μ_1 = 22, μ_2 = 28, and μ_3 = 22. Assume that σ = 1.6.

16.28. Why do you think that the approach to planning sample sizes to find the best treatment by
means of Table B.13 does not consider the risk of an incorrect identification when the best
two treatment means are the same or practically the same?

*16.29. Consider a single-factor study where r = 5, α = .01, β = .05, and σ = 10, and equal treatment
sample sizes are desired by means of the approach in Table B.12.

a. What are the required sample sizes if Δ = 10, 15, 20, 30? What generalization is suggested
by your results?

b. What are the required sample sizes for the same values of Δ as in part (a) if α = .05, all
other specifications remaining the same? How do these sample sizes compare with those
in part (a)?

16.30. Consider a single-factor study where r = 6, α = .05, β = .10, and Δ = 50, and equal treatment
sample sizes are desired by means of the approach in Table B.12.

a. What are the required sample sizes if σ = 50, 25, 20? What generalization is suggested
by your results?

b. What are the required sample sizes for the same values of σ as in part (a) if r = 4, all
other specifications remaining the same? How do these sample sizes compare with those
in part (a)?

16.31. Consider a single-factor study where r = 5, 1 − α = .95, and σ = 20, and equal sample sizes
are desired by means of the approach in Table B.13.

a. What are the required sample sizes if λ = 20, 10, 5? What generalization is suggested by
your results?

b. What are the required sample sizes for the same values of λ as in part (a) if σ = 30, all
other specifications remaining the same? How do these sample sizes compare with those
in part (a)?

16.32. Refer to Questionnaire color Problem 16.8. Suppose that the sample sizes have not yet been
determined but it has been decided to sample the same number of supermarket parking lots
for each questionnaire color. A reasonable planning value for the error standard deviation is
σ = 3.0.


a. What would be the required sample sizes if: (1) differences in the response rates are to be
detected with probability .90 or more when the range of the treatment means is 4.5, and
(2) the α risk is to be controlled at .05?

b. If the sample sizes determined in part (a) were employed, what would be the minimum
power of the test for treatment mean differences (using α = .05) when the range of the
treatment means is 6.0?

c. Suppose the chief objective is to identify the color with the highest mean response rate.
The probability should be at least .99 that the best color is recognized correctly when the
difference between the response rates for the best and second best colors is 1.5 percent
points or more. What are the required sample sizes?


16.33. Refer to Rehabilitation therapy Problem 16.9. Suppose that the sample sizes have not yet
been determined but it has been decided to use the same number of patients for each physical
fitness group. Assume that a reasonable planning value for the error standard deviation is
σ = 4.5 days.

a. What would be the required sample sizes if: (1) differences in the mean times for the three
physical fitness categories are to be detected with probability .80 or more when the range
of the treatment means is 5.63 days, and (2) the α risk is to be controlled at .01?

b. If the sample sizes determined in part (a) were employed, what would be the power of the
test for treatment mean differences when μ_1 = 37, μ_2 = 32, and μ_3 = 28?

c. Suppose the chief objective is to identify the physical fitness group with the smallest mean
required time for therapy. The probability should be at least .90 that the correct group is
identified when the mean required time for the second best group differs by 2.0 days or
more. What are the required sample sizes?


*16.34. Refer to Filling machines Problem 16.11. Suppose that the sample sizes have not yet been
determined but it has been decided to sample the same number of cartons for each fill-
ing machine. Assume that a reasonable planning value for the error standard deviation is
σ = .15 ounce.

a. What would be the required sample sizes if: (1) differences in the mean amount of fill for
the six filling machines are to be detected with probability .70 or more when the range of
the treatment means is .15 ounce, and (2) the α risk is to be controlled at .05?

b. For the sample sizes determined in part (a), what would be the power of the test if μ_1 = .09,
μ_2 = .18, μ_3 = .30, μ_4 = .20, μ_5 = .10, and μ_6 = .20?

c. Suppose the chief objective is to identify the filling machine with the smallest mean fill.
The probability should be at least .95 that the filling machine with the smallest mean fill is
recognized correctly when the filling machine with the next smallest mean fill differs by
.10 ounce or more. What are the required sample sizes?


16.35. Refer to Premium distribution Problem 16.12. Suppose that the sample sizes have not yet
been determined but it has been decided to sample the same number of premium distributions
for each agent. Assume that a reasonable planning value for the error standard deviation is
σ = 3.0 days.

a. What would be the required sample sizes if: (1) differences in the mean time lapse for the
five agents are to be detected with probability .95 or more when the range of the treatment
means is 3.75 days, and (2) the α risk is to be controlled at .10?

b. Suppose the chief objective is to identify the best agent, i.e., the one with the smallest mean
time lapse. The probability should be at least .90 that the best agent is recognized correctly
when the mean time lapse for the second best agent differs by 1.0 day or more. What are
the required sample sizes?


Exercises


16.36. (Calculus needed.) State the likelihood function for ANOVA model (16.2) when r = 3 and
n_i = 2 and obtain the maximum likelihood estimators.

16.37. Show that when test statistic t* in Table A.2a is squared, it is equivalent to the F* test statistic
(16.55) for r = 2.

16.38. Derive the restriction in (16.66) when the constant μ. is defined according to (16.65).

16.39. a. Obtain the least squares estimators of the regression coefficients in full regression model
(16.85). What is SSE(F) here?

b. Obtain the least squares estimator of μ_c in reduced regression model (16.86). What is
SSE(R) here?

16.40. A completely randomized experiment is to be conducted involving r = 3 treatments, with
n = 2 experimental trials for each treatment. Because the normality of the error terms is
strongly in doubt, the test for treatment effects based on the F* test statistic in (16.55) is to
be carried out by means of the randomization distribution.

a. Determine the number of ways that the six experimental units can be divided into three
groups of size two. How many unique F* statistics are possible?

b. Using the results in part (a), what is the smallest P-value that is possible with the random-
ization test? What does this suggest about the adequacy of the planned sample size?

16.41. (Calculus needed.) Given μ_1 = 0, μ_3 = 1, and 0 < μ_2 < 1, show that Σ(μ_i − μ.)² is min-
imized when μ_2 = .5, where μ. = (μ_1 + μ_2 + μ_3)/3.


Projects


16.42. Refer to the SENIC data set in Appendix C.1. Test whether or not the mean infection risk
(variable 4) is the same in the four geographic regions (variable 9); use α = .05. Assume that
ANOVA model (16.2) is applicable. State the alternatives, decision rule, and conclusion.

16.43. Refer to the SENIC data set in Appendix C.1. The effect of average age of patient (variable 3)
on mean infection risk (variable 4) is to be studied. For purposes of this ANOVA study, average
age is to be classified into four categories: Under 50.0, 50.0–54.9, 55.0–59.9, 60.0 and over.
Assume that ANOVA model (16.2) is applicable. Test whether or not the mean infection risk
differs for the four age groups. Control the α risk at .10. State the alternatives, decision rule,
and conclusion.

16.44. Refer to the CDI data set in Appendix C.2. The effect of geographic region (variable 17) on
the crime rate (variable 10 ÷ variable 5) is to be studied. Assume that ANOVA model (16.2)
is applicable. Test whether or not the mean crime rates for the four geographic regions differ;
use α = .05. State the alternatives, decision rule, and conclusion.

16.45. Refer to the Market share data set in Appendix C.3. Test whether or not the average monthly
market share (variable 2) is the same for the four factor-level combinations associated with the
two levels of each factor for discount price (variable 5) and package promotion (variable 6);
use α = .05. Assume that model (16.2) is applicable. State the alternatives, decision rule, and
conclusion.

16.46. Consider a test involving H_0: μ_1 = μ_2 = μ_3. Five observations are to be taken for each factor
level, and level of significance α = .05 is to be employed in the test.

a. Generate five random normal observations when μ_1 = 100 and σ = 12 to represent the
observations for treatment 1. Repeat this for the other two treatments when μ_2 = μ_3 = 100
and σ = 12. Finally, calculate F* test statistic (16.55).

b. Repeat part (a) 100 times.


c. Calculate the mean of the 100 F* statistics.

d. What proportion of the F* statistics lead to conclusion H_0? Is this consistent with theoretical
expectations?

e. Repeat parts (a) and (b) when μ_1 = 80, μ_2 = 60, μ_3 = 160, and σ = 12. Calculate the
mean of the 100 F* statistics. How does this mean compare with the mean obtained
in part (c) when μ_1 = μ_2 = μ_3 = 100? Is this result consistent with the expectation
in (16.37b)?

f. What proportion of the 100 test statistics obtained in part (e) lead to conclusion H_a? Does
it appear that the test has satisfactory power when μ_1 = 80, μ_2 = 60, and μ_3 = 160?


16.47. A completely randomized experiment involving r = 2 treatments was carried out, based on
n = 3 experimental trials for each treatment. The test for equality of the treatment means is
to be carried out by means of the randomization distribution of the F* test statistic (16.55).

a. Determine the number of ways that the six experimental units can be divided into two
groups of size three each. How many unique F* statistics are possible?

b. For the sample results:


j:         1    2    3
Y_1j:     23   34   78
Y_2j:     17   29   23


obtain the randomization distribution of the test statistic F* and the P-value of the ran-
domization test.

c. Obtain the P-value of the normal-theory F* statistic for the sample results in part (b). How
does this P-value compare with the one from the randomization test in part (b)? What does
this suggest about the appropriateness of the F distribution here if the error terms are far
from normally distributed?

16.48. A completely randomized psychological reinforcement experiment was conducted in which
a standard treatment and an experimental treatment were each applied to four subjects. The
sample results are:


j:                                  1    2    3    4
Y_1j (standard treatment):         16   14   18   16
Y_2j (experimental treatment):     12   15   13   12


The test for equality of treatment means is to be carried out by means of the randomization
distribution of the F* test statistic (16.55), with α = .10.

a. Obtain the randomization distribution of the test statistic F* and carry out the indicated
test. State the alternatives, decision rule, and conclusion. What is the P-value of the ran-
domization test?

b. For the randomization distribution in part (a), determine the proportion of F* values that
exceed F(.90; 1, 6), the proportion of F* values that exceed F(.95; 1, 6), and the proportion
that exceed F(.99; 1, 6).

c. How do the proportions obtained in part (b) compare with the probabilities for the normal
error model? Discuss.
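The randomization procedure asked for in Problems 16.47 and 16.48 can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function names are made up for this sketch, the six observations are hypothetical (they are not the problem data), and the P-value is taken as the proportion of all possible assignments whose F* is at least as large as the observed one.

```python
from itertools import combinations


def f_statistic(groups):
    """One-way ANOVA F* = MSTR / MSE for a list of groups of observations."""
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    # Assumes SSE > 0, which holds when the values within a group are not all equal.
    return (sstr / (r - 1)) / (sse / (n_total - r))


def randomization_p_value(group1, group2):
    """Enumerate every reassignment of the pooled units to two treatments."""
    pooled = list(group1) + list(group2)
    indices = range(len(pooled))
    observed = f_statistic([group1, group2])
    f_values = []
    for chosen in combinations(indices, len(group1)):
        g1 = [pooled[k] for k in chosen]
        g2 = [pooled[k] for k in indices if k not in chosen]
        f_values.append(f_statistic([g1, g2]))
    # Small tolerance so that assignments tied with the observed F* are counted.
    extreme = sum(f >= observed - 1e-9 for f in f_values)
    return observed, len(f_values), extreme / len(f_values)


# Hypothetical observations, used only to exercise the functions.
treatment_1 = [10.0, 12.0, 15.0]
treatment_2 = [4.0, 6.0, 7.0]
observed_f, n_assignments, p_value = randomization_p_value(treatment_1, treatment_2)
print(n_assignments, round(observed_f, 2), p_value)
```

Because swapping the two treatment labels leaves F* unchanged, each value of F* appears at least twice in the enumeration, which is why the number of distinct statistics is smaller than the number of assignments.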


Case
Studies


16.49. Refer to the Prostate cancer data set in Appendix C.5. Carry out a one-way analysis of
variance of this data set, where the response of interest is PSA level (variable 2) and the
single factor is Gleason score (variable 9). The analysis should consider transformations of
the response variable. Document steps taken in your analysis, and justify your conclusions.

16.50. Refer to the Real estate sales data set in Appendix C.7. Carry out a one-way analysis of
variance of this data set, where the response of interest is sales price (variable 2) and the single
factor is number of bedrooms (variable 4). Recode the number of bedrooms into four cate-
gories: 0–2, 3, 4, and greater than or equal to 5. The analysis should consider transformations
of the response variable. Document steps taken in your analysis, and justify your conclusions.

16.51. Refer to the Ischemic heart disease data set in Appendix C.9. Carry out a one-way analysis of
variance of this data set, where the response of interest is total cost (variable 2) and the single
factor is total number of interventions (variable 5). Recode the number of interventions into
six categories: 0, 1, 2, 3–4, 5–7, and greater than or equal to 8. The analysis should consider
transformations of the response variable. Document steps taken in your analysis, and justify
your conclusions.


Chapter 17


Analysis of Factor Level Means


17.1 Introduction 


In Chapter 16, we discussed the F test for determining whether or not the factor level means
μ_i differ. This is a preliminary test to establish whether detailed analysis of the factor level
means is warranted. When this test leads to the conclusion that the factor level means μ_i
are equal, and ANOVA model (16.2) is appropriate, no relation between the factor and the
response variable is present and usually no further analysis of factor means is therefore
indicated. On the other hand, when the F test leads to the conclusion that the factor level
means μ_i differ, a relation between the factor and the response variable is present. In this
latter case, a thorough analysis of the nature of the factor level means is usually undertaken.
This is done in two principal ways:


1. Analysis of the factor level means of interest using estimation techniques. 
2. Statistical tests concerning the factor level means of interest. 


Often, the analysis of factor level means combines the two approaches. For instance, a 
two-sided confidence interval may be constructed initially for an effect of interest. A test 
concerning this effect is then carried out either by determining whether or not the confidence 
interval contains the hypothesized value or by constructing the appropriate test statistic. 

When many related comparisons are to be made, testing often precedes estimation. This
occurs, for instance, when each factor level effect is compared with every other one and
the number of factor levels is not small. Here, statistical tests are often performed first to
determine the active or statistically significant set of comparisons. Estimation techniques
are then used to construct confidence intervals for the active comparisons.

Special simultaneous estimation and testing procedures, called multiple comparison
procedures, are required when a series of interval estimates or tests are performed. These
multiple comparison procedures preserve the overall confidence coefficient 1 − α, or the
overall significance level α, for the family of inferences.
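A small calculation suggests why a family of inferences needs special procedures. The sketch below assumes, purely for illustration, that the individual interval estimates are statistically independent; intervals built from the same ANOVA share MSE and are not independent, so the number it prints is only a rough indication of how quickly the joint confidence erodes. The function name is hypothetical.

```python
def joint_confidence_if_independent(individual_confidence, number_of_intervals):
    """Probability that all intervals are correct, ASSUMING independence."""
    return individual_confidence ** number_of_intervals


r = 4                                   # number of factor levels
pairwise = r * (r - 1) // 2             # number of pairwise comparisons
joint = joint_confidence_if_independent(0.95, pairwise)
print(pairwise, round(joint, 3))        # joint confidence is well below .95
```

Under this simplifying assumption, six separate 95 percent intervals are simultaneously correct only about 74 percent of the time.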

We first discuss three simple graphical methods for displaying the factor level means.
Much of the remainder of the chapter is devoted to a consideration of important multiple
comparison procedures. In Section 16.10 we introduced methods for determining sample
sizes in single-factor studies based on the power approach. This chapter concludes with a
discussion of the estimation approach to sample size planning.

Throughout this chapter, we continue to assume the usual single-factor ANOVA model.
The cell means version of this model was given in (16.2):


Y_{ij} = \mu_i + \varepsilon_{ij}    (17.1)


where:

  \mu_i are parameters
  \varepsilon_{ij} are independent N(0, \sigma^2)


Our discussion of the analysis of factor means will be illustrated by two examples. The
first is the Kenton Food Company example. Data for this example are provided in Table 16.1
on page 686, and the ANOVA table is displayed in Figure 16.5 on page 695. For convenience,
we repeat the main results in Table 17.1. The second example, the rust inhibitor example,
is described next.


TABLE 17.1 Summary of Results—Kenton Food Company Example.

                        Package Design (i)
                 1       2       3       4     Total
n_i              5       5       4       5        19
Y_i.            73      67      78     136       354
Ȳ_i.          14.6    13.4    19.5    27.2     18.63

Source of Variation        SS     df       MS
Between designs        588.22      3   196.07
Error                  158.20     15    10.55
Total                  746.42     18

Package Design    Characteristics
1                 3 colors, with cartoons
2                 3 colors, without cartoons
3                 5 colors, with cartoons
4                 5 colors, without cartoons
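The entries of Table 17.1 are tied together by simple arithmetic, and a short sketch can confirm that the summary is internally consistent. It uses only the sample sizes, treatment totals, and error sum of squares quoted in the table; the variable names are chosen here for illustration. The error sum of squares cannot be recovered from the totals alone, so it is entered as given.

```python
# Summary figures quoted in Table 17.1 (Kenton Food Company example).
n = {1: 5, 2: 5, 3: 4, 4: 5}             # sample sizes n_i
totals = {1: 73, 2: 67, 3: 78, 4: 136}   # treatment totals Y_i.
sse = 158.20                             # error sum of squares, taken from the table

n_total = sum(n.values())
r = len(n)
means = {i: totals[i] / n[i] for i in n}            # estimated factor level means
overall_mean = sum(totals.values()) / n_total       # overall mean

# Between-designs sum of squares: sum of n_i * (mean_i - overall mean)^2.
sstr = sum(n[i] * (means[i] - overall_mean) ** 2 for i in n)
mstr = sstr / (r - 1)
mse = sse / (n_total - r)

print(means)
print(round(overall_mean, 2), round(sstr, 2), round(mstr, 2), round(mse, 2))
```

The rounded values agree with the table: an overall mean of 18.63, a between-designs sum of squares of 588.22, and mean squares of 196.07 and 10.55.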


Example

In a study of the effectiveness of different rust inhibitors, four brands (A, B, C, D) were
tested. Altogether, 40 experimental units were randomly assigned to the four brands, with
10 units assigned to each brand. A portion of the results after exposing the experimental
units to severe weather conditions is given in coded form in Table 17.2a. The higher the
coded value, the more effective is the rust inhibitor. This study is a completely randomized
design, where the levels of the single factor correspond to the four rust inhibitor brands.
The analysis of variance is shown in Table 17.2b. For level of significance α = .05
for testing whether or not the four rust inhibitors differ in effectiveness, we require
F(.95; 3, 36) = 2.87. Using the mean squares from Table 17.2b, we obtain the test statistic:


F^* = \frac{MSTR}{MSE} = \frac{5{,}317.82}{6.140} = 866.1


Since F* = 866.1 > 2.87, we conclude that the four rust inhibitors differ in effectiveness.
The P-value of the test is 0+. We therefore wish to analyze the nature of the factor level
effects, particularly whether one rust inhibitor is substantially more effective than the others.


(b) Analysis of Variance

Source of Variation         SS     df         MS
Between brands       15,953.47      3   5,317.82
Error                   221.03     36      6.140
Total                16,174.50     39
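The decision rule in the rust inhibitor example amounts to one division and one comparison. The sketch below starts from the two mean squares and the critical value quoted in the text; the variable names are illustrative. As a cross-check, it also backs out the sums of squares implied by the mean squares and their degrees of freedom (3 and 36).

```python
# Values quoted in the rust inhibitor example.
mstr = 5317.82        # treatment mean square
mse = 6.140           # error mean square
f_critical = 2.87     # F(.95; 3, 36), as given in the text

f_star = mstr / mse
conclude_means_differ = f_star > f_critical

# Sums of squares implied by the mean squares (SS = MS * df), up to rounding.
implied_sstr = mstr * 3
implied_sse = mse * 36

print(round(f_star, 1), conclude_means_differ)
print(round(implied_sstr, 2), round(implied_sse, 2))
```

The implied sums of squares match the analysis of variance table to within rounding of the mean squares.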


17.2 Plots of Estimated Factor Level Means


Before undertaking formal analysis of the nature of the factor level effects, it is usually
helpful to examine these factor effects informally from a plot of the estimated factor level
means Ȳ_i.. We shall take up three types of plots: (1) a line plot, (2) a bar graph, and (3)
a main effects plot. All three plots are appropriate whether the sample sizes n_i are equal
or not.


Line Plot


A line plot of the estimated factor level means simply shows the positions of the Ȳ_i. on a
line scale. It is a very simple, but effective, device for indicating when one or several factor
level means may differ substantially from the others.


In Figure 17.1 we present a line plot of the estimated factor level means Ȳ_i. for the Kenton
Food Company example. It is clear from Figure 17.1 that design 4 led by far to the highest
mean sales in the study, and that package designs 1 and 2 led to the smallest mean sales,
which did not differ much from each other. The purpose of the formal inference procedures
to be taken up shortly is to determine whether the pattern noted here reflects underlying
differences in the factor level means μ_i or is simply the result of random variation.


FIGURE 17.1 Line Plot of Estimated Factor Level Means—Kenton Food Company
Example. [Line plot on a Cases Sold scale; from left to right the plotted points are
Design 2, Design 1, Design 3, Design 4.]
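A line plot of this kind is easy to approximate in plain text. The sketch below is a hypothetical helper written for this illustration, not the software used for the figure. It places each estimated factor level mean from the Kenton Food Company example on a horizontal scale, and the scale limits of 10 and 30 are an arbitrary choice that must bracket all the means.

```python
# Estimated factor level means for the Kenton Food Company example.
means = {1: 14.6, 2: 13.4, 3: 19.5, 4: 27.2}


def text_line_plot(level_means, low, high, width=60):
    """Return (label line, axis line); every mean must lie in [low, high]."""
    axis = ["-"] * (width + 1)
    labels = [" "] * (width + 1)
    for level, mean in level_means.items():
        position = round((mean - low) / (high - low) * width)
        axis[position] = "*"             # plotting symbol for the mean
        labels[position] = str(level)    # design number above the symbol
    return "".join(labels), "".join(axis)


label_line, axis_line = text_line_plot(means, low=10, high=30)
print(label_line)
print(axis_line)

# Designs ordered from smallest to largest estimated mean.
ordered = sorted(means, key=means.get)
print(ordered)
```

The ordering reproduces what the text reads off the figure: designs 2 and 1 sit close together at the low end, and design 4 stands well apart at the high end.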


Bar Graph and Main Effects Plot 


Example 


FIGURE 17.2 
MINITAB Bar 
Graph and 
Main Effects 
Plot of 
Estimated 
Factor Level 
Means— 
Kenton Food 
Company 
Example. 


Bar graphs and main effects plots are frequently used to display the estimated factor level 
means in two dimensions. Both can be used to compare the magnitudes of different factor 
level means. In a bar graph, vertical bars are used to display the estimated factor level 
means, In a main effects plot, a scatter plot of the estimated factor level means is provided, 
and the plot symbols are connected by straight lines, to visibly highlight potential trends 
in the cell means. Note that these trend lines are not particularly meaningful for qualitative 
factors. For this reason, main effects plots are most appropriate for quantitative factors. In 
some packages, the main effects plot also displays the overall mean using a horizontal line, 
permitting visual comparisons of the factor-level means with the overall mean. 


A bar graph and a main effects plot of the estimated factor level means for the Kenton Food 
Company example are displayed in Figure 17.2. Because package design is a qualitative 
factor, the bar graph in Figure 17.2a is the recommended graphic here. An advantage of 
the main effects plot in Figure 17.2b is that it permits a visual comparison of the estimated 
factor level means and the overall mean. Here it shows that designs 3 and 4 had higher mean 
sales than the overall mean, while designs 1 and 2 both had smaller mean sales than the
overall mean.
[Figure 17.2 panels: (a) Bar Graph, (b) Main Effects Plot; horizontal axis: Design (1 to 4).]




Comments 


1. In Section 16.7 we defined the difference of the factor level mean and the overall mean as the
factor level effect. In our discussion of multifactor studies in Chapter 19 and beyond, we shall refer
to factor level effects as main effects. For this reason, the plot in Figure 17.2b is frequently referred
to as a main effects plot.

2. None of the three plots provides information on the standard errors. Without such information,
we cannot easily tell whether differences between factor level means are statistically significant. Later
in this chapter, we shall enhance all three plots by including the information on the standard errors.

3. The normal probability plot introduced in Chapter 3 can also be used to compare the estimated
factor level means. A normal probability plot is appropriate when the sample sizes $n_i$ are equal and
the number of factor levels $r$ is sufficiently large. We recommend that a normal probability plot of factor
level means be considered if $r \ge 10$.


17.3 Estimation and Testing of Factor Level Means 


Inferences for factor level means are generally concerned with one or more of the following: 


1. A single factor level mean $\mu_i$
2. A difference between two factor level means
3. A contrast among factor level means
4. A linear combination of factor level means


We discuss each of these types of inferences in turn. 


Inferences for Single Factor Level Mean 
Estimation. An unbiased point estimator of the factor level mean $\mu_i$ is given in (16.16):

$$\hat{\mu}_i = \bar{Y}_{i\cdot} \tag{17.2}$$

This estimator has mean and variance:

$$E\{\bar{Y}_{i\cdot}\} = \mu_i \tag{17.3a}$$

$$\sigma^2\{\bar{Y}_{i\cdot}\} = \frac{\sigma^2}{n_i} \tag{17.3b}$$

The latter result follows because (16.43) indicates that $\bar{Y}_{i\cdot} = \mu_i + \bar{\varepsilon}_{i\cdot}$, the sum of a constant
plus a mean of $n_i$ independent $\varepsilon_{ij}$ error terms, each of which has variance $\sigma^2$. Further, $\bar{Y}_{i\cdot}$ is
normally distributed because the error terms $\varepsilon_{ij}$ are independent normal random variables.
The estimated variance of $\bar{Y}_{i\cdot}$ is denoted by $s^2\{\bar{Y}_{i\cdot}\}$ and is obtained as usual by replacing
$\sigma^2$ in (17.3b) by the unbiased point estimator MSE:

$$s^2\{\bar{Y}_{i\cdot}\} = \frac{MSE}{n_i} \tag{17.4}$$

The estimated standard deviation $s\{\bar{Y}_{i\cdot}\}$ is the positive square root of (17.4).
It can be shown that:

$$\frac{\bar{Y}_{i\cdot} - \mu_i}{s\{\bar{Y}_{i\cdot}\}} \text{ is distributed as } t(n_T - r) \text{ for ANOVA model (17.1)} \tag{17.5}$$




Example 




where the degrees of freedom are those associated with MSE. The result (17.5) follows
from the definition of $t$ in (A.44) since: (1) $\bar{Y}_{i\cdot}$ is normally distributed and (2) $MSE/\sigma^2$ is
distributed independently of $\bar{Y}_{i\cdot}$ as $\chi^2(n_T - r)/(n_T - r)$ according to the following theorem:

For ANOVA model (17.1), $SSE/\sigma^2$ is distributed as $\chi^2$ with $n_T - r$
degrees of freedom, and is independent of $\bar{Y}_{1\cdot}, \ldots, \bar{Y}_{r\cdot}$. (17.6)

It follows directly from (17.5) that the $1 - \alpha$ confidence limits for $\mu_i$ are:

$$\bar{Y}_{i\cdot} \pm t(1 - \alpha/2;\, n_T - r)\, s\{\bar{Y}_{i\cdot}\} \tag{17.7}$$


Testing. The confidence interval based on the limits in (17.7) can be used to test a hypothesis
of the form:

$$H_0\colon \mu_i = c \qquad H_a\colon \mu_i \ne c \tag{17.8}$$

where $c$ is an appropriate constant. We conclude $H_0$, at level of significance $\alpha$, when $c$ is
contained in the confidence interval, and we conclude $H_a$ when the confidence interval does
not contain $c$. Equivalently, one can compute the test statistic:

$$t^* = \frac{\bar{Y}_{i\cdot} - c}{s\{\bar{Y}_{i\cdot}\}} \tag{17.9}$$

Test statistic $t^*$ follows a $t$ distribution with $n_T - r$ degrees of freedom when $H_0$ is true,
according to (17.5). Consequently, we conclude $H_0$ whenever $|t^*| \le t(1 - \alpha/2;\, n_T - r)$;
otherwise, we conclude $H_a$.


In the Kenton Food Company example, the sales manager wished to estimate mean sales for
package design 1 with a 95 percent confidence interval. Using the results from Table 17.1,
we have:

$$\bar{Y}_{1\cdot} = 14.6 \qquad n_1 = 5 \qquad MSE = 10.55$$

We require $t(.975;\, 15) = 2.131$. Finally, we need $s\{\bar{Y}_{1\cdot}\}$. We have:

$$s^2\{\bar{Y}_{1\cdot}\} = \frac{MSE}{n_1} = \frac{10.55}{5} = 2.110$$

so that $s\{\bar{Y}_{1\cdot}\} = 1.453$. Hence, we obtain the confidence limits $14.6 \pm 2.131(1.453)$ and
the 95 percent confidence interval is:

$$11.5 \le \mu_1 \le 17.7$$

Thus, we estimate with confidence coefficient .95 that the mean sales per store for package
design 1 are between 11.5 and 17.7 cases.
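The arithmetic of this example can be retraced with a short script. The following is only a sketch: the function name `single_mean_limits` is hypothetical, and the critical value 2.131 is copied from the text rather than computed from the t distribution.

```python
import math

# Values quoted in the text for package design 1 (Table 17.1).
ybar_1 = 14.6    # estimated factor level mean
n_1 = 5          # sample size for design 1
mse = 10.55      # error mean square, 15 degrees of freedom
t_crit = 2.131   # t(.975; 15), copied from the text (not computed here)


def single_mean_limits(ybar, n, mse, t_crit):
    """Confidence limits ybar -/+ t * s{ybar}, with s^2{ybar} = MSE / n."""
    s = math.sqrt(mse / n)        # estimated standard deviation of the mean
    half_width = t_crit * s
    return s, ybar - half_width, ybar + half_width


s_1, lower, upper = single_mean_limits(ybar_1, n_1, mse, t_crit)
print(f"s = {s_1:.3f}, interval: {lower:.1f} <= mu_1 <= {upper:.1f}")
```

The same function applies to any other factor level once its mean and sample size are substituted.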


Graphical Displays. One way to enhance a bar graph or the main effects plot of factor level
means is to display the confidence limits in (17.7) for each factor level mean. Figure 17.3
provides two such plots. Figure 17.3a contains a bar-interval graph, in which the 95 percent
confidence limits are superimposed on a bar graph of the treatment means. Figure 17.3b
contains an interval plot, in which the 95 percent confidence limits for each factor level
mean are displayed. Many investigators prefer to simply display limits that correspond to
plus-or-minus one standard error, that is, $\bar{Y}_{i\cdot} \pm s\{\bar{Y}_{i\cdot}\}$.

FIGURE 17.3 Bar-Interval Graph and Interval Plot—Kenton Food Company Example.
[Panels: (a) Bar-Interval Graph, (b) Interval Plot; horizontal axis: Design (1 to 4); vertical axis: Cases Sold.]


Inferences for Difference between Two Factor Level Means


Estimation. Frequently two treatments or factor levels are to be compared by estimating
the difference $D$ between the two factor level means, say, $\mu_i$ and $\mu_{i'}$:

$$D = \mu_i - \mu_{i'} \tag{17.10}$$

Such a difference between two factor level means is called a pairwise comparison. A point
estimator of $D$ in (17.10), denoted by $\hat{D}$, is:

$$\hat{D} = \bar{Y}_{i\cdot} - \bar{Y}_{i'\cdot} \tag{17.11}$$

This point estimator is unbiased:

$$E\{\hat{D}\} = \mu_i - \mu_{i'} \tag{17.12}$$

Since $\bar{Y}_{i\cdot}$ and $\bar{Y}_{i'\cdot}$ are independent, the variance of $\hat{D}$ follows from (A.31b):

$$\sigma^2\{\hat{D}\} = \sigma^2\{\bar{Y}_{i\cdot}\} + \sigma^2\{\bar{Y}_{i'\cdot}\} = \sigma^2\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right) \tag{17.13}$$

The estimated variance of $\hat{D}$, denoted by $s^2\{\hat{D}\}$, is given by:

$$s^2\{\hat{D}\} = MSE\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right) \tag{17.14}$$

Finally, $\hat{D}$ is normally distributed by (A.40) because $\hat{D}$ is a linear combination of
independent normal variables.

It follows from these characteristics, theorem (17.6), and the definition of $t$ in (A.44)
that:

$$\frac{\hat{D} - D}{s\{\hat{D}\}} \text{ is distributed as } t(n_T - r) \text{ for ANOVA model (17.1)} \tag{17.15}$$




Example 


Hence, the $1 - \alpha$ confidence limits for $D$ are:

$$\hat{D} \pm t(1 - \alpha/2;\, n_T - r)\, s\{\hat{D}\} \tag{17.16}$$


Testing. There is often interest in testing whether two factor level means are the same.
The alternatives here are of the form:

$$H_0\colon \mu_i = \mu_{i'} \qquad H_a\colon \mu_i \ne \mu_{i'} \tag{17.17}$$

The alternatives in (17.17) can be stated equivalently as follows:

$$H_0\colon \mu_i - \mu_{i'} = 0 \qquad H_a\colon \mu_i - \mu_{i'} \ne 0 \tag{17.17a}$$

Conclusion $H_0$ is reached at the $\alpha$ level of significance if zero is contained within the
confidence limits (17.16); otherwise, conclusion $H_a$ is reached. An equivalent procedure is
based on the test statistic:

$$t^* = \frac{\hat{D}}{s\{\hat{D}\}} \tag{17.18}$$

Conclusion $H_0$ is reached if $|t^*| \le t(1 - \alpha/2;\, n_T - r)$; otherwise, $H_a$ is concluded.


For the Kenton Food Company example, package designs 1 and 2 used 3-color printing
and designs 3 and 4 used 5-color printing, as shown in Table 17.1. We wish to estimate the
difference in mean sales for 5-color designs 3 and 4 using a 95 percent confidence interval.
That is, we wish to estimate $D = \mu_3 - \mu_4$. From Table 17.1, we have:

$$\bar{Y}_{3\cdot} = 19.5 \qquad n_3 = 4 \qquad MSE = 10.55$$

$$\bar{Y}_{4\cdot} = 27.2 \qquad n_4 = 5$$

Hence:

$$\hat{D} = \bar{Y}_{3\cdot} - \bar{Y}_{4\cdot} = 19.5 - 27.2 = -7.7$$

The estimated variance of $\hat{D}$ is:

$$s^2\{\hat{D}\} = MSE\left(\frac{1}{n_3} + \frac{1}{n_4}\right) = 10.55\left(\frac{1}{4} + \frac{1}{5}\right) = 4.748$$

so that the estimated standard deviation of $\hat{D}$ is $s\{\hat{D}\} = 2.179$. We require $t(.975;\, 15) = 2.131$.
The confidence limits therefore are $-7.7 \pm 2.131(2.179)$, and the desired 95 percent
confidence interval is:

$$-12.3 \le \mu_3 - \mu_4 \le -3.1$$

Thus, we estimate with confidence coefficient .95 that the mean sales for package design 3
fall short of those for package design 4 by somewhere between 3.1 and 12.3 cases per store.

Note from Table 17.1 that the only difference between package designs 3 and 4 is the
presence of cartoons; both designs used 5-color printing. The sales manager may therefore
wish to test whether the addition of cartoons affects sales for 5-color designs. The alternatives
here are:

$$H_0\colon \mu_3 - \mu_4 = 0 \qquad H_a\colon \mu_3 - \mu_4 \ne 0$$

Since the hypothesized difference zero in $H_0$ is not contained within the 95 percent
confidence limits $-12.3$ and $-3.1$, we conclude $H_a$, that the presence of cartoons has an effect.
We could also obtain test statistic (17.18):

$$t^* = \frac{\hat{D}}{s\{\hat{D}\}} = \frac{-7.7}{2.179} = -3.53$$

Since $|t^*| = 3.53 > t(.975;\, 15) = 2.131$, we conclude $H_a$. The two-sided P-value for this
test is .003.
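The pairwise comparison of designs 3 and 4 can be reproduced numerically. This is a minimal sketch under the assumption that the tabulated value t(.975; 15) = 2.131 is supplied by hand; the helper name `pairwise_inference` is hypothetical.

```python
import math


def pairwise_inference(ybar_a, ybar_b, n_a, n_b, mse, t_crit):
    """Estimate D = mu_a - mu_b with its confidence limits and t* statistic."""
    d_hat = ybar_a - ybar_b                          # point estimator (17.11)
    s_d = math.sqrt(mse * (1 / n_a + 1 / n_b))       # square root of (17.14)
    half_width = t_crit * s_d
    t_star = d_hat / s_d                             # test statistic (17.18)
    return d_hat, s_d, d_hat - half_width, d_hat + half_width, t_star


# Designs 3 and 4 of the Kenton Food Company example.
d_hat, s_d, lower, upper, t_star = pairwise_inference(
    19.5, 27.2, 4, 5, mse=10.55, t_crit=2.131
)
conclude_ha = abs(t_star) > 2.131   # equivalent to zero lying outside the limits
print(f"D_hat = {d_hat:.1f}, s = {s_d:.3f}, t* = {t_star:.2f}")
print(f"{lower:.1f} <= mu_3 - mu_4 <= {upper:.1f}; conclude Ha: {conclude_ha}")
```

Both decision rules agree here: the interval excludes zero and the absolute test statistic exceeds the critical value.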


Inferences for Contrast of Factor Level Means


A contrast is a comparison involving two or more factor level means and includes the
previous case of a pairwise difference between two factor level means in (17.10). A contrast
will be denoted by $L$, and is defined as a linear combination of the factor level means $\mu_i$
where the coefficients $c_i$ sum to zero:

$$L = \sum_{i=1}^{r} c_i \mu_i \qquad \text{where } \sum_{i=1}^{r} c_i = 0 \tag{17.19}$$


Illustrations of Contrasts. In the Kenton Food Company example, package designs 1 and
2 used 3-color printing and designs 3 and 4 used 5-color printing, as shown in Table 17.1.
Also, package designs 1 and 3 utilized cartoons while no cartoons were utilized in designs
2 and 4. The following contrasts here may be of interest:

1. Comparison of the mean sales for the two 3-color designs:

$$L = \mu_1 - \mu_2$$

Here, $c_1 = 1$, $c_2 = -1$, $c_3 = 0$, $c_4 = 0$, and $\sum c_i = 0$.

2. Comparison of the mean sales for the 3-color and 5-color designs:

$$L = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$

Here, $c_1 = 1/2$, $c_2 = 1/2$, $c_3 = -1/2$, $c_4 = -1/2$, and $\sum c_i = 0$.

3. Comparison of the mean sales for designs with and without cartoons:

$$L = \frac{\mu_1 + \mu_3}{2} - \frac{\mu_2 + \mu_4}{2}$$

Here, $c_1 = 1/2$, $c_2 = -1/2$, $c_3 = 1/2$, $c_4 = -1/2$, and $\sum c_i = 0$.

4. Comparison of the mean sales for design 1 with average sales for all four designs:

$$L = \mu_1 - \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4}$$

Here, $c_1 = 3/4$, $c_2 = -1/4$, $c_3 = -1/4$, $c_4 = -1/4$, and $\sum c_i = 0$.




Note that the first contrast is simply a pairwise comparison. In the second and third
contrasts, averages of several factor level means are compared. The fourth contrast is the
factor effect $\tau_1$ defined by (16.60) and (16.63).

The averages used here are unweighted averages of the means $\mu_i$; these are ordinarily
the averages of interest. In special cases one might be interested in weighted averages of the
$\mu_i$ to describe the mean response for a group of several factor levels. For example, if both
3-color and 5-color designs were to be employed, with 3-color printing used three times as
often as 5-color printing, the comparison of the effect of cartoons versus no cartoons might
be based on the contrast:

$$L = \frac{3\mu_1 + \mu_3}{4} - \frac{3\mu_2 + \mu_4}{4}$$

Here, $c_1 = 3/4$, $c_2 = -3/4$, $c_3 = 1/4$, $c_4 = -1/4$, and $\sum c_i = 0$.


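Estimation. An unbiased estimator of a contrast $L$ is:

$$\hat{L} = \sum_{i=1}^{r} c_i \bar{Y}_{i\cdot} \tag{17.20}$$

Since the $\bar{Y}_{i\cdot}$ are independent, the variance of $\hat{L}$ according to (A.31) is:

$$\sigma^2\{\hat{L}\} = \sum_{i=1}^{r} c_i^2 \sigma^2\{\bar{Y}_{i\cdot}\} = \sum_{i=1}^{r} c_i^2 \left(\frac{\sigma^2}{n_i}\right) = \sigma^2 \sum_{i=1}^{r} \frac{c_i^2}{n_i} \tag{17.21}$$

The estimated variance of $\hat{L}$, denoted by $s^2\{\hat{L}\}$, is:

$$s^2\{\hat{L}\} = MSE \sum_{i=1}^{r} \frac{c_i^2}{n_i} \tag{17.22}$$

$\hat{L}$ is normally distributed by (A.40) because it is a linear combination of independent
normal random variables. It can be shown by theorem (17.6), the characteristics of $\hat{L}$ just
mentioned, and the definition of $t$ that:

$$\frac{\hat{L} - L}{s\{\hat{L}\}} \text{ is distributed as } t(n_T - r) \text{ for ANOVA model (17.1)} \tag{17.23}$$

Consequently, the $1 - \alpha$ confidence limits for $L$ are:

$$\hat{L} \pm t(1 - \alpha/2;\, n_T - r)\, s\{\hat{L}\} \tag{17.24}$$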


Testing. The confidence interval based on the limits in (17.24) can be used to test a
hypothesis of the form:

$$H_0\colon L = 0 \qquad H_a\colon L \ne 0 \tag{17.25}$$

$H_0$ is concluded at the $\alpha$ level of significance if zero is contained in the interval; otherwise
$H_a$ is concluded. An equivalent procedure is based on the test statistic:

$$t^* = \frac{\hat{L}}{s\{\hat{L}\}} \tag{17.26}$$

If $|t^*| \le t(1 - \alpha/2;\, n_T - r)$, $H_0$ is concluded; otherwise, $H_a$ is concluded.




In the Kenton Food Company example, the mean sales for the 3-color designs are to be
compared to the mean sales for the 5-color designs with a 95 percent confidence interval.
We wish to estimate:

$$L = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$

The point estimate is (see data in Table 17.1):

$$\hat{L} = \frac{\bar{Y}_{1\cdot} + \bar{Y}_{2\cdot}}{2} - \frac{\bar{Y}_{3\cdot} + \bar{Y}_{4\cdot}}{2} = \frac{14.6 + 13.4}{2} - \frac{19.5 + 27.2}{2} = -9.35$$

Since $c_1 = 1/2$, $c_2 = 1/2$, $c_3 = -1/2$, and $c_4 = -1/2$, we obtain:

$$\sum \frac{c_i^2}{n_i} = \frac{(1/2)^2}{5} + \frac{(1/2)^2}{5} + \frac{(-1/2)^2}{4} + \frac{(-1/2)^2}{5} = .2125$$

and:

$$s^2\{\hat{L}\} = MSE \sum \frac{c_i^2}{n_i} = 10.55(.2125) = 2.242$$

so that $s\{\hat{L}\} = 1.50$.

For a 95 percent confidence interval, we require $t(.975;\, 15) = 2.131$. The confidence
limits for $L$ therefore are $-9.35 \pm 2.131(1.50)$, and the desired 95 percent confidence
interval is:

$$-12.5 \le L \le -6.2$$

Therefore, we conclude with confidence coefficient .95 that mean sales for the 3-color
designs fall below those for the 5-color designs by somewhere between 6.2 and 12.5 cases
per store.

To test the hypothesis of no difference in mean sales for the 3-color and 5-color designs:

$$H_0\colon L = 0 \qquad H_a\colon L \ne 0$$

at the $\alpha = .05$ level of significance, we simply note that the hypothesized value zero is
not contained in the 95 percent confidence interval. Hence, we conclude $H_a$, that the mean
sales differ. To obtain a P-value of the test, test statistic (17.26) must be obtained. We find:

$$t^* = \frac{-9.35}{1.50} = -6.23$$

and the corresponding two-sided P-value is 0+.
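The contrast calculation above generalizes to any coefficient vector. The sketch below assumes the means, sample sizes, MSE, and the critical value 2.131 quoted in the text; the function name `contrast_inference` is hypothetical. Because the script does not round the standard deviation to 1.50 before dividing, its t* is about -6.24 rather than the -6.23 shown in the text.

```python
import math

means = [14.6, 13.4, 19.5, 27.2]   # estimated means for designs 1 to 4
sizes = [5, 5, 4, 5]               # sample sizes n_i
mse = 10.55
t_crit = 2.131                     # t(.975; 15), copied from the text


def contrast_inference(coeffs, means, sizes, mse, t_crit):
    """Point estimate, standard deviation, limits, and t* for sum(c_i * mu_i)."""
    l_hat = sum(c * y for c, y in zip(coeffs, means))                    # (17.20)
    s_l = math.sqrt(mse * sum(c * c / n for c, n in zip(coeffs, sizes)))  # (17.22)
    half_width = t_crit * s_l
    return l_hat, s_l, l_hat - half_width, l_hat + half_width, l_hat / s_l


# 3-color designs (1, 2) versus 5-color designs (3, 4).
coeffs = [0.5, 0.5, -0.5, -0.5]
l_hat, s_l, lower, upper, t_star = contrast_inference(coeffs, means, sizes, mse, t_crit)
print(f"L_hat = {l_hat:.2f}, s = {s_l:.2f}, t* = {t_star:.2f}")
print(f"{lower:.1f} <= L <= {upper:.1f}")
```

Swapping in another coefficient vector from the illustrations, such as the cartoon contrast, requires no other change.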


Comment

Many single-factor analysis of variance programs permit the user to specify a contrast of interest and
then will furnish the $t^*$ test statistic or the equivalent $F^*$ test statistic.

Inferences for Linear Combination of Factor Level Means


Occasionally, we are interested in a linear combination of the factor level means that is not
a contrast. For example, suppose that the Kenton Food Company will use all four package
designs, one in each of its four major marketing regions, and that these marketing regions
account for 35, 28, 12, and 25 percent of sales, respectively. In that case, there might be
interest in the overall mean sales per store for all regions:

$$L = .35\mu_1 + .28\mu_2 + .12\mu_3 + .25\mu_4$$

Note that this linear combination is of the form $L = \sum c_i \mu_i$ but that the coefficients $c_i$ sum
to 1.0, not to zero as they must for a contrast.

We define a linear combination of the factor level means $\mu_i$ as:

$$L = \sum_{i=1}^{r} c_i \mu_i \tag{17.27}$$

with no restrictions on the coefficients $c_i$. Confidence limits and test statistics for a linear
combination $L$ are obtained in exactly the same way as those for a contrast by means
of (17.24) and (17.26), respectively. Point estimator (17.20) and estimated variance (17.22)
are still applicable when $\sum c_i \ne 0$.
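As an illustration of applying (17.20) and (17.22) to a linear combination that is not a contrast, the sketch below evaluates the regional weighting just described. The numerical results are not given in the text; they are computed here from the means, sample sizes, and MSE quoted earlier for the Kenton Food Company example, and the variable names are hypothetical.

```python
import math

means = [14.6, 13.4, 19.5, 27.2]    # estimated means for designs 1 to 4
sizes = [5, 5, 4, 5]                # sample sizes n_i
mse = 10.55
t_crit = 2.131                      # t(.975; 15), copied from the text
weights = [0.35, 0.28, 0.12, 0.25]  # regional sales shares; they sum to 1, not 0

# Point estimator (17.20) and estimated variance (17.22) need no zero-sum restriction.
l_hat = sum(c * y for c, y in zip(weights, means))
s_l = math.sqrt(mse * sum(c * c / n for c, n in zip(weights, sizes)))
lower, upper = l_hat - t_crit * s_l, l_hat + t_crit * s_l

print(f"L_hat = {l_hat:.3f}, s = {s_l:.3f}")
print(f"95 percent limits: {lower:.2f} to {upper:.2f}")
```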


Single Degree of Freedom Tests. The alternatives for tests concerning a factor level mean
in (17.8), a difference between two factor level means in (17.17a), and a contrast of factor
level means in (17.25) are all special cases of a test concerning a linear combination of
factor level means:

$$H_0\colon \sum c_i \mu_i = c \qquad H_a\colon \sum c_i \mu_i \ne c$$

where the $c_i$ and $c$ are appropriate constants. Test statistics (17.9), (17.18), and (17.26) can
each be converted to an equivalent $F^*$ test statistic by means of the relation in (A.50a):

$$F^* = (t^*)^2$$

Test statistic $F^*$ follows the $F(1, n_T - r)$ distribution when $H_0$ holds. Note that the numerator
degrees of freedom are always one. Hence, these tests are often referred to as single-degree-
of-freedom tests. The $t^*$ version of the test statistic is more versatile because it can also be
used for one-sided tests while the $F^*$ version cannot.
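The equivalence of the two-sided t* test and the F* test can be checked numerically with the statistics from the earlier examples. In this sketch the F critical value is obtained by squaring the tabulated t(.975; 15) = 2.131, which relies on the same relation (A.50a); the function names are hypothetical.

```python
t_crit = 2.131            # t(.975; 15), copied from the text
f_crit = t_crit ** 2      # F(.95; 1, 15) via the relation F = t^2


def decide_with_t(t_star, t_crit):
    """Two-sided decision rule based on |t*|."""
    return "Ha" if abs(t_star) > t_crit else "H0"


def decide_with_f(t_star, f_crit):
    """Equivalent decision rule based on F* = (t*)^2."""
    return "Ha" if t_star ** 2 > f_crit else "H0"


# t* values reported in the text for the pairwise and contrast examples.
decisions = [(t, decide_with_t(t, t_crit), decide_with_f(t, f_crit))
             for t in (-3.53, -6.23)]
for t_star, by_t, by_f in decisions:
    print(f"t* = {t_star}: F* = {t_star ** 2:.2f}, t rule -> {by_t}, F rule -> {by_f}")
```

Squaring discards the sign of t*, which is why the F* version cannot distinguish the two directions needed for a one-sided test.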


17.4 Need for Simultaneous Inference Procedures 


The procedures for estimating and testing factor level means discussed up to this point have 
two important limitations: 


1. The confidence coefficient $1 - \alpha$ for the estimation procedures described is a statement
confidence coefficient and applies only to a particular estimate, not to a series of estimates.
Similarly, the specified Type I error rate, $\alpha$, applies only to a particular test and not to a
series of tests.

2. The confidence coefficient $1 - \alpha$ and the specified significance level $\alpha$ are appropriate
only if the estimate or test was not suggested by the data.


The first limitation is familiar from regression analysis. It is particularly serious for 
analysis of variance models because frequently many different comparisons are of interest 




here, and one needs to piece the different findings together. Consider the very simple 
case where three different advertisements are being compared for their effectiveness in 
stimulating sales. The following estimates of their comparative effectiveness have been 
obtained, each with a 95 percent statement confidence coefficient: 


$$59 \le \mu_2 - \mu_1 \le 62$$
$$-2 \le \mu_3 - \mu_1 \le 3$$
$$58 \le \mu_2 - \mu_3 \le 64$$


It would be natural here to piece the different comparisons together and conclude that 
advertisement 2 leads to highest mean sales, while advertisements 1 and 3 are substantially 
less effective and do not differ much among themselves. One would therefore like a family 
confidence coefficient for this family of statements, to provide known assurance that the set 
of conclusions is correct. 

The same concern for assurance of correct conclusions exists when the inferences involve 
tests. An analysis of factor means by testing procedures usually involves several single- 
degree-of-freedom tests to answer related questions. For instance, the sales manager of the 
Kenton Food Company might wish to know both whether the number of colors has an effect 
on mean sales and whether the use of cartoons has an effect. Whenever several tests are 
conducted, both the level of significance and the power, insofar as the family of tests is 
concerned, are affected. Consider, for example, three different $t$ tests, each conducted with
$\alpha = .05$. The probability that each of the tests will lead to conclusion $H_0$ when indeed $H_0$ is
correct in each case, assuming independence of the tests, is $(.95)^3 = .857$. Thus, the family level
of significance, that is, the probability that at least one of the three tests leads to conclusion $H_a$
when $H_0$ holds in each case, would be $1 - .857 = .143$, not .05. We see then that the level of significance
and power for a family of tests is not the same as that for an individual test. Actually, the $t^*$
statistics are dependent when they all are based on the same sample data and use the same
MSE value. It is often therefore more difficult to determine the actual level of significance
and power for a family of tests.
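The calculation for independent tests is easy to tabulate for other family sizes. The sketch below assumes independence, exactly as in the illustration above; it does not apply to dependent t* statistics sharing one MSE. The function name is hypothetical.

```python
def family_alpha_independent(alpha, k):
    """Probability of at least one false Ha among k independent level-alpha tests."""
    return 1 - (1 - alpha) ** k


family_levels = {k: family_alpha_independent(0.05, k) for k in (1, 3, 6, 10)}
for k, level in family_levels.items():
    print(f"{k:2d} tests: family level = {level:.3f}")
```

The family level grows quickly with the number of tests, which is the motivation for the simultaneous procedures that follow.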

The second limitation of the procedures for estimating or testing factor level means
discussed so far, namely, that the estimate or test must not be suggested by the data, is an
important one in exploratory investigations where many new questions are often suggested
once the data are being analyzed. The process of studying effects suggested by the data is
sometimes called data snooping. One form of data snooping is to investigate comparisons
where the effect appears to be large from the sample data, for example, testing whether
there is a difference between the two treatment means corresponding to the smallest and
largest estimated factor level means $\bar{Y}_{i\cdot}$. Choosing the test in this manner implies a larger
significance level than the nominal level used in constructing the decision rule. For example,
it can be shown for a study with six factor levels that if the analyst will always compare
the smallest and largest estimated factor level means by using the confidence limits (17.16)
with a 95 percent confidence coefficient, the interval estimate will not contain zero and
therefore suggest a real effect 40 percent of the time when indeed there is no difference
between any of the factor level means (Ref. 17.1). Hence, the $\alpha$ level for the test is .40, not
.05. With a larger number of factor levels, the likelihood of an erroneous indication of a real
effect, i.e., the actual $\alpha$ level, would be even greater. The reason for the higher actual level
of significance here is that a family of tests is being conducted implicitly since the analyst
does not know in advance which estimated factor level means will be the extreme ones. The
situation here is analogous to that in Chapter 10 where the test to determine whether the
largest absolute residual is an outlier considers the family of tests for each of the n residuals.
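The inflation caused by always comparing the smallest and largest estimated means can be explored by simulation. The sketch below is illustrative only and rests on several assumptions not taken from the text: six factor levels with 10 observations each, standard normal errors, equal true means, and an approximate critical value t(.975; 54) of 2.005. The function name is hypothetical, and the simulated rate depends on these choices, so it need not match the figure cited from Ref. 17.1.

```python
import math
import random


def snooping_error_rate(r=6, n=10, reps=2000, t_crit=2.005, seed=1):
    """Share of null data sets in which the max-vs-min interval (17.16) excludes zero."""
    rng = random.Random(seed)
    false_effects = 0
    for _ in range(reps):
        # All r factor levels share the same true mean (no real differences).
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(r)]
        means = [sum(g) / n for g in groups]
        sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
        mse = sse / (r * n - r)
        s_d = math.sqrt(mse * (1 / n + 1 / n))
        # The comparison is chosen after looking at the data: largest minus smallest.
        if max(means) - min(means) > t_crit * s_d:
            false_effects += 1
    return false_effects / reps


rate = snooping_error_rate()
print(f"Simulated actual level for the snooped comparison: {rate:.2f}")
```

Under these assumptions the simulated rate comes out far above the nominal .05, in line with the point made in the paragraph above.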

One solution to this problem of making comparisons that are suggested by initial analysis 
of the data is to use a multiple comparison procedure where the family of inferences includes 
all the possible inferences that can be anticipated to be of potential interest after the data 
are examined. For instance, in an investigation where five factor level means are being 
studied, it is decided in advance that principal interest is in three pairwise comparisons, 
However, it is also agreed that other pairwise comparisons that will appear interesting 
should be studied as well. In this case, the family of all pairwise comparisons can be used 
as the basis for obtaining an appropriate family confidence coefficient or significance level
for the comparisons suggested by the data. 

In the next three sections, we shall discuss three multiple comparison procedures for 
analysis of variance models that permit the family confidence coefficient and the family $\alpha$
risk to be controlled. Two of these procedures, the Tukey and Scheffé procedures, allow
data snooping to be undertaken naturally without affecting the confidence coefficient or 
significance level. The other procedure, the Bonferroni procedure, is applicable only when 
the effects to be investigated are identified in advance of the study. 


17.5 Tukey Multiple Comparison Procedure 


The Tukey multiple comparison procedure that we will consider here applies when: 


The family of interest is the set of all pairwise comparisons of factor level means; in
other words, the family consists of estimates of all pairs $D = \mu_i - \mu_{i'}$ or of all tests of
the form:

$$H_0\colon \mu_i - \mu_{i'} = 0 \qquad H_a\colon \mu_i - \mu_{i'} \ne 0$$

When all sample sizes are equal, the family confidence coefficient for the Tukey method is
exactly $1 - \alpha$ and the family significance level is exactly $\alpha$. When the sample sizes are not
equal, the family confidence coefficient is greater than $1 - \alpha$ and the family significance
level is less than $\alpha$. In other words, the Tukey procedure is conservative when the sample
sizes are not equal.


Studentized Range Distribution 
The Tukey procedure utilizes the studentized range distribution. Suppose that we have $r$
independent observations $Y_1, \ldots, Y_r$ from a normal distribution with mean $\mu$ and variance
$\sigma^2$. Let $w$ be the range for this set of observations; thus:

$$w = \max(Y_i) - \min(Y_i) \tag{17.28}$$

Suppose further that we have an estimate $s^2$ of the variance $\sigma^2$ which is based on $\nu$ degrees
of freedom and is independent of the $Y_i$. Then, the ratio $w/s$ is called the studentized range.
It is denoted by:

$$q(r, \nu) = \frac{w}{s} \tag{17.29}$$

where the arguments in parentheses remind us that the distribution of $q$ depends on $r$ and $\nu$.
The distribution of $q$ has been tabulated, and selected percentiles are presented in Table B.9.
This table is simple to use. Suppose that $r = 5$ and $\nu = 10$. The 95th percentile is then
$q(.95;\, 5, 10) = 4.65$, which means:

$$P\left\{\frac{w}{s} = q(5, 10) \le 4.65\right\} = .95$$

Thus, with five normal $Y$ observations, the probability is .95 that their range is not more
than 4.65 times as great as an independent sample standard deviation based on 10 degrees
of freedom.
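The probability statement for q(5, 10) can be checked by Monte Carlo simulation. This sketch assumes standard normal observations (the studentized range does not depend on the mean or variance) and builds the independent variance estimate from 10 additional squared normal deviates; the function name and simulation settings are hypothetical choices.

```python
import math
import random


def studentized_range_coverage(r=5, nu=10, cutoff=4.65, reps=20000, seed=7):
    """Estimate P{w/s <= cutoff} for r normal observations and an independent s^2 on nu df."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(r)]
        w = max(y) - min(y)                                    # range, as in (17.28)
        # Independent variance estimate: chi-square(nu) / nu when sigma^2 = 1.
        s2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) / nu
        if w / math.sqrt(s2) <= cutoff:
            hits += 1
    return hits / reps


coverage = studentized_range_coverage()
print(f"Estimated P(q(5, 10) <= 4.65) = {coverage:.3f}")
```

The estimate should land close to .95, up to simulation error.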


Simultaneous Estimation 
The Tukey multiple comparison confidence limits for all pairwise comparisons $D = \mu_i - \mu_{i'}$
with family confidence coefficient of at least $1 - \alpha$ are as follows:

$$\hat{D} \pm T s\{\hat{D}\} \tag{17.30}$$

where:

$$\hat{D} = \bar{Y}_{i\cdot} - \bar{Y}_{i'\cdot} \tag{17.30a}$$

$$s^2\{\hat{D}\} = s^2\{\bar{Y}_{i\cdot}\} + s^2\{\bar{Y}_{i'\cdot}\} = MSE\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right) \tag{17.30b}$$

$$T = \frac{1}{\sqrt{2}}\, q(1 - \alpha;\, r, n_T - r) \tag{17.30c}$$

Note that the point estimator $\hat{D}$ in (17.30a) and the estimated variance in (17.30b) are
the same as those in (17.11) and (17.14) for a single pairwise comparison. Thus, the only
difference between the Tukey confidence limits (17.30) for simultaneous comparisons and
those in (17.16) for a single comparison is the multiple of the estimated standard deviation.

The family confidence coefficient $1 - \alpha$ pertaining to the multiple pairwise comparisons
refers to the proportion of correct families, each consisting of all pairwise comparisons, when
repeated sets of samples are selected and all pairwise confidence intervals are calculated
each time. A family of pairwise comparisons is considered to be correct if every pairwise
comparison in the family is correct. Thus, a family confidence coefficient of $1 - \alpha$ indicates
that all pairwise comparisons in the family will be correct in $(1 - \alpha)100$ percent of the
repetitions.


Simultaneous Testing 
When we wish to conduct a family of tests of the form:

$$H_0\colon \mu_i - \mu_{i'} = 0 \qquad H_a\colon \mu_i - \mu_{i'} \ne 0 \tag{17.31}$$

for all pairwise comparisons, the family of confidence intervals based on (17.30) may be
utilized for this purpose. We simply determine for each interval whether or not zero is
contained in the interval. If zero is contained, conclusion $H_0$ is reached; otherwise, $H_a$ is
concluded. By following this procedure, the family level of significance will not exceed $\alpha$.


FIGURE 17.4 Paired Comparison Plot—Rust Inhibitor Example.
[Plot: comparison intervals for each rust inhibitor on a Performance Score axis (40 to 90).]


Equivalently, the pairwise tests can be conducted directly by calculating for each pairwise
comparison the test statistic:

$$q^* = \frac{\sqrt{2}\,\hat{D}}{s\{\hat{D}\}} \tag{17.32}$$

where $\hat{D}$ and $s^2\{\hat{D}\}$ are given in (17.30). Conclusion $H_0$ in (17.31) is reached if
$|q^*| \le q(1 - \alpha;\, r, n_T - r)$; otherwise, $H_a$ is concluded.

A paired comparison plot provides still another means of conducting all pairwise tests 
with the Tukey procedure when all sample sizes are equal, i.e., when n; = n. This plot 
provides a graphic means of making all pairwise comparisons. Around each estimated 
treatment mean $\bar{Y}_{i\cdot}$ is plotted an interval whose limits are:

$$\bar{Y}_{i\cdot} \pm \tfrac{1}{2} T s\{\hat{D}\} \tag{17.33}$$


When the intervals overlap on this plot, the formal test leads to the conclusion that the two 
treatment means do not differ. When the intervals do not overlap, the formal test leads to 
the conclusion that the two treatment means differ. In addition, the paired comparison plot 
shows the direction of the difference. 

Figure 17.4 provides an illustration of a paired comparison plot for the rust inhibitor 
example. There is no overlap between the intervals for rust inhibitors B and C indicating 
that the mean performances differ for these two rust inhibitors. Figure 17.4 in addition 
shows that rust inhibitor B is superior to C since its interval is considerably to the right of 
that for C, thus providing directional information about the difference in mean performance 
for the two rust inhibitors. We discuss this plot in greater detail in Example 1 below.


Example 1—Equal Sample Sizes 
In the rust inhibitor example in Table 17.2, it was desired to estimate all pairwise comparisons
by means of the Tukey procedure, using a family confidence coefficient of 95 percent. Since
$r = 4$ and $n_T - r = 36$, we find the required percentile of the studentized range distribution
from Table B.9 to be $q(.95;\, 4, 36) = 3.814$. Hence, by (17.30c), we obtain:

$$T = \frac{1}{\sqrt{2}}(3.814) = 2.70$$


TABLE 17.3 Simultaneous Confidence Intervals and Tests for Pairwise
Differences Using the Tukey Procedure—Rust Inhibitor Example.

                                              Test
Confidence Interval                  $H_0$              $H_a$                $q^*$
$43.3 \le \mu_2 - \mu_1 \le 49.3$    $\mu_2 = \mu_1$    $\mu_2 \ne \mu_1$    58.99
$21.8 \le \mu_3 - \mu_1 \le 27.8$    $\mu_3 = \mu_1$    $\mu_3 \ne \mu_1$    31.61
$-.3 \le \mu_1 - \mu_4 \le 5.7$      $\mu_1 = \mu_4$    $\mu_1 \ne \mu_4$     3.40
$18.5 \le \mu_2 - \mu_3 \le 24.5$    $\mu_2 = \mu_3$    $\mu_2 \ne \mu_3$    27.37
$46.0 \le \mu_2 - \mu_4 \le 52.0$    $\mu_2 = \mu_4$    $\mu_2 \ne \mu_4$    62.39
$24.5 \le \mu_3 - \mu_4 \le 30.5$    $\mu_3 = \mu_4$    $\mu_3 \ne \mu_4$    35.01


Further, we need $s\{\hat{D}\}$. Using (17.30b), we find for any pairwise comparison since equal
sample sizes were employed:

$$s^2\{\hat{D}\} = MSE\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right) = 6.140\left(\frac{1}{10} + \frac{1}{10}\right) = 1.23$$

so that $s\{\hat{D}\} = 1.11$. Hence, we obtain for each pairwise comparison:

$$T s\{\hat{D}\} = 2.70(1.11) = 3.0$$

To illustrate the calculation of the pairwise confidence limits, consider the estimation of
the difference between the treatment means for rust inhibitors A and B, $\mu_2 - \mu_1$:

$$\hat{D} = \bar{Y}_{2\cdot} - \bar{Y}_{1\cdot} = 89.44 - 43.14 = 46.3$$

The confidence limits from (17.30) therefore are $46.3 \pm 3.0$ and the confidence interval is:

$$43.3 \le \mu_2 - \mu_1 \le 49.3$$

The complete family of pairwise confidence intervals is listed in the left column of 
Table 17.3. The pairwise comparisons indicate that all but one of the differences (D and A) 
are statistically significant (confidence interval does not cover zero). 

We incorporate this information in a line plot of the estimated factor level means by 
underlining nonsignificant comparisons. 


[Line plot: estimated factor level means on a Performance Score axis (40 to 80) in the order D, A, C, B, with a line joining D and A.]


The line between D and A indicates that there is no clear evidence whether D or A is the 
better rust inhibitor. The absence of a line signifies that a difference in performance has 
been found and the location of the points indicates the direction of the difference. Thus, 
the multiple comparison procedure permits us to infer with a 95 percent family confidence 
coefficient for the chain of conclusions that B is the best inhibitor (better by somewhere 
between 18.5 and 24.5 units than the second best), C is second best, and A and D follow 
substantially behind with little or no difference between them. 




The same conclusions are obtained if we carry out all pairwise tests using the simultaneous
testing procedure based on test statistic (17.32). For example, to test:

$$H_0\colon \mu_2 - \mu_1 = 0 \qquad H_a\colon \mu_2 - \mu_1 \ne 0$$

we require the test statistic:

$$q^* = \frac{\sqrt{2}(89.44 - 43.14)}{1.11} = 58.99$$

Because $|q^*| = 58.99 > q(.95;\, 4, 36) = 3.814$, we conclude $H_a$, that the two treatment
means differ. The test statistics $q^*$ for the family of all pairwise tests are listed in the right
column of Table 17.3. The absolute values of all test statistics exceed 3.814 except for one,
so that all differences are found to be statistically significant except for that involving $\mu_1$
and $\mu_4$ (A and D). For this case, $|q^*| = 3.40$ does not exceed the critical value 3.814.

Figure 17.4 presents a paired comparison plot for the rust inhibitor example. Here are
plotted the estimated treatment means $\bar{Y}_{i\cdot}$ with the comparison intervals based on (17.33).
For example, for rust inhibitor A, we have from earlier:

$$\bar{Y}_{1\cdot} = 43.14 \qquad T = 2.70 \qquad s\{\hat{D}\} = 1.11$$

so that the comparison limits in (17.33) are:

$$43.14 \pm \tfrac{1}{2}(2.70)(1.11) \quad \text{or} \quad 41.64 \text{ and } 44.64$$


We readily see that only the intervals for A and D overlap, that rust inhibitor B is clearly best, 
that rust inhibitor C is second best, and that rust inhibitors A and D are the least effective. 
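The equal-sample-size calculations for rust inhibitors A and B can be retraced as follows. The sketch takes q(.95; 4, 36) = 3.814, MSE = 6.140, n = 10, and the two means from the text, and it rounds T and the standard deviation to two decimals as the text does so that the printed figures line up; working with unrounded intermediates would shift some second decimals slightly. Variable names are hypothetical.

```python
import math

q_crit = 3.814                   # q(.95; 4, 36), copied from the text
mse, n = 6.140, 10               # error mean square and common sample size
ybar_a, ybar_b = 43.14, 89.44    # estimated means for rust inhibitors A and B

# Rounded to two decimals to mirror the hand calculation in the text.
t_mult = round(q_crit / math.sqrt(2), 2)           # Tukey multiple T, (17.30c)
s_d = round(math.sqrt(mse * (1 / n + 1 / n)), 2)   # s{D_hat}, from (17.30b)

d_hat = ybar_b - ybar_a
ci_lower, ci_upper = d_hat - t_mult * s_d, d_hat + t_mult * s_d   # limits (17.30)
q_star = math.sqrt(2) * d_hat / s_d                               # statistic (17.32)

# Paired comparison plot limits (17.33) around the mean for inhibitor A.
plot_lower = ybar_a - 0.5 * t_mult * s_d
plot_upper = ybar_a + 0.5 * t_mult * s_d

print(f"T = {t_mult:.2f}, s = {s_d:.2f}, q* = {q_star:.2f}")
print(f"{ci_lower:.1f} <= mu_2 - mu_1 <= {ci_upper:.1f}")
print(f"comparison limits for A: {plot_lower:.2f} and {plot_upper:.2f}")
```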


Example 2—Unequal Sample Sizes 
In the Kenton Food Company example in Table 17.1, the sales manager was interested in the 
comparative performance of the four package designs. The analyst developed all pairwise 
comparisons by means of the Tukey procedure with a family confidence coefficient of at 
least 90 percent. Since the sample sizes are not equal here, the estimated standard deviation
$s\{\hat{D}\}$ must be recalculated for each pairwise comparison. To compare designs 1 and 2, for
instance, we obtain:

$$\hat{D} = \bar{Y}_{1\cdot} - \bar{Y}_{2\cdot} = 14.6 - 13.4 = 1.2$$

$$s^2\{\hat{D}\} = MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = 10.55\left(\frac{1}{5} + \frac{1}{5}\right) = 4.22$$

$$s\{\hat{D}\} = 2.05$$

For a 90 percent family confidence coefficient, we require $q(.90; 4, 15) = 3.54$ so that we
obtain:

$$T = \frac{1}{\sqrt{2}}(3.54) = 2.50$$




Hence, the confidence limits are $1.2 \pm 2.50(2.05)$ and the confidence interval for $\mu_1 - \mu_2$ is:

$$-3.9 \le \mu_1 - \mu_2 \le 6.3$$

In the same way, we obtain the other five confidence intervals:

$$-.6 = (19.5 - 14.6) - 2.50(2.18) \le \mu_3 - \mu_1 \le (19.5 - 14.6) + 2.50(2.18) = 10.4$$
$$7.5 = (27.2 - 14.6) - 2.50(2.05) \le \mu_4 - \mu_1 \le (27.2 - 14.6) + 2.50(2.05) = 17.7$$
$$.7 = (19.5 - 13.4) - 2.50(2.18) \le \mu_3 - \mu_2 \le (19.5 - 13.4) + 2.50(2.18) = 11.6$$
$$8.7 = (27.2 - 13.4) - 2.50(2.05) \le \mu_4 - \mu_2 \le (27.2 - 13.4) + 2.50(2.05) = 18.9$$
$$2.3 = (27.2 - 19.5) - 2.50(2.18) \le \mu_4 - \mu_3 \le (27.2 - 19.5) + 2.50(2.18) = 13.2$$


We summarize the comparative performance by a line plot, indicating each nonsignificant 
difference by a rule. 


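[Line plot: designs ordered 2, 1, 3, 4 along a "Cases Sold" axis running from 10 to 30. Rules join
designs 2 and 1 and designs 1 and 3, the two pairs whose confidence intervals include zero.]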


We can conclude with at least 90 percent family confidence that design 4 is clearly the 
most effective design. However, the small-scale study does not permit a complete ordering 
among the other three designs. Design 3 is more effective than design 2 but may not be 
more effective than design 1, which in turn may not be more effective than design 2. 
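
The six Tukey-Kramer intervals can be reproduced as follows. This is a sketch under stated assumptions: the sample sizes (5, 5, 4, 5) are inferred from the standard deviations 2.05 and 2.18 and from the 15 error degrees of freedom quoted above, q(.90; 4, 15) = 3.54 is copied from the text, and each interval is reported for the higher-numbered design minus the lower-numbered one, so the design 1 and 2 interval has the opposite sign to the one shown in the text.

```python
import math
from itertools import combinations

means = {1: 14.6, 2: 13.4, 3: 19.5, 4: 27.2}   # estimated design means
sizes = {1: 5, 2: 5, 3: 4, 4: 5}               # assumed sample sizes (see lead-in)
mse = 10.55
q_crit = 3.54                                  # q(.90; 4, 15), from the text
t_mult = q_crit / math.sqrt(2.0)               # Tukey multiple T

def tukey_kramer(i, j):
    """Interval for mu_j - mu_i with s{D-hat} recomputed for each pair."""
    d_hat = means[j] - means[i]
    s_d = math.sqrt(mse * (1.0 / sizes[i] + 1.0 / sizes[j]))
    return d_hat - t_mult * s_d, d_hat + t_mult * s_d

intervals = {pair: tukey_kramer(*pair) for pair in combinations(sorted(means), 2)}

# Pairs whose interval covers zero are joined by a rule in the line plot.
not_significant = [pair for pair, (lo, hi) in intervals.items() if lo <= 0.0 <= hi]

for pair, (lo, hi) in intervals.items():
    print(pair, round(lo, 1), round(hi, 1))
print("not significant:", not_significant)
```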

Often, the results of the family of pairwise tests are summarized by setting up groups of 
factor levels whose means do not differ according to the single degree of freedom tests. For 
the Kenton Food Company example, there are three such groups: 


Group 1                                  Group 2                                  Group 3

Design 4  $\bar{Y}_{4\cdot} = 27.2$      Design 3  $\bar{Y}_{3\cdot} = 19.5$      Design 1  $\bar{Y}_{1\cdot} = 14.6$
                                         Design 1  $\bar{Y}_{1\cdot} = 14.6$      Design 2  $\bar{Y}_{2\cdot} = 13.4$


Comments 


1. When the Tukey procedure is used with unequal sample sizes, it is sometimes called the Tukey- 
Kramer procedure. 

2. When not all pairwise comparisons are of interest, the confidence coefficient for the family of
comparisons under consideration will be greater than the specification $1 - \alpha$ used in setting up the
Tukey intervals. Similarly, the family significance level for simultaneous testing will be less than $\alpha$.

3. The Tukey procedure can be used for data snooping as long as the effects to be studied on the 
basis of preliminary data analysis are pairwise comparisons. 

4. The Tukey procedure can be modified to handle general contrasts of factor level means. We do
not discuss this modification since the Scheffé method (to be discussed next) is to be preferred for
this situation.




5. To derive the Tukey simultaneous confidence intervals for the case when all sample sizes are
equal, i.e., when $n_i \equiv n$ so that $n_T = rn$, consider the deviations:

$$(\bar{Y}_{1\cdot} - \mu_1), \ldots, (\bar{Y}_{r\cdot} - \mu_r) \tag{17.34}$$

and assume that ANOVA model (17.1) applies. The deviations in (17.34) are then independent
variables (because the error terms are independent), they are normally distributed (because the error
terms are independent normal variables), they have the same expectation zero (because $\mu_i$ is subtracted
from $\bar{Y}_{i\cdot}$), and they have the same variance $\sigma^2/n$. Further, $MSE/n$ is an estimator of $\sigma^2/n$ that is
independent of the deviations $(\bar{Y}_{i\cdot} - \mu_i)$ per theorem (17.6). Thus, it follows from the definition of the
studentized range $q$ in (17.29) that:

$$\frac{\max(\bar{Y}_{i\cdot} - \mu_i) - \min(\bar{Y}_{i\cdot} - \mu_i)}{\sqrt{\dfrac{MSE}{n}}} \sim q(r, n_T - r) \tag{17.35}$$

where $n_T - r$ is the number of degrees of freedom associated with $MSE$, $\max(\bar{Y}_{i\cdot} - \mu_i)$ is the largest
deviation, and $\min(\bar{Y}_{i\cdot} - \mu_i)$ is the smallest deviation.

In view of (17.35), we can write the following probability statement:

$$P\left\{\frac{\max(\bar{Y}_{i\cdot} - \mu_i) - \min(\bar{Y}_{i\cdot} - \mu_i)}{\sqrt{\dfrac{MSE}{n}}} \le q(1 - \alpha; r, n_T - r)\right\} = 1 - \alpha \tag{17.36}$$

Note now that the following inequality holds for all pairs of factor levels $i$ and $i'$:

$$\left|(\bar{Y}_{i\cdot} - \mu_i) - (\bar{Y}_{i'\cdot} - \mu_{i'})\right| \le \max(\bar{Y}_{i\cdot} - \mu_i) - \min(\bar{Y}_{i\cdot} - \mu_i) \tag{17.37}$$

The absolute value at the left is needed since the factor levels $i$ and $i'$ are not ordered so that we may
be subtracting the larger deviation from the smaller. To put this another way, we are merely concerned
here with the difference between the two factor level deviations regardless of direction.

Since inequality (17.37) holds for all pairs of factor levels $i$ and $i'$, it follows from (17.36) that the
probability:

$$P\left\{\frac{\left|(\bar{Y}_{i\cdot} - \mu_i) - (\bar{Y}_{i'\cdot} - \mu_{i'})\right|}{\sqrt{\dfrac{MSE}{n}}} \le q(1 - \alpha; r, n_T - r)\right\} = 1 - \alpha \tag{17.38}$$

holds for all $r(r - 1)/2$ pairwise comparisons among the $r$ factor levels. By rearranging the inequality
in (17.38), using the definitions of $s^2\{\hat{D}\}$ in (17.30b) and of $T$ in (17.30c), and noting that for the
equal sample size case $s^2\{\hat{D}\}$ becomes:

$$s^2\{\hat{D}\} = MSE\left(\frac{1}{n} + \frac{1}{n}\right) = \frac{2\,MSE}{n} \qquad \text{when } n_i \equiv n$$

we obtain the Tukey multiple comparison confidence limits in (17.30).

6. When the Tukey multiple comparison procedure is used for testing pairwise differences as 
in (17.31), the tests are sometimes called honestly significant difference tests. 

7. The pairwise comparison plot can be used as an approximate plot when the sample sizes are
not equal, provided that the sample sizes do not differ greatly. For this case, the comparison limits
should be obtained as follows:

$$\bar{Y}_{i\cdot} \pm \tfrac{1}{2}\, q(1 - \alpha; r, n_T - r)\, s\{\bar{Y}_{i\cdot}\} \tag{17.39}$$

The limits in (17.39) are identical to those in (17.33) when the sample sizes are equal.


Scheffé Multiple Comparison Procedure 


The Scheffé multiple comparison procedure was encountered previously for regression
models. It is also applicable for analysis of variance models when:

The family of interest is the set of all possible contrasts among the factor level means:

$$L = \sum c_i \mu_i \qquad \text{where } \sum c_i = 0 \tag{17.40}$$

In other words, the family consists of estimates of all possible contrasts $L$ or of tests
concerning all possible contrasts of the form:

$$H_0\colon L = 0$$
$$H_a\colon L \neq 0$$

Thus, infinitely many statements belong to this family. The family confidence level for the
Scheffé procedure is exactly $1 - \alpha$, and the family significance level is exactly $\alpha$, whether
the factor level sample sizes are equal or unequal.


Simultaneous Estimation 
We noted earlier that an unbiased estimator of $L$ is:

$$\hat{L} = \sum c_i \bar{Y}_{i\cdot} \tag{17.41}$$

for which the estimated variance is:

$$s^2\{\hat{L}\} = MSE \sum \frac{c_i^2}{n_i} \tag{17.42}$$

The Scheffé confidence intervals for the family of contrasts $L$ are of the form:

$$\hat{L} \pm S s\{\hat{L}\} \tag{17.43}$$

where:

$$S^2 = (r - 1)F(1 - \alpha; r - 1, n_T - r) \tag{17.43a}$$

and $\hat{L}$ and $s\{\hat{L}\}$ are given by (17.41) and (17.42), respectively. If we were to calculate the
confidence intervals in (17.43) for all conceivable contrasts, then in $(1 - \alpha)100$ percent of
repetitions of the experiment, the entire set of confidence intervals in the family would be
correct.

Note that the simultaneous confidence limits in (17.43) differ from those for a single
confidence limit in (17.24) only with respect to the multiple of the estimated standard
deviation.




Simultaneous Testing 
Tests involving contrasts of the form:

$$H_0\colon L = 0$$
$$H_a\colon L \neq 0 \tag{17.44}$$

can be carried out by examination of the corresponding Scheffé confidence intervals based
on (17.43). $H_0$ is concluded at the $\alpha$ family level of significance if the confidence interval
includes zero; otherwise $H_a$ is concluded. An equivalent direct testing procedure for the
alternatives in (17.44) uses the test statistic:

$$F^* = \frac{\hat{L}^2}{(r - 1)s^2\{\hat{L}\}} \tag{17.45}$$

Conclusion $H_0$ in (17.44) is reached at the $\alpha$ family significance level if
$F^* \le F(1 - \alpha; r - 1, n_T - r)$; otherwise, $H_a$ is concluded.


Example

In the Kenton Food Company example, interest centered on estimating the following four
contrasts with family confidence coefficient .90:

Comparison of 3-color and 5-color designs:

$$L_1 = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$

Comparison of designs with and without cartoons:

$$L_2 = \frac{\mu_1 + \mu_3}{2} - \frac{\mu_2 + \mu_4}{2}$$

Comparison of the two 3-color designs:

$$L_3 = \mu_1 - \mu_2$$

Comparison of the two 5-color designs:

$$L_4 = \mu_3 - \mu_4$$


Consider first the estimation of $L_1$. Earlier, we found:

$$\hat{L}_1 = -9.35 \qquad s\{\hat{L}_1\} = 1.50$$

Since $r - 1 = 3$ and $n_T - r = 15$ (Table 17.1), we have:

$$S^2 = (r - 1)F(1 - \alpha; r - 1, n_T - r) = 3F(.90; 3, 15) = 3(2.49) = 7.47$$

so that $S = 2.73$. Hence, the 90 percent confidence limits for $L_1$ by the Scheffé multiple
comparison procedure are $-9.35 \pm 2.73(1.50)$ and the desired confidence interval is:

$$-13.4 \le L_1 \le -5.3$$




In similar fashion, we obtain the other desired confidence intervals, and the entire set is:

$$-13.4 \le L_1 \le -5.3$$
$$-7.3 \le L_2 \le .8$$
$$-4.4 \le L_3 \le 6.8$$
$$-13.7 \le L_4 \le -1.7$$

Note that the confidence interval for $L_1$ does not include zero. Hence, if we wished
to test $H_0\colon L_1 = 0$ versus $H_a\colon L_1 \neq 0$, we would conclude $H_a$, that the mean sales for
3-color and 5-color designs differ. The confidence interval provides additional information,
however; namely, that mean sales for 5-color designs exceed mean sales for 3-color designs,
by somewhere between 5.3 and 13.4 cases per store.

Any chain of conclusions derived from the set of confidence intervals has associated with 
it family confidence coefficient .90. The principal conclusions drawn by the sales manager 
were as follows: 5-color designs lead to higher mean sales than 3-color designs, the increase 
being somewhere between 5 and 13 cases per store. No overall effect of cartoons in the 
package design is indicated, although the use of a cartoon in 5-color designs leads to lower 
mean sales than when no cartoon is used. 
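
The four Scheffé intervals can be recomputed from the design means with (17.41) to (17.43a). The sketch below makes two assumptions that are not stated in this passage: the sample sizes (5, 5, 4, 5), which are consistent with the standard deviations and the 15 error degrees of freedom used above, and the table value F(.90; 3, 15) = 2.49, which is copied from the text rather than computed.

```python
import math

means = [14.6, 13.4, 19.5, 27.2]   # estimated means for designs 1..4
sizes = [5, 5, 4, 5]               # assumed sample sizes (see lead-in)
mse = 10.55
r = len(means)
f_crit = 2.49                      # F(.90; 3, 15), from the text

contrasts = {
    "L1": [0.5, 0.5, -0.5, -0.5],  # 3-color vs. 5-color designs
    "L2": [0.5, -0.5, 0.5, -0.5],  # with vs. without cartoons
    "L3": [1.0, -1.0, 0.0, 0.0],   # the two 3-color designs
    "L4": [0.0, 0.0, 1.0, -1.0],   # the two 5-color designs
}

def scheffe_interval(c):
    """L-hat +/- S * s{L-hat}, following (17.41)-(17.43a)."""
    l_hat = sum(ci * m for ci, m in zip(c, means))
    s_l = math.sqrt(mse * sum(ci ** 2 / n for ci, n in zip(c, sizes)))
    s_mult = math.sqrt((r - 1) * f_crit)
    return l_hat - s_mult * s_l, l_hat + s_mult * s_l

scheffe = {name: scheffe_interval(c) for name, c in contrasts.items()}
for name, (lo, hi) in scheffe.items():
    print(name, round(lo, 1), round(hi, 1))
```

Because the Scheffé multiple S depends only on r, the error degrees of freedom, and the confidence level, further contrasts can be added to the dictionary without changing the family confidence coefficient.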


Comments 

1. If in the Kenton Food Company example we had wished to estimate a single contrast with
statement confidence coefficient .90, the required t value would have been $t(.95; 15) = 1.753$. This
t value is smaller than the Scheffé multiple $S = 2.73$, so that the single confidence interval would be
somewhat narrower. The increased width of the interval with the Scheffé procedure is the price paid
for a known confidence coefficient for a family of statements and a chain of conclusions drawn from
them, and for the possibility of making comparisons not specified in advance of the data analysis.

2. Since applications of the Scheffé procedure never involve all conceivable contrasts, the
confidence coefficient for the finite family of statements actually considered will be greater than $1 - \alpha$ so
that $1 - \alpha$ serves as a guaranteed lower bound. Similarly, the significance level for the finite family of
tests considered will be less than $\alpha$. For this reason, it has been suggested that lower confidence levels
and higher significance levels be used with the Scheffé procedure than would ordinarily be employed.
Confidence coefficients of 90 percent and 95 percent and significance levels of $\alpha = .10$ and $\alpha = .05$
with the Scheffé procedure are frequently mentioned.

3. The Scheffé procedure can be used for a wide variety of data snooping since the family of
statements contains all possible contrasts.


Comparison of Scheffé and Tukey Procedures 


1. If only pairwise comparisons are to be made, the Tukey procedure gives narrower 
confidence limits and is therefore the preferred method. 

2. The Scheffé procedure has the property that if the F test of factor level equality 
indicates that the factor level means $\mu_i$ are not equal, the corresponding Scheffé multiple
comparison procedure will find at least one contrast (out of all possible contrasts) that differs 
significantly from zero (the confidence interval does not cover zero). It may be, though, that 
this contrast is not one of those that has been estimated. 




17.7 Bonferroni Multiple Comparison Procedure 


The Bonferroni multiple comparison procedure was encountered earlier for regression
models. It is also applicable for analysis of variance models when:


The family of interest is a particular set of pairwise comparisons, contrasts, or linear 
combinations that is specified by the user in advance of the data analysis. 


The Bonferroni procedure is applicable whether the factor level sample sizes are equal or 
unequal and whether inferences center on pairwise comparisons, contrasts, linear combi- 
nations, or a mixture of these. 
Simultaneous Estimation 
We shall denote the number of statements in the family by $g$ and treat them all as linear
combinations since pairwise comparisons and contrasts are special cases of linear combinations.
The Bonferroni inequality (4.4) then implies that the confidence coefficient is at least
$1 - \alpha$ that the following confidence limits for the $g$ linear combinations $L$ are all correct:

$$\hat{L} \pm B s\{\hat{L}\} \tag{17.46}$$

where:

$$B = t(1 - \alpha/2g;\, n_T - r) \tag{17.46a}$$


Simultaneous Testing 
When we wish to conduct a series of tests of the form:

$$H_0\colon L = 0$$
$$H_a\colon L \neq 0$$

we can use either the confidence intervals based on (17.46) or the test statistics:

$$t^* = \frac{\hat{L}}{s\{\hat{L}\}} \tag{17.47}$$

If $|t^*| \le t(1 - \alpha/2g;\, n_T - r)$, we conclude $H_0$; otherwise, $H_a$ is concluded.
Example

The sales manager of the Kenton Food Company is interested in estimating the following
two contrasts with family confidence coefficient .975:

Comparison of 3-color and 5-color designs:

$$L_1 = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$

Comparison of designs with and without cartoons:

$$L_2 = \frac{\mu_1 + \mu_3}{2} - \frac{\mu_2 + \mu_4}{2}$$

Earlier we found:

$$\hat{L}_1 = -9.35 \qquad s\{\hat{L}_1\} = 1.50$$
$$\hat{L}_2 = -3.25 \qquad s\{\hat{L}_2\} = 1.50$$



For a 97.5 percent family confidence coefficient with the Bonferroni method, we require:

$$B = t[1 - .025/2(2); 15] = t(.99375; 15) = 2.84$$

We can now complete the confidence intervals for the two contrasts. For $L_1$, we have
confidence limits $-9.35 \pm 2.84(1.50)$, which lead to the confidence interval:

$$-13.6 \le L_1 \le -5.1$$

Similarly, we obtain the other confidence interval:

$$-7.5 \le L_2 \le 1.0$$

These confidence intervals have a guaranteed family confidence coefficient of 97.5 percent,
which means that in at least 97.5 percent of repetitions of the experiment, both intervals
will be correct.

Again, we would conclude from this family of estimates that mean sales for 5-color 
designs are higher than those for 3-color designs (by somewhere between 5 and 14 cases 
per store), and that no overall effect of cartoons in the package design is indicated. 

The Scheffé multiple for a 97.5 percent family confidence coefficient in this case would 
have been: 


$$S^2 = 3F(.975; 3, 15) = 3(4.15) = 12.45$$

or $S = 3.53$, as compared to the Bonferroni multiple $B = 2.84$. Thus, the Scheffé procedure
here would have led to wider confidence intervals than the Bonferroni procedure.
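
The Bonferroni intervals and the comparison of multiples can be checked numerically. In this sketch the percentiles t(.99375; 15) = 2.84 and F(.975; 3, 15) = 4.15 are copied from the text rather than computed, and the helper name `bonferroni_level` is illustrative only.

```python
import math

def bonferroni_level(alpha, g):
    """Percentile 1 - alpha/(2g) needed for the multiple B in (17.46a)."""
    return 1.0 - alpha / (2.0 * g)

estimates = {"L1": -9.35, "L2": -3.25}   # estimated contrasts found earlier
s_l = 1.50                               # s{L-hat}, the same for both contrasts
b_mult = 2.84                            # t(.99375; 15), from the text
s_mult = math.sqrt(3 * 4.15)             # Scheffe: S^2 = 3 F(.975; 3, 15)

level = bonferroni_level(alpha=0.025, g=2)
bonferroni = {name: (est - b_mult * s_l, est + b_mult * s_l)
              for name, est in estimates.items()}

print(round(level, 5))
for name, (lo, hi) in bonferroni.items():
    print(name, round(lo, 1), round(hi, 1))
print("B =", b_mult, " S =", round(s_mult, 2))
```

With only two prespecified contrasts, the Bonferroni multiple is the smaller of the two, so its intervals are narrower than the Scheffé intervals at the same family confidence coefficient.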


Comment 

It is not necessary that all comparisons be estimated with statement confidence coefficients $1 - \alpha/g$ for
the Bonferroni family confidence coefficient to be $1 - \alpha$. Different statement confidence coefficients
may be used, depending upon the importance of each statement, provided that $\alpha_1 + \alpha_2 + \cdots + \alpha_g = \alpha$.


Comparison of Bonferroni Procedure with Scheffé and Tukey Procedures 

1. If all pairwise comparisons are of interest, the Tukey procedure is superior to the 
Bonferroni procedure, leading to narrower confidence intervals. If not all pairwise compar- 
isons are to be considered, the Bonferroni procedure may be the better one at times. 

2. The Bonferroni procedure will be better than the Scheffé procedure when the number
of contrasts of interest is about the same as the number of factor levels, or less. Indeed, the
number of contrasts of interest must exceed the number of factor levels by a considerable
amount before the Scheffé procedure becomes better.

3. All three procedures are of the form "estimator ± multiplier × SE." The only difference
among the three procedures is the multiplier. In any given problem, one may compute the
Bonferroni multiple as well as the Scheffé multiple and, when appropriate, the Tukey
multiple, and select the one that is smallest. This choice is proper since it does not depend
on the observed data.

4. The Bonferroni multiple comparison procedure does not lend itself to data snooping
unless one can specify in advance the family of inferences in which one may be interested
and provided this family is not large. On the other hand, the Tukey and Scheffé procedures
involve families of inferences that lend themselves naturally to data snooping.

5. Other specialized multiple comparison procedures have been developed. For example,
Dunnett's procedure (Ref. 17.2) performs pairwise comparisons of each treatment against
a control treatment only, whereas Hsu's procedure (Ref. 17.3) selects the "best" treatment
and identifies those treatments that are worse than the "best."


Analysis of Means 




One use of the Bonferroni simultaneous testing procedure is in the analysis of means
(ANOM), introduced by Ott (Ref. 17.4). ANOM is an alternative to the standard F test for
the equality of treatment means. It is conducted by testing $H_0\colon \tau_1 = 0$ versus $H_a\colon \tau_1 \neq 0$,
$H_0\colon \tau_2 = 0$ versus $H_a\colon \tau_2 \neq 0$, and so on for all treatment effects $\tau_i$. The statistics employed
are the $r$ estimated treatment effects defined in (16.75b):

$$\hat{\tau}_i = \bar{Y}_{i\cdot} - \hat{\mu}_{\cdot} \qquad i = 1, \ldots, r \tag{17.48}$$

where $\hat{\mu}_{\cdot}$ is the least squares mean given in (16.75a):

$$\hat{\mu}_{\cdot} = \frac{\sum_{i=1}^{r} \bar{Y}_{i\cdot}}{r} \tag{17.48a}$$

The estimated variance of $\hat{\tau}_i$ is obtained by (17.22) since $\hat{\tau}_i$ is a contrast of the estimated
treatment means $\bar{Y}_{i\cdot}$:

$$s^2\{\hat{\tau}_i\} = \frac{MSE}{n_i}\left(\frac{r - 1}{r}\right)^2 + \frac{MSE}{r^2}\sum_{h \neq i}\frac{1}{n_h} \tag{17.49}$$
Simultaneous testing by the Bonferroni procedure can be carried out by setting up for each
treatment effect the confidence interval using (17.46) and noting whether or not the interval
contains zero. The results are sometimes summarized in an analysis of means plot. It is easy
to show that a contrast $\hat{\tau}_i = \bar{Y}_{i\cdot} - \hat{\mu}_{\cdot}$ is inside (outside) one of the Bonferroni contrast intervals
whenever the cell mean $\bar{Y}_{i\cdot}$ is inside (outside) the limits $\hat{\mu}_{\cdot} \pm t(1 - \alpha/2r;\, n_T - r)s\{\hat{\tau}_i\}$.
In an analysis of means plot, the cell means are plotted along with the indicated limits
and the least squares mean $\hat{\mu}_{\cdot}$ in (17.48a). If any of the cell means fall above (below)
these limits, the conclusion is drawn that the cell mean is larger (smaller) than the overall
mean.

ANOM is similar to ANOVA for detecting the differences between cell means. However,
an important difference between ANOVA and ANOM is that the former tests whether the
cell means are different from each other, whereas the latter tests whether the cell means are
different from the overall mean. Various enhancements for the analysis of means have been 
provided, including those in References 17.5 and 17.6. 


Example

In Figure 17.5 we present a MINITAB ANOM plot for the Kenton Food Company example
using $\alpha = .05$. We conclude that the mean of sales for design 4 is greater than the overall
weighted mean (18.63), while the means of sales for both design 1 and design 2 are less
than this overall mean. Note that MINITAB bases its ANOM procedure on the
weighted mean $\hat{\mu}_{\cdot} = \bar{Y}_{\cdot\cdot}$ rather than the least squares mean in (17.48a).
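
The ANOM limits based on the least squares mean, as defined in (17.48) to (17.49), can be sketched as follows. Note the assumptions: the sample sizes (5, 5, 4, 5) are those used in the other illustrations of this example, the Bonferroni percentile for alpha = .05 and r = 4 tests is t(.99375; 15) = 2.84 as quoted earlier, and the limits differ slightly from the MINITAB limits because MINITAB centers on the weighted mean.

```python
import math

means = [14.6, 13.4, 19.5, 27.2]   # estimated means for designs 1..4
sizes = [5, 5, 4, 5]               # assumed sample sizes (see lead-in)
mse = 10.55
r = len(means)
t_crit = 2.84                      # t(1 - .05/(2*4); 15) = t(.99375; 15)

mu_hat = sum(means) / r            # least squares (unweighted) mean, (17.48a)

def tau_se(i):
    """Estimated standard deviation of tau-hat_i from (17.49)."""
    own = (mse / sizes[i]) * ((r - 1) / r) ** 2
    others = (mse / r ** 2) * sum(1.0 / n for h, n in enumerate(sizes) if h != i)
    return math.sqrt(own + others)

flags = []
for i, mean in enumerate(means):
    half = t_crit * tau_se(i)
    lower, upper = mu_hat - half, mu_hat + half
    if mean < lower:
        flags.append("below")
    elif mean > upper:
        flags.append("above")
    else:
        flags.append("inside")
    print(i + 1, round(lower, 2), round(upper, 2), flags[-1])
```

The conclusions match those read from the MINITAB plot: designs 1 and 2 fall below the limits, design 4 falls above, and design 3 lies inside.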


FIGURE 17.5 Analysis of Means Plot: Kenton Food Company Example.

[MINITAB plot of the four design means against Levels of Design (1 to 4), with center line
at 18.6316 and decision limits at 15.1070 and 22.1562.]


17.8 Planning of Sample Sizes with Estimation Approach 


In Section 16.10 we considered the planning of sample sizes using the power approach. We 
now take up another approach, the estimation approach to planning sample sizes, which 
may be used either in conjunction with the control of Type I and Type II errors or by 
itself. The essence of the approach is to specify the major comparisons of interest and to 
determine the expected widths of the confidence intervals for various sample sizes, given 
an advance planning value for the standard deviation $\sigma$. The approach is iterative, starting
with an initial judgment of needed sample sizes. This initial judgment may be based on 
the needed sample sizes to control the risks of Type I and Type II errors when these have 
been obtained previously. If the anticipated widths of the confidence intervals based on 
the initial sample sizes are satisfactory, the iteration process is terminated. If one or more 
widths are too great, larger sample sizes need to be tried next. If the widths are narrower 
than they need be, smaller sample sizes should be tried next. This process is continued until 
those sample sizes are found that yield satisfactory anticipated widths for the important 
confidence intervals. We proceed to illustrate the estimation approach to planning sample 
sizes with two examples. 


Example 1—Equal Sample Sizes 


We are to plan sample sizes for the snow tires example discussed in Section 16.10 by means
of the estimation approach; the sample sizes for each tire brand are to be equal, that is,
$n_i \equiv n$. Management wishes three types of estimates:

1. A comparison of the mean tread lives for each pair of brands:

$$\mu_i - \mu_{i'}$$

2. A comparison of the mean tread lives for the two high-priced brands (1 and 4) and
the two low-priced brands (2 and 3):

$$\frac{\mu_1 + \mu_4}{2} - \frac{\mu_2 + \mu_3}{2}$$

3. A comparison of the mean tread lives for the national brands (1, 2, and 4) and the
local brand (3):

$$\frac{\mu_1 + \mu_2 + \mu_4}{3} - \mu_3$$
Management further has indicated that it wishes a family confidence coefficient of .95 for 
the entire set of comparisons. 

We first need a planning value for the standard deviation of the tread lives of tires.
Suppose that from past experience we judge the standard deviation to be approximately
$\sigma = 2$ (thousand miles). Next, we require an initial judgment of needed sample sizes and
shall consider $n = 10$ as a starting point.

We know from (17.21) that the variance of an estimated contrast $\hat{L}$ when $n_i \equiv n$ is:

$$\sigma^2\{\hat{L}\} = \frac{\sigma^2}{n}\sum c_i^2 \qquad \text{when } n_i \equiv n$$


Hence, given $\sigma = 2$ and $n = 10$, the anticipated values of the standard deviations of the
required estimators are:

                                                                                                         Anticipated
                                                                                                         Standard
Contrast                       Anticipated Variance                                                      Deviation
Pairwise comparisons           $\frac{(2)^2}{10}[(1)^2 + (-1)^2] = .80$                                  .89
High- and low-priced brands    $\frac{(2)^2}{10}[(\frac{1}{2})^2 + (\frac{1}{2})^2 + (-\frac{1}{2})^2 + (-\frac{1}{2})^2] = .40$    .63
National and local brands      $\frac{(2)^2}{10}[(\frac{1}{3})^2 + (\frac{1}{3})^2 + (\frac{1}{3})^2 + (-1)^2] = .53$               .73

We shall employ the Scheffé multiple comparison procedure and therefore require the
Scheffé multiple $S$ in (17.43a) for $r = 4$, $n_T = 10(4) = 40$, and $1 - \alpha = .95$:

$$S^2 = (r - 1)F(1 - \alpha; r - 1, n_T - r) = 3F(.95; 3, 36) = 3(2.87) = 8.61$$

or $S = 2.93$. Hence, the anticipated widths of the confidence intervals are:

                               Anticipated Width of
Contrast                       Confidence Interval $= \pm S\sigma\{\hat{L}\}$
Pairwise comparisons           $\pm 2.93(.89) = \pm 2.61$ (thousand miles)
High- and low-priced brands    $\pm 2.93(.63) = \pm 1.85$ (thousand miles)
National and local brands      $\pm 2.93(.73) = \pm 2.14$ (thousand miles)


Management was satisfied with these anticipated widths. However, it was decided to 
increase the sample sizes from 10 to 15 in case the actual standard deviation of the tread 
lives of tires is somewhat greater than the anticipated value $\sigma = 2$ (thousand miles).
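
The anticipated widths for this example can be recomputed for any trial sample size. This sketch takes the table value F(.95; 3, 36) = 2.87 from the text; because it does not round the intermediate standard deviations to two decimals, its results can differ from the tabled widths in the second decimal place. The contrast labels are illustrative.

```python
import math

sigma = 2.0       # planning value for the standard deviation (thousand miles)
n = 10            # trial sample size per brand
r = 4
f_crit = 2.87     # F(.95; 3, 36), from the text; valid for n = 10 only

contrasts = {
    "pairwise": [1.0, -1.0, 0.0, 0.0],
    "high vs. low price": [0.5, -0.5, -0.5, 0.5],
    "national vs. local": [1 / 3, 1 / 3, -1.0, 1 / 3],
}

def anticipated_sd(c, sigma, n):
    """sigma{L-hat} = sigma * sqrt(sum(c_i^2) / n) for equal sample sizes."""
    return sigma * math.sqrt(sum(ci ** 2 for ci in c) / n)

s_mult = math.sqrt((r - 1) * f_crit)   # Scheffe multiple S
half_widths = {name: s_mult * anticipated_sd(c, sigma, n)
               for name, c in contrasts.items()}

for name, width in half_widths.items():
    print(f"{name}: +/- {width:.2f} thousand miles")
```

To try another sample size, both `n` and the F percentile must be changed, since the error degrees of freedom depend on n.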


Example 2—Unequal Sample Sizes 
In the snow tires example, suppose that tire brand 4 is the snow tire presently used and is to
serve as the basis of comparison for the other brands. The comparisons of interest therefore
are $\mu_1 - \mu_4$, $\mu_2 - \mu_4$, and $\mu_3 - \mu_4$. The sample size for brand 4 is to be twice as large as for
the other brands in order to improve the precision of the three pairwise comparisons. The
desired precision, with a family confidence coefficient of .90, is to be $\pm 1$ (thousand miles).
The Bonferroni procedure will be used to provide assurance as to the family confidence level.
We know from (17.13) that the variance of an estimated difference $\hat{L}_i = \bar{Y}_{i\cdot} - \bar{Y}_{4\cdot}$ (the
difference is now denoted more generally by $\hat{L}$) is for $i = 1, 2, 3$:

$$\sigma^2\{\hat{L}_i\} = \sigma^2\left(\frac{1}{n_i} + \frac{1}{n_4}\right)$$

We shall denote the sample sizes for brands 1, 2, and 3 by $n$ and for brand 4 by $2n$. Hence,
the variance of $\hat{L}_i$ becomes:

$$\sigma^2\{\hat{L}_i\} = \sigma^2\left(\frac{1}{n} + \frac{1}{2n}\right) = \frac{3\sigma^2}{2n}$$

Using again the planning value $\sigma = 2$ and an initial sample size $n = 10$, we find
$\sigma^2\{\hat{L}_i\} = .60$ and $\sigma\{\hat{L}_i\} = .77$. For $\alpha = .10$ and $g = 3$ comparisons, the Bonferroni
multiple is $B = t(.9833; 46) = 2.19$. Note that $n_T = 3(10) + 20 = 50$ for the first iteration; hence
$n_T - r = 50 - 4 = 46$. The anticipated width of the confidence intervals therefore is
$2.19(.77) = \pm 1.69$. This is larger than the specified width $\pm 1.0$, so a larger sample size
needs to be tried next.

We shall try $n = 30$ next. We find that $\sigma\{\hat{L}_i\} = .45$ now, and the Bonferroni multiple will
be $B = t(.9833; 146) = 2.15$. Hence, the anticipated width of the confidence intervals for
$n = 30$ is $2.15(.45) = \pm .97$. This is slightly smaller than the specified width $\pm 1.0$. However,
since the planning value for $\sigma$ may not be entirely accurate, management may decide to use
30 tires for each of the new brands and 60 tires for brand 4, the presently used snow tires.
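
The two iterations can be written as a small loop. This is a sketch only: the Bonferroni multiples 2.19 and 2.15 are the t percentiles quoted in the text for 46 and 146 degrees of freedom and would have to be looked up again for any other trial value of n; the function names are illustrative.

```python
import math

sigma = 2.0    # planning value for the standard deviation (thousand miles)
target = 1.0   # desired precision (thousand miles)
r = 4

def error_df(n):
    """n_T - r when brands 1-3 have n tires each and brand 4 has 2n."""
    return 3 * n + 2 * n - r

def anticipated_half_width(n, sigma, b_mult):
    """B * sigma{L-hat_i}, with sigma^2{L-hat_i} = 3 sigma^2 / (2n)."""
    return b_mult * sigma * math.sqrt(1.0 / n + 1.0 / (2 * n))

# (trial n, Bonferroni multiple t(.9833; n_T - r) quoted in the text)
trials = [(10, 2.19), (30, 2.15)]

results = {}
for n, b_mult in trials:
    width = anticipated_half_width(n, sigma, b_mult)
    results[n] = width
    verdict = "satisfactory" if width <= target else "too wide"
    print(f"n = {n}: df = {error_df(n)}, width = +/- {width:.2f} ({verdict})")
```

The unrounded widths differ from the text in the second decimal place because the text rounds the standard deviation to two decimals before multiplying.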


Comment 


Since one cannot be certain that the planning value for the standard deviation is correct, it is advisable 
to study a range of values for the standard deviation before making a final decision on sample size.




17.9 Analysis of Factor Effects when Factor Is Quantitative 


TABLE 17.4 Data: Piecework Trainees Example.

When the factor under investigation is quantitative, the analysis of factor effects can be 
carried beyond the point of multiple comparisons to include a study of the nature of the 
response function. Consider an experimental study undertaken to investigate the effect on 
sales of the price of a product. Five different price levels are investigated (78 cents, 79 cents,
85 cents, 88 cents, and 89 cents), and the experimental unit is a store. After a preliminary
test of whether mean sales differ for the five price levels studied, the analyst might use
multiple comparisons to examine whether “odd pricing” at 79 cents actually leads to higher
sales than “even pricing” at 78 cents, as well as other questions of interest. In addition, the
analyst may wish to study whether mean sales are a specified function of price, in the range
of prices studied in the experiment. Further, once the relation has been established, the
analyst may wish to use it for estimating sales volumes at various price levels not studied. 

The methods of regression analysis discussed earlier are, of course, appropriate for the 
analysis of the response function. Since the single-factor studies discussed in this chapter 
almost always involve replications at the different factor levels, the lack of fit of a specified 
response function can be tested. For this purpose, the analysis of variance error sum of 
squares in (16.29) serves as the pure error sum of squares in (3.16), the two being identical, 
We illustrate this relation in the following example. 


Example

In a study to reduce raw material costs in a glassworks firm, an operations analyst collected
the experimental data in Table 17.4 on the number of acceptable units produced from equal
amounts of raw material by 28 entry-level piecework employees who had received special
training as part of the experiment. Four training levels were used (6, 8, 10, and 12 hours), 
with seven of the employees being assigned at random to each level. The higher the number 
of acceptable pieces, the more efficient is the employee in utilizing the raw material. This 
study is a single-factor completely randomized design with four factor levels. 


Preliminary Analysis. The analyst first tested whether or not the mean number of acceptable
pieces is the same for the four training levels. ANOVA model (17.1) was employed:

$$Y_{ij} = \mu_i + \varepsilon_{ij} \tag{17.50}$$

The alternative conclusions and appropriate test statistic are:

$$H_0\colon \mu_1 = \mu_2 = \mu_3 = \mu_4$$
$$H_a\colon \text{not all } \mu_i \text{ are equal}$$

$$F^* = \frac{MSTR}{MSE}$$

Treatment
(hours of training)           Employee (j)

 i                  1    2    3    4    5    6    7
 1    6 hours      40   39   39   36   42   43   41
 2    8 hours      53   48   49   50   51   50   48
 3   10 hours      53   58   56   59   53   59   58
 4   12 hours      63   62   59   61   62   62   61


FIGURE 17.6 SPSS^X Computer Output: Piecework Trainees Example.


The SPSS^X output for single-factor ANOVA is shown in Figure 17.6. Residual analysis
(to be discussed in Chapter 18) showed ANOVA model (17.50) to be apt. Therefore, the
analyst proceeded with the test, using $\alpha = .05$. The decision rule is:

If $F^* \le F(.95; 3, 24) = 3.01$, conclude $H_0$
If $F^* > 3.01$, conclude $H_a$


                                 (n_i)    (Ybar_i.)     STANDARD
    Treatment       GROUP        COUNT    MEAN          DEVIATION
                    GRP01          7      40.0000       2.3094
                    GRP02          7      49.8571       1.7728
                    GRP03          7      56.5714       2.6367
                    GRP04          7      61.4286       1.2724
                    TOTAL         28      51.9643       8.4129

    ANALYSIS OF VARIANCE
    SOURCE             DF    SUM OF SQUARES        MEAN SQUARES        F RATIO         F PROB.
    BETWEEN GROUPS      3    1808.6778 (SSTR)      602.8926 (MSTR)     141.461 (F*)    0.0000 (P-value)
    WITHIN GROUPS      24     102.2856 (SSE)         4.2619 (MSE)
    TOTAL              27    1910.9634 (SSTO)

    MULTIPLE RANGE TEST
    TUKEY-HSD PROCEDURE
    RANGES FOR THE 0.050 LEVEL -
         3.90    [q(.95; 4, 24)]

    HOMOGENEOUS SUBSETS
    SUBSET 1    GROUP  GRP01    MEAN  40.0000
    SUBSET 2    GROUP  GRP02    MEAN  49.8571
    SUBSET 3    GROUP  GRP03    MEAN  56.5714
    SUBSET 4    GROUP  GRP04    MEAN  61.4286


FIGURE 17.7 Scatter Plot and Fitted Quadratic Response Function: Piecework Trainees Example.


From Figure 17.6, we have:

$$F^* = \frac{MSTR}{MSE} = \frac{602.8926}{4.2619} = 141.5$$

Since $F^* = 141.5 > 3.01$, the analyst concluded $H_a$, that training level effects differed and
that further analysis of them is warranted. The P-value for the test statistic is 0+, as shown
in Figure 17.6.
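
The sums of squares and the F* statistic in Figure 17.6 can be reproduced directly from the data in Table 17.4. The sketch below uses only the standard library; the critical value F(.95; 3, 24) = 3.01 is copied from the decision rule rather than computed, and the function name is illustrative.

```python
# Table 17.4: acceptable units by hours of training (7 employees per level).
data = {
    6:  [40, 39, 39, 36, 42, 43, 41],
    8:  [53, 48, 49, 50, 51, 50, 48],
    10: [53, 58, 56, 59, 53, 59, 58],
    12: [63, 62, 59, 61, 62, 62, 61],
}
f_crit = 3.01   # F(.95; 3, 24), from the decision rule in the text

def one_way_anova(groups):
    """Return SSTR, SSE, MSTR, MSE and F* for a single-factor study."""
    all_obs = [y for g in groups for y in g]
    n_t, r = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_t
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    mstr, mse = sstr / (r - 1), sse / (n_t - r)
    return sstr, sse, mstr, mse, mstr / mse

sstr, sse, mstr, mse, f_star = one_way_anova(list(data.values()))
print(round(sstr, 3), round(sse, 3), round(mstr, 3), round(mse, 4))
print(round(f_star, 1), "conclude Ha" if f_star > f_crit else "conclude H0")
```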


Investigation of Treatment Effects. The analyst's interest next centered on multiple 
comparisons of all pairs of treatment means. A Tukey multiple comparison option in the 
SPSS^X computer package was used. It gave the output shown in the lower portion of
Figure 17.6. This output presents the results of single-degree-of-freedom tests conducted
by means of the Tukey multiple comparison procedure for all pairwise comparisons. (The
confidence intervals for the pairwise comparisons are not shown in the output.) All factor
levels for which the test concludes that the pairwise means are equal are placed in the same 
group. This form of summary of single-degree-of-freedom tests was illustrated earlier for 
the Kenton Food Company example. When a group contains only one factor level, as is the 
case for all groups in the output of Figure 17.6, the implication is that all single-degree-of- 
freedom tests involving this factor level and each of the other factor levels lead to conclusion 
$H_a$, that the two factor level means being compared are not equal.

Two points should be noted in particular from the results in Figure 17.6: (1) All pair- 
wise factor level differences are statistically significant. (2) There is some indication that 
differences between the means for adjoining factor levels diminish as the number of hours 
of training increases; that is, diminishing returns appear to set in as the length of training is
increased. 


Estimation of Response Function. These findings were in accord with the analyst’s
expectations that the treatment means $\mu_i$ would most likely follow a quadratic response
function with respect to training level. The scatter plot in Figure 17.7 supports this expectation.
The analyst now wished to investigate this point further by fitting a quadratic regression
model. The model to be fitted and tested is:

$$Y_{ij} = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_{ij} \tag{17.51}$$


[Figure 17.7 plot: Number of Acceptable Units (vertical axis) against Hours of Training
(horizontal axis, 6 to 12), with the fitted curve $\hat{Y} = -3.73571 + 9.17500X - 0.31250X^2$.]


TABLE 17.5 Illustration of Data for Regression Analysis: Piecework Trainees Example.

TABLE 17.6 Analyses of Variance: Piecework Trainees Example.


where $Y_{ij}$ and $\varepsilon_{ij}$ are defined as earlier, the $\beta$s are regression parameters, and $x_i$ denotes
the number of hours of training in the $i$th training level ($X_i$) centered around $\bar{X} = 9$, i.e.,
$x_i = X_i - 9$.

A portion of the data for the regression analysis is given in Table 17.5. Regressing $Y$ on
$x$ and $x^2$ yielded the estimated regression function:

$$\hat{Y} = 53.52679 + 3.55000x - .31250x^2 \tag{17.52}$$


The analysis of variance for regression model (17.51) is shown in Table 17.6a. For com- 
pleteness, we repeat in Table 17.6b the analysis of variance for ANOVA model (17.50). 


  i    j    Y_ij    x_i              x_i^2

  1    1    40       6 - 9 = -3      9
  1    2    39       6 - 9 = -3      9
  2    1    53       8 - 9 = -1      1
  2    2    48       8 - 9 = -1      1
 ...  ...   ...         ...         ...
  4    6    62      12 - 9 =  3      9
  4    7    61      12 - 9 =  3      9

(a) Regression Model (17.51)

Source of Variation    SS           df    MS
Regression             1,808.100     2    904.05
Error                    102.864    25      4.11
Total                  1,910.964    27

(b) Analysis of Variance Model (17.50)

Source of Variation    SS           df    MS
Treatments             1,808.678     3    602.89
Error                    102.286    24      4.26
Total                  1,910.964    27

(c) ANOVA for Lack of Fit Test

Source of Variation    SS           df    MS
Regression             1,808.100     2    904.05
Error                    102.864    25      4.11
  Lack of fit               .578     1       .58
  Pure error             102.286    24      4.26
Total                  1,910.964    27




Since the data contain replicates, the analyst could test regression model (17.51) for lack
of fit, utilizing the fact that the ANOVA error sum of squares in (16.29) is identical to the
regression pure error sum of squares in (3.16). Both measure variation around the mean of
the Y observations at any given level of X (i.e., around the estimated treatment mean $\bar{Y}_{i\cdot}$).
Hence, the lack of fit sum of squares can be readily obtained from previous results:

$$SSLF = SSE - SSPE = 102.864 - 102.286 = .578 \qquad (17.53)$$

where SSE is taken from Table 17.6a and SSPE from Table 17.6b.

Since there are $c = r = 4$ levels of $X$ here and $p = 3$ parameters in the regression
model, SSLF has associated with it $c - p = 4 - 3 = 1$ degree of freedom. Hence, we obtain
MSLF = .578/1 = .578. Table 17.6c contains the analysis of variance for the regression
model, with the error sum of squares and degrees of freedom broken down into lack of fit
and pure error components.
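
The decomposition in (17.53) can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function name `lack_of_fit_components` is hypothetical, and the inputs are the sums of squares and degrees of freedom quoted from Table 17.6a and Table 17.6b.

```python
def lack_of_fit_components(sse, df_error, sspe, df_pure_error):
    """Split the regression error sum of squares into lack of fit and pure error.

    Hypothetical helper: SSLF = SSE - SSPE, and the lack of fit degrees of
    freedom are df_error - df_pure_error, which equals c - p.
    """
    sslf = sse - sspe
    df_lack_of_fit = df_error - df_pure_error
    mslf = sslf / df_lack_of_fit
    mspe = sspe / df_pure_error
    return sslf, df_lack_of_fit, mslf, mspe


# Values quoted from Table 17.6a (regression error) and Table 17.6b (ANOVA error).
sslf, df_lf, mslf, mspe = lack_of_fit_components(102.864, 25, 102.286, 24)
print(f"SSLF = {sslf:.3f}, df = {df_lf}, MSLF = {mslf:.3f}, MSPE = {mspe:.2f}")
```

With the tabled values, the sketch returns a lack of fit sum of squares of .578 on 1 degree of freedom, in agreement with Table 17.6c.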

The alternative conclusions (6.68a) for the test of lack of fit here are: 


$$H_0\colon E\{Y\} = \beta_0 + \beta_1 x + \beta_{11} x^2$$
$$H_a\colon E\{Y\} \ne \beta_0 + \beta_1 x + \beta_{11} x^2$$


and test statistic (6.68b) is:

$$F^* = \frac{MSLF}{MSPE}$$

For $\alpha = .05$, decision rule (6.68c) becomes:

If $F^* \le F(.95; 1, 24) = 4.26$, conclude $H_0$

If $F^* > 4.26$, conclude $H_a$


We calculate the test statistic from Table 17.6c:

$$F^* = \frac{.578}{4.26} = .136$$

Since $F^* = .136 \le 4.26$, the analyst concluded that the quadratic response function is a good
fit. Consequently, the fitted regression function in (17.52) was used in further evaluation
of the relation between mean number of acceptable pieces produced and level of training,
after expressing the fitted response function in the original predictor variable X (number of
hours of training):


$$\hat{Y} = -3.73571 + 9.17500X - .31250X^2$$


Figure 17.7 displays this fitted response function. 
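
Both the test statistic and the change of variable from $x = X - 9$ back to $X$ are simple enough to verify numerically. The sketch below is a minimal illustration, not part of the original analysis: the helper names `lack_of_fit_f` and `uncenter_quadratic` are hypothetical, the mean squares are those of Table 17.6c, and the critical value 4.26 is the one quoted in the text rather than computed.

```python
def lack_of_fit_f(mslf, mspe):
    """Lack of fit test statistic F* = MSLF / MSPE."""
    return mslf / mspe


def uncenter_quadratic(b0, b1, b11, center):
    """Rewrite b0 + b1*(X - c) + b11*(X - c)**2 as a0 + a1*X + a2*X**2."""
    a0 = b0 - b1 * center + b11 * center ** 2
    a1 = b1 - 2 * b11 * center
    a2 = b11
    return a0, a1, a2


# Mean squares from Table 17.6c; 4.26 is F(.95; 1, 24) as quoted in the text.
f_star = lack_of_fit_f(0.578, 4.26)
conclude_h0 = f_star <= 4.26

# Coefficients of (17.52), which is centered at 9 hours of training.
a0, a1, a2 = uncenter_quadratic(53.52679, 3.55000, -0.31250, 9)
print(round(f_star, 3), conclude_h0)
print(round(a0, 5), round(a1, 5), round(a2, 5))
```

Expanding the centered polynomial by hand gives the same coefficients as the function displayed in Figure 17.7, which is a useful check whenever a centered fit is reported in the original units.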


Cited References

17.1. Cochran, W. G., and G. M. Cox. Experimental Designs. 2nd ed. New York: John Wiley & Sons,
1957, p. 74.
17.2. Dunnett, C. W. "A Multiple Comparison Procedure for Comparing Several Treatments with a
Control," Journal of the American Statistical Association 50 (1955), pp. 1096-1121.
17.3. Hsu, J. C. Multiple Comparisons: Theory and Methods. London: Chapman & Hall, 1996.
17.4. Ott, E. R. "Analysis of Means—A Graphical Procedure," Industrial Quality Control 24 (1967),
pp. 101-109.
17.5. Nelson, L. S. "Exact Critical Values for Use with the Analysis of Means," Journal of Quality
Technology 15 (1983), pp. 40-44.
17.6. Nelson, P. R. "Additional Uses for the Analysis of Means and Extended Tables of Critical
Values," Technometrics 35 (1993), pp. 61-71.


Problems

17.1. Refer to Premium distribution Problem 16.12. A student, asked to give a class demonstration
of the use of a confidence interval for comparing two treatment means, proposed to construct a
99 percent confidence interval for the pairwise comparison $D = \mu_5 - \mu_3$. The student selected
this particular comparison because the estimated treatment means $\bar{Y}_{5\cdot}$ and $\bar{Y}_{3\cdot}$ are the largest
and smallest, respectively, and stated: "This confidence interval is particularly useful. If it
does not straddle zero, it indicates, with significance level $\alpha = .01$, that the factor level means
are not equal."

a. Explain why the student's assertion is not correct.

b. How should the confidence interval be constructed so that the assertion can be made with
significance level $\alpha = .01$?

17.2. A trainee examined a set of experimental data to find comparisons that "look promising"
and calculated a family of Bonferroni confidence intervals for these comparisons with a
90 percent family confidence coefficient. Upon being informed that the Bonferroni procedure
is not applicable in this case because the comparisons had been suggested by the data, the
trainee stated: "This makes no difference. I would use the same formulas for the point estimates
and the estimated standard errors even if the comparisons were not suggested by the data."
Respond.

17.3. Consider the following linear combinations of interest in a single-factor study involving four
factor levels:

(i) $\mu_1 + 3\mu_2 - 4\mu_3$
(ii) $.3\mu_1 + .5\mu_2 + .1\mu_3 + .1\mu_4$
(iii) $\frac{\mu_1 + \mu_2 + \mu_3}{3} - \mu_4$


a. Which of the linear combinations are contrasts? State the coefficients for each of the 
contrasts. 

b. Give an unbiased estimator for each of the linear combinations. Also give the estimated
variance of each estimator assuming that $n_i = n$.


17.4. A single-factor ANOVA study consists of $r = 6$ treatments with sample sizes $n_i = 10$.

a. Assuming that pairwise comparisons of the treatment means are to be made with a 90 percent
family confidence coefficient, find the T, S, and B multiples for the following numbers of
pairwise comparisons in the family: $g = 2, 5, 15$. What generalization is suggested by your
results?

b. Assuming that contrasts of the treatment means are to be estimated with a 90 percent family
confidence coefficient, find the S and B multiples for the following numbers of contrasts in
the family: $g = 2, 5, 15$. What generalization is suggested by your results?

17.5. Consider a single-factor study with $r = 5$ treatments and sample sizes $n_i = 5$.

a. Find the T, S, and B multiples if $g = 2$, 5, and 10 pairwise comparisons are to be made
with a 95 percent family confidence coefficient. What generalization is suggested by your
results?


b. What would be the T, S, and B multiples for sample sizes $n_i = 20$? Does the generalization
obtained in part (a) still hold?


17.6. In making multiple comparisons, why is it appropriate to use the multiple comparison
procedure that leads to the tightest confidence intervals for the sample data obtained? Discuss.

17.7. For a single-factor study with $r = 2$ treatments and sample sizes $n_i = 10$, find the T, S, and
B multiples for $g = 1$ pairwise comparison with a 99 percent family confidence coefficient.
What generalization is suggested by your results?

*17.8. Refer to Productivity improvement Problem 16.7.


a. Prepare a line plot of the estimated factor level means $\bar{Y}_{i\cdot}$. What does this plot suggest
regarding the effect of the level of research and development expenditures on mean productivity
improvement?

b. Estimate the mean productivity improvement for firms with high research and development
expenditures levels; use a 95 percent confidence interval.

c. Obtain a 95 percent confidence interval for $D = \mu_2 - \mu_1$. Interpret your interval estimate.

d. Obtain confidence intervals for all pairwise comparisons of the treatment means; use the
Tukey procedure and a 90 percent family confidence coefficient. State your findings and
prepare a graphic summary by underlining nonsignificant comparisons in your line plot in
part (a).

e. Is the Tukey procedure employed in part (d) the most efficient one that could be used here?
Explain.


17.9. Refer to Questionnaire color Problem 16.8.

a. Prepare a bar-interval graph of the estimated factor level means $\bar{Y}_{i\cdot}$, where the intervals
correspond to the confidence limits in (17.7) with $\alpha = .05$. What does this plot suggest
about the effect of color on the response rate? Is your conclusion in accord with the test
result in Problem 16.8e?

b. Estimate the mean response rate for blue questionnaires; use a 90 percent confidence interval.

c. Test whether or not $D = \mu_3 - \mu_2 = 0$; use $\alpha = .10$. State the alternatives, decision rule, and
conclusion. In light of the result for the ANOVA test in Problem 16.8e, is your conclusion
surprising? Explain.

17.10. Refer to Rehabilitation therapy Problem 16.9.

a. Prepare a line plot of the estimated factor level means $\bar{Y}_{i\cdot}$. What does this plot suggest about
the effect of prior physical fitness on the mean time required in therapy?

b. Estimate with a 99 percent confidence interval the mean number of days required in therapy
for persons of average physical fitness.

c. Obtain confidence intervals for $D_1 = \mu_2 - \mu_3$ and $D_2 = \mu_1 - \mu_2$; use the Bonferroni
procedure with a 95 percent family confidence coefficient. Interpret your results.

d. Would the Tukey procedure have been more efficient to use in part (c)? Explain.

e. If the researcher also wished to estimate $D_3 = \mu_1 - \mu_3$, still with a 95 percent family
confidence coefficient, would the B multiple in part (c) need to be modified? Would this
also be the case if the Tukey procedure had been employed?

f. Test for all pairs of factor level means whether or not they differ; use the Tukey procedure
with $\alpha = .05$. Set up groups of factor levels whose means do not differ.

*17.11. Refer to Cash offers Problem 16.10.

a. Prepare a main effects plot of the estimated factor level means $\bar{Y}_{i\cdot}$. What does this plot
suggest regarding the effect of the owner's age on the mean cash offer?

b. Estimate the mean cash offer for young owners; use a 99 percent confidence interval.


c. Construct a 99 percent confidence interval for $D = \mu_3 - \mu_1$. Interpret your interval estimate.

d. Test whether or not $\mu_2 - \mu_1 = \mu_3 - \mu_2$; control the $\alpha$ risk at .01. State the alternatives,
decision rule, and conclusion.

e. Obtain confidence intervals for all pairwise comparisons between the treatment means; use
the Tukey procedure and a 90 percent family confidence coefficient. Interpret your results
and provide a graphic summary by preparing a paired comparison plot. Are your conclusions
in accord with those in part (a)?

f. Would the Bonferroni procedure have been more efficient to use in part (e) than the Tukey
procedure? Explain.


17.12. Refer to Filling machines Problem 16.11.

a. Prepare a main effects plot of the estimated factor level means $\bar{Y}_{i\cdot}$. What does this plot
suggest regarding the variation in the mean fills for the six machines?

b. Construct a 95 percent confidence interval for the mean fill for machine 1.

c. Obtain a 95 percent confidence interval for $D = \mu_2 - \mu_1$. Interpret your interval estimate.

d. Prepare a paired comparison plot and interpret it.

e. The consultant is particularly interested in comparing the mean fills for machines 1, 4,
and 5. Use the Bonferroni testing procedure for all pairwise comparisons among these
three treatment means with family level of significance $\alpha = .10$. Interpret your results and
provide a graphic summary by preparing a line plot of the estimated factor level means with
nonsignificant differences underlined. Do your conclusions agree with those in part (a)?

f. Would the Tukey testing procedure have been more efficient to use in part (e) than the
Bonferroni testing procedure? Explain.


17.13. Refer to Premium distribution Problem 16.12.

a. Prepare an interval plot of the estimated factor level means $\bar{Y}_{i\cdot}$, where the intervals
correspond to the confidence limits in (17.7) with $\alpha = .10$. What does this plot suggest about the
variation in the mean time lapses for the five agents?

b. Test for all pairs of factor level means whether or not they differ; use the Tukey procedure with
$\alpha = .10$. Set up groups of factor levels whose means do not differ. Use a paired comparison
plot to summarize the results.

c. Construct a 90 percent confidence interval for the mean time lapse for agent 1.

d. Obtain a 90 percent confidence interval for $D = \mu_2 - \mu_1$. Interpret your interval estimate.

e. The marketing director wishes to compare the mean time lapses for agents 1, 3, and 5. Obtain
confidence intervals for all pairwise comparisons among these three treatment means; use
the Bonferroni procedure with a 90 percent family confidence coefficient. Interpret your
results and present a graphic summary by preparing a line plot of the estimated factor level
means with nonsignificant differences underlined. Do your conclusions agree with those in
part (a)?

f. Would the Tukey procedure have been more efficient to use in part (e) than the Bonferroni
procedure? Explain.

*17.14. Refer to Productivity improvement Problem 16.7.

a. Estimate the difference in mean productivity improvement between firms with low or moderate
research and development expenditures and firms with high expenditures; use a 95 percent
confidence interval. Employ an unweighted mean for the low and moderate expenditure
groups. Interpret your interval estimate.

b. The sample sizes for the three factor levels are proportional to the population sizes. The
economist wishes to estimate the mean productivity gain last year for all firms in the
population. Estimate this overall mean productivity improvement with a 95 percent
confidence interval.

c. Using the Scheffé procedure, obtain confidence intervals for the following comparisons with
90 percent family confidence coefficient:


$$D_1 = \mu_3 - \mu_2 \qquad D_3 = \mu_2 - \mu_1$$
$$D_2 = \mu_3 - \mu_1 \qquad L_1 = \frac{\mu_1 + \mu_2}{2} - \mu_3$$
Interpret your results and describe your findings.

17.15. Refer to Rehabilitation therapy Problem 16.9.

a. Estimate the contrast $L = (\mu_1 - \mu_2) - (\mu_2 - \mu_3)$ with a 99 percent confidence interval.
Interpret your interval estimate.

b. Estimate the following comparisons using the Bonferroni procedure with a 95 percent family
confidence coefficient:
$$D_1 = \mu_1 - \mu_2 \qquad D_3 = \mu_2 - \mu_3$$
$$D_2 = \mu_1 - \mu_3 \qquad L_1 = D_1 - D_3$$


Interpret your results and describe your findings. 
c. Would the Scheffé procedure have been more efficient to use in part (b) than the Bonferroni
procedure? Explain.

*17.16. Refer to Cash offers Problem 16.10.

a. Estimate the contrast $L = (\mu_3 - \mu_2) - (\mu_2 - \mu_1)$ with a 99 percent confidence interval.
Interpret your interval estimate.


b. Estimate the following comparisons with a 90 percent family confidence coefficient; employ 
the most efficient multiple comparison procedure: 


$$D_1 = \mu_2 - \mu_1 \qquad D_3 = \mu_3 - \mu_1$$
$$D_2 = \mu_3 - \mu_2 \qquad L_1 = D_2 - D_1$$


Interpret your results. 
*17.17. Refer to Filling machines Problem 16.11. Machines 1 and 2 were purchased new five years
ago, machines 3 and 4 were purchased in a reconditioned state five years ago, and machines
5 and 6 were purchased new last year.

a. Estimate the contrast:
$$L = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$


with a 95 percent confidence interval. Interpret your interval estimate.

b. Estimate the following comparisons with a 90 percent family confidence coefficient; use the
most efficient multiple comparison procedure:


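$$D_1 = \mu_1 - \mu_2 \qquad L_1 = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$
$$D_2 = \mu_3 - \mu_4 \qquad L_2 = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_5 + \mu_6}{2}$$
$$D_3 = \mu_5 - \mu_6 \qquad L_3 = \frac{\mu_1 + \mu_2 + \mu_5 + \mu_6}{4} - \frac{\mu_3 + \mu_4}{2}$$
$$L_4 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} - \frac{\mu_5 + \mu_6}{2}$$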




Interpret your results. What can the consultant learn from these results about the differences 
between the six filling machines? 


17.18. Refer to Premium distribution Problem 16.12. Agents 1 and 2 distribute merchandise only, 
agents 3 and 4 distribute cash-value coupons only, and agent 5 distributes both merchandise 
and coupons. 


a. Estimate the contrast: 


$$L = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$


with a 90 percent confidence interval. Interpret your interval estimate. 
b. Estimate the following comparisons with 90 percent family confidence coefficient; use the 


Scheffé procedure: 
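$$D_1 = \mu_1 - \mu_2 \qquad L_1 = \frac{\mu_1 + \mu_2}{2} - \mu_5$$
$$D_2 = \mu_3 - \mu_4 \qquad L_2 = \frac{\mu_3 + \mu_4}{2} - \mu_5$$
$$L_3 = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$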
Interpret your results. 


c. Of all premium distributions, 25 percent are handled by agent 1, 20 percent by agent 2, 20 
percent by agent 3, 20 percent by agent 4, and 15 percent by agent 5. Estimate the overall 
mean time lapse for premium distributions with a 90 percent confidence interval. 

*17.19. Refer to Filling machines Problem 16.11.

a. Use the analysis of means procedure to test for equality of treatment effects, with family
significance level .05. Which treatments have the strongest effects?

b. Using the results in part (a), obtain the analysis of means plot. What additional information
does this plot provide in comparison with the main effects plot in Problem 17.12a?

17.20. Refer to Premium distribution Problem 16.12. 

a. Use the analysis of means procedure to test for equality of treatment effects, with family 
significance level .10. Which treatments have the strongest effects? 

b. Using the results in part (a), obtain the analysis of means plot. What additional information
does this plot provide in comparison with the interval plot in Problem 17.13a?

17.21. Refer to Solution concentration Problem 3.15. Suppose the chemist initially wishes to employ 
ANOVA model (16.2) to determine whether or not the concentration of the solution is affected 
by the amount of time that has elapsed since preparation. 

a. State the analysis of variance model. 

b. Prepare a main effects plot of the estimated factor level means $\bar{Y}_{i\cdot}$. What does this plot
suggest about the relation between the solution concentration and time?

c. Obtain the analysis of variance table.

d. Test whether or not the factor level means are equal; use $\alpha = .025$. State the alternatives,
decision rule, and conclusion.

e. Make pairwise comparisons of factor level means between all adjacent lengths of time;
use the Bonferroni procedure with a 95 percent family confidence coefficient. Are your
conclusions in accord with those in part (b)? Do your results suggest that the regression
relation is not linear?


17.22. A market researcher stated in a seminar: "The power approach to determining sample sizes
for analysis of variance problems is not meaningful; only the estimation approach should be
used. We never conduct a study where all treatment means are expected to be equal, so we are
always interested in a variety of estimates." Discuss.

17.23. Refer to Questionnaire color Problem 16.8. Suppose estimates of all pairwise comparisons
are of primary importance. What would be the required sample sizes if the precision of all
pairwise comparisons is to be ±3.0, using the Tukey procedure with a 95 percent family
confidence coefficient?

17.24. Refer to Rehabilitation therapy Problem 16.9. Suppose primary interest is in estimating the
two pairwise comparisons:

$$L_1 = \mu_1 - \mu_2 \qquad L_2 = \mu_3 - \mu_2$$

What would be the required sample sizes if the precision of each comparison is to be ±3.0 days,
using the most efficient multiple comparison procedure with a 95 percent family confidence
coefficient?

*17.25. Refer to Filling machines Problem 16.11. Suppose primary interest is in estimating the
following comparisons:

$$L_1 = \mu_1 - \mu_2 \qquad L_3 = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$
$$L_2 = \mu_3 - \mu_4 \qquad L_4 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} - \frac{\mu_5 + \mu_6}{2}$$

What would be the required sample sizes if the precision of each of these comparisons is not
to exceed ±.08 ounce, using the best multiple comparison procedure with a 95 percent family
confidence coefficient?

17.26. Refer to Premium distribution Problem 16.12. Suppose primary interest is in estimating the
following comparisons:

$$L_1 = \mu_1 - \mu_2 \qquad L_3 = \frac{\mu_1 + \mu_2}{2} - \mu_5$$
$$L_2 = \mu_3 - \mu_4 \qquad L_4 = \frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$$

What would be the required sample sizes if the precision of each of the estimated comparisons
is not to exceed ±1.0 day, using the most efficient multiple comparison procedure with a
90 percent family confidence coefficient?


17.27. Refer to Rehabilitation therapy Problem 16.9. Suppose that primary interest is in comparing
the below-average and above-average physical fitness groups, respectively, with the average
physical fitness group. Thus, two comparisons are of interest:

$$L_1 = \mu_1 - \mu_2 \qquad L_2 = \mu_3 - \mu_2$$

Assume that a reasonable planning value for the error standard deviation is $\sigma = 4.5$ days.

a. It has been decided to use equal sample sizes ($n$) for the below-average and above-average
groups. If twice this sample size ($2n$) were to be used for the average physical fitness group,
what would be the required sample sizes if the precision of each pairwise comparison is to be
±2.5 days, using the Bonferroni procedure and a 90 percent family confidence coefficient?

b. Repeat the calculations in part (a) if the sample size for the average physical fitness group
is to be: (1) $n$ and (2) $3n$, all other specifications remaining the same.

c. Compare your results in parts (a) and (b). Which design leads to the smallest total sample
size here?


17.28. Refer to Rehabilitation therapy Problem 16.9. A biometrician has developed a scale for
physical fitness status, as follows:

Physical Fitness Status    Scale Value
Below average                   83
Average                        100
Above average                  121

a. Using this physical fitness status scale, fit first-order regression model (1.1) for regressing
number of days required for therapy (Y) on physical fitness status (X).

b. Obtain the residuals and plot them against X. Does a linear regression model appear to fit
the data?

c. Perform an F test to determine whether or not there is lack of fit of a linear regression
function; use $\alpha = .05$. State the alternatives, decision rule, and conclusion.

d. Could you test for lack of fit of a quadratic regression function here? Explain.

*17.29. Refer to Filling machines Problem 16.11. A maintenance engineer has suggested that the
differences in mean fills for the six machines are largely related to the length of time since a
machine last received major servicing. Service records indicate these lengths of time to be as
follows (in months):


Filling Number of Filling Number of 
Machine Months Machine Months 
1 E! 4 5.3 
2 3.7 5 1.4 
3 6.1 6 2.1 


a. Fit second-order polynomial regression model (8.2) for regressing amount of fill (Y) on 
number of months since major servicing (X). 

b. Obtain the residuals and plot them against X. Does a quadratic regression function appear 
to fit the data? 

c. Perform an F test to determine whether or not there is lack of fit of a quadratic regression
function; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

d. Test whether or not the quadratic term in the response function can be dropped from the
model; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.


Exercises

17.30. Show that when $r = 2$ and $n_i = n$, $q^*$ defined in (17.35) is equivalent to $\sqrt{2}\,|t^*|$, where $t^*$ is
defined in (A.65) in Appendix A.

17.31. Starting with (17.38), complete the derivation of (17.30).

17.32. Show that when $r = 2$, $S^2$ defined in (17.43a) is equivalent to $[t(1 - \alpha/2; n_T - 2)]^2$.

17.33. Show that the estimated variance of $\hat{\tau}_i$ in (17.48) is given by (17.49).

17.34. (Calculus needed.) Refer to Rehabilitation therapy Problem 16.9. The sample sizes for the
below-average, average, and above-average physical fitness groups are to be $n$, $kn$, and $n$,
respectively. Assuming that ANOVA model (16.2) is appropriate, find the optimal value of
$k$ to minimize the variances of $\hat{L}_1 = \bar{Y}_{1\cdot} - \bar{Y}_{2\cdot}$ and $\hat{L}_2 = \bar{Y}_{3\cdot} - \bar{Y}_{2\cdot}$ for a given total sample
size $n_T$.


Projects

17.35. Refer to the SENIC data set in Appendix C.1 and Project 16.42. Obtain confidence intervals for
all pairwise comparisons between the four regions; use the Tukey procedure and a 90 percent
family confidence coefficient. Interpret your results and state your findings. Prepare a line plot
of the estimated factor level means and underline all nonsignificant comparisons.

17.36. Refer to the CDI data set in Appendix C.2 and Project 16.44. Obtain confidence intervals for
all pairwise comparisons between the four regions; use the Tukey procedure and a 90 percent
family confidence coefficient. Interpret your results and state your findings. Prepare a line plot
of the estimated factor level means and underline all nonsignificant comparisons.

17.37. Refer to the Market share data set in Appendix C.3 and Project 16.45. Obtain confidence
intervals for all pairwise comparisons among the four factor levels; use the Tukey procedure
and a 95 percent family confidence coefficient. Interpret your results and state your findings.
Prepare a line plot of the estimated factor level means, underscoring all nonsignificant
comparisons.

17.38. Refer to Project 16.46e.

a. For each replication, construct confidence intervals for all pairwise comparisons among
the three treatment means; use the Tukey procedure with a 95 percent family confidence
coefficient. Then determine whether all confidence intervals for the replication are correct,
given that $\mu_1 = 80$, $\mu_2 = 60$, and $\mu_3 = 160$.

b. For what proportion of the 100 replications are all confidence intervals correct? Is this
proportion close to theoretical expectations? Discuss.


Case Studies

17.39. Refer to the Prostate cancer data set in Appendix C.5 and Case Study 16.49. Obtain confidence
intervals for all pairwise comparisons among the three Gleason score levels; use the Tukey
procedure and a 95 percent family confidence coefficient. Interpret your results and state your
findings. Prepare a line plot of the estimated factor level means, underscoring all nonsignificant
comparisons.

17.40. Refer to the Real estate sales data set in Appendix C.7 and Case Study 16.50. Obtain
confidence intervals for all pairwise comparisons among the four number-of-bedroom categories;
use the Tukey procedure and a 90 percent family confidence coefficient. Interpret your results
and state your findings. Prepare a line plot of the estimated factor level means, underscoring
all nonsignificant comparisons.

17.41. Refer to the Ischemic heart disease data set in Appendix C.9 and Case Study 16.41. Obtain
confidence intervals for all pairwise comparisons among the six number-of-intervention
categories; use the Tukey procedure and a 90 percent family confidence coefficient. Interpret
your results and state your findings. Prepare a line plot of the estimated factor level means,
underscoring all nonsignificant comparisons.


ANOVA Diagnostics and Remedial Measures


When discussing regression analysis, we emphasized the importance of examining the 
appropriateness of the regression model under consideration, and noted the effectiveness of 
residual plots and other diagnostics for spotting major departures from the tentative model. 
Examination of the appropriateness of analysis of variance models is no less important. 

In this chapter, we take up the use of residual plots for diagnosing the appropriateness of 
analysis of variance models, as well as formal tests for the constancy of the error variance. 
We also discuss the use of transformations of the response variable as a remedial measure 
to improve the appropriateness of the analysis of variance model for estimation and test 
inferences. 

For pedagogic reasons, as in regression analysis, we have discussed inference procedures 
before diagnostics and remedial measures. The actual sequence of developing and using 
any statistical model is, of course, the reverse: 


1. Examine whether the proposed model is appropriate for the set of data at hand. 

2. If the proposed model is not appropriate, consider remedial measures, such as transfor- 
mation of the data or modification of the model. 

3. After review of the appropriateness of the model and completion of any necessary 
remedial measures and an evaluation of their effectiveness, inferences based on the 
model can be undertaken. 


It is not necessary, nor is it usually possible, that an ANOVA model fit the data perfectly. As 
will be noted later, ANOVA models are reasonably robust against certain types of departures 
from the model, such as the error terms not being exactly normally distributed. The major 
purpose of the examination of the appropriateness of the model is therefore to detect serious 
departures from the conditions assumed by the model. 


18.1 Residual Analysis 


Residual analysis for ANOVA models corresponds closely to that for regression models. 
We therefore discuss only briefly some key issues in the use of residual analysis for ANOVA 
models. 




Residuals

The residuals $e_{ij}$ for the ANOVA cell means model (16.2) were defined in (16.20):

$$e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot} \qquad (18.1)$$

As in regression, semistudentized residuals, studentized residuals, and studentized deleted
residuals are often helpful for diagnosing ANOVA model departures. The definitions of
these residuals for regression in Chapters 3 and 10 are still applicable for ANOVA models.

However, in view of the simple nature of the X matrix for ANOVA models, the regression
formulas often simplify here. The semistudentized residuals $e^*_{ij}$ in (3.5) for regression remain
unchanged:

$$e^*_{ij} = \frac{e_{ij}}{\sqrt{MSE}} \qquad (18.2)$$

The studentized residuals $r_{ij}$ in (10.20) become here:

$$r_{ij} = \frac{e_{ij}}{s\{e_{ij}\}} \qquad (18.3)$$

where:

$$s\{e_{ij}\} = \sqrt{\frac{MSE(n_i - 1)}{n_i}} \qquad (18.3a)$$

Finally, the studentized deleted residuals $t_{ij}$ in (10.26) become here:

$$t_{ij} = e_{ij}\left[\frac{n_T - r - 1}{SSE\left(1 - \frac{1}{n_i}\right) - e_{ij}^2}\right]^{1/2} \qquad (18.4)$$
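
The four kinds of residuals in (18.1) through (18.4) can be computed directly from the raw responses. The following Python sketch is illustrative only: the function name `anova_residuals`, the list-of-lists input layout, and the small data set are assumptions made for the example, not taken from the text. It assumes every factor level has at least two observations and that the denominator in (18.4) is positive.

```python
from math import sqrt


def anova_residuals(groups):
    """Return (e, e_star, r, t) for every observation of a single-factor study.

    groups: one list of responses per factor level (hypothetical input layout).
    """
    r_levels = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    resid = [[y - m for y in g] for g, m in zip(groups, means)]   # (18.1)
    sse = sum(e * e for row in resid for e in row)
    mse = sse / (n_total - r_levels)

    rows = []
    for row in resid:
        n_i = len(row)
        s_e = sqrt(mse * (n_i - 1) / n_i)                         # (18.3a)
        for e in row:
            e_star = e / sqrt(mse)                                # (18.2)
            r_ij = e / s_e                                        # (18.3)
            t_ij = e * sqrt((n_total - r_levels - 1)
                            / (sse * (1 - 1 / n_i) - e * e))      # (18.4)
            rows.append((e, e_star, r_ij, t_ij))
    return rows


# Made-up data for illustration only: three factor levels, three cases each.
data = [[10, 9, 14], [23, 18, 22], [32, 31, 36]]
results = anova_residuals(data)
for e, e_star, r_ij, t_ij in results:
    print(f"{e:6.2f} {e_star:7.3f} {r_ij:7.3f} {t_ij:7.3f}")
```

Because the sample sizes in this made-up example are equal, each studentized residual is the same constant multiple of the corresponding semistudentized residual.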
Comment

For ANOVA model (16.2), it can be shown that the leverage of $Y_{ij}$, defined in (10.18), is given by:

$$h_{ij,ij} = \frac{1}{n_i} \qquad (18.5)$$

Hence, the variance of the residual $e_{ij}$ for ANOVA model (16.2) can be obtained by substituting (18.5)
into (10.14):

$$\sigma^2\{e_{ij}\} = \frac{\sigma^2 (n_i - 1)}{n_i} \qquad (18.6)$$

Replacing $\sigma^2$ by the unbiased estimator MSE and taking the square root lead to the estimated standard
deviation $s\{e_{ij}\}$ in (18.3a).

When the treatment sample sizes $n_i$ are the same, the leverages of all the observations $Y_{ij}$ are
the same. As a result, the estimated standard deviations of the residuals, $s\{e_{ij}\}$, are all the same so
that the semistudentized residuals $e^*_{ij}$ and the studentized residuals $r_{ij}$ provide essentially the same
information, differing only by a constant factor near 1 unless the treatment sample size is very small.
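
The leverage result in (18.5) can be confirmed numerically for a small case. The sketch below is a minimal illustration under stated assumptions: the helper names `cell_means_design` and `leverages` and the sample sizes are hypothetical, and the inversion step relies on the fact that X'X is diagonal when the columns of X are treatment indicators.

```python
def cell_means_design(sizes):
    """Indicator X matrix for the cell means model: one column per factor level."""
    rows = []
    for level, n_i in enumerate(sizes):
        for _ in range(n_i):
            rows.append([1.0 if k == level else 0.0 for k in range(len(sizes))])
    return rows


def leverages(x):
    """Diagonal of H = X (X'X)^{-1} X', using the fact that X'X is diagonal here."""
    p = len(x[0])
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(p)] for a in range(p)]
    # For indicator columns, X'X = diag(n_1, ..., n_r); check before inverting.
    assert all(xtx[a][b] == 0 for a in range(p) for b in range(p) if a != b)
    inv_diag = [1.0 / xtx[a][a] for a in range(p)]
    return [sum(row[a] * inv_diag[a] * row[a] for a in range(p)) for row in x]


sizes = [2, 3, 4]            # hypothetical unequal treatment sample sizes
h = leverages(cell_means_design(sizes))
print(h)
```

Each observation in a treatment with $n_i$ cases receives leverage $1/n_i$, so the factor $1 - h_{ij,ij}$ in the residual variance reduces to $(n_i - 1)/n_i$, as in (18.6).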


Residual Plots 
Residual plots useful for analysis of variance models include: (1) plots against the fitted 
values, (2) time or other sequence plots, (3) dot plots, and (4) normal probability plots. 
All of these plots have been encountered previously. We therefore proceed directly to an
example to illustrate the use of residual plots for evaluating the appropriateness of analysis
of variance models. 


Table 18.1 contains a portion of the residuals for the rust inhibitor example of Chapter 17. 
For ease of presentation, the treatments are shown in the columns of the table. The residuals 
were obtained from the data in Table 17.2a. For instance, the residual for the first experi- 
mental unit treated with brand A rust inhibitor is: 


$$e_{11} = Y_{11} - \hat{Y}_{11} = Y_{11} - \bar{Y}_{1\cdot} = 43.9 - 43.14 = .76$$


Figure 18.1 presents three MINITAB diagnostic residual plots. Figure 18.1a contains a
residual plot against the fitted values. This plot differs in appearance from similar plots for

TABLE 18.1 Residuals: Rust Inhibitor Example.

                    Brand
          A        B        C        D
  j      i=1      i=2      i=3      i=4

  1      .76      .36      .45    -4.27
  2    -4.14    -2.34     1.35     4.73
  3     3.56     3.26      .55      .23
 ...     ...      ...      ...      ...
  8    -4.24    -1.34    -2.75    -1.77
  9      .46     1.36    -4.15      .43
 10    -3.14     -.34     1.25     -.77


FIGURE 18.1 MINITAB Diagnostic Residual Plots—Rust Inhibitor Example. 


(a) Residual against Ў (с) Normal Probability Plot 
5 d. 
4 е 
3 et 
2 . 
X = m ео 
E S 1 m-— 24 
3 = 0 cud 
E e -1 e 
—2 Ed 
-3 be 
—4FrF. eee 
=S 
Qiu lo Gic clo 
-2 -1 0 1 2 
Exp Val 
(b) Aligned Residual Dot Plot 
Brand D . e è è агга . . 
Brand C . . . е өз Lad a . 
. е оа . е oe eo е 
Brand В —r——1———r——1— ot 
өз a $o Ф a a 


778: Part Four Design and Analysis of Single-Factor Studies 


regression analysis because the fitted values Ӯ, here are the same for all observations fora 
given factor level. Recall from (16.17) that Y;; = Y;.. 

Figure 18.1b contains aligned dot plots of the residuals for each factor level. These plots are similar to the residual plot against the fitted values in Figure 18.1a, except that here the residual scale is the horizontal one. An advantage of the plot in Figure 18.1a is that it facilitates an assessment of the relation between the magnitudes of the error variances and the factor level means. A disadvantage is that some of the estimated factor level means may be far apart, making a comparison of the factor levels more difficult. This difficulty is remedied in Figure 18.1b since dot plots can be placed close together to facilitate comparisons between factor levels.

Figure 18.1c contains a normal probability plot of the residuals. This plot is exactly the same as for regression models.

No sequence plot of the residuals is presented here because the data for the rust inhibitor 
example were not ordered according to time or in some other logical sequence. 

All of the plots in Figure 18.1, as we shall see, suggest that ANOVA model (16.2) is 
appropriate for the rust inhibitor data. 


Diagnosis of Departures from ANOVA Model 


We consider now how residual plots can be helpful in diagnosing the following departures 
from ANOVA model (16.2): 


1. Nonconstancy of error variance
2. Nonindependence of error terms
3. Outliers
4. Omission of important explanatory variables
5. Nonnormality of error terms


Nonconstancy of Error Variance. ANOVA model (16.2) requires that the error terms $\varepsilon_{ij}$ have constant variance for all factor levels. When the sample sizes are not large and do not differ greatly, the appropriateness of this assumption can be studied by using the residuals, semistudentized residuals, or studentized residuals. Plots of residuals against fitted values or dot plots of residuals are helpful. When the sample sizes differ greatly, studentized residuals should be used in these plots. Constancy of the error variance is shown in these plots by about the same extent of scatter of the residuals around zero for each factor level. This is the case for the rust inhibitor example in Figures 18.1a and 18.1b.

Figure 18.2 is a prototype residual plot against the fitted values when the error variances are not constant. This plot portrays the case where the error terms for factor level 3 have a larger variance than those for the other two factor levels.

When the sample sizes for the different factor levels are large, histograms or boxplots of the residuals for each treatment, arranged vertically and using the same scale like the dot plots in Figure 18.1b, are an effective means for examining the constancy of the error variance, as well as for assessing whether the error terms are normally distributed.

A number of statistical tests have been developed for formally examining the equality of the r factor level variances; two of these tests will be discussed in Section 18.2.


FIGURE 18.2 Prototype Residual Plot against Fitted Values When Error Term Variance Is Not Constant. Axes: fitted value (horizontal), residual (vertical).

Nonindependence of Error Terms. Whenever data are obtained in a time sequence, a residual sequence plot should be prepared to examine if the error terms are serially correlated. Figure 18.3 contains the residuals for an experiment on group interactions. Three different treatments were applied, and the group interactions were recorded on videotapes. Seven replications were made for each treatment. Afterward, the experimenter measured the number of interactions by viewing the tapes in randomized order. Figure 18.3 strongly suggests that the experimenter discerned a larger number of interactions as more experience in viewing the tapes was gained. As a result, the residuals in Figure 18.3 appear to be serially correlated. In this instance, an inclusion in the model of a linear term for the time effect might be sufficient to assure independence of the error terms in the revised model.

FIGURE 18.3 Residual Sequence Plots: Group Interaction Study with Time-Related Effect. Panels: Factor Level 1, Factor Level 2, Factor Level 3; residual against time order.

Time-related effects may also lead to increases or decreases in the error variance over 
time. For instance, an experimenter may make more precise measurements over time. Fig- 
ure 18.4 portrays residual sequence plots where the error variance decreases over time. 

When the data are ordered in some other logical sequence, such as in a geographic 
sequence, a plot of the residuals against this ordering is helpful for ascertaining whether the 
error terms are serially correlated according to this ordering. 


FIGURE 18.4 Residual Sequence Plots Illustrating Decreasing Error Variance over Time. Panels: Factor Level 1, Factor Level 2, Factor Level 3; residual $e_{ij}$ against time order.

Outliers. The detection of outliers is facilitated by various plots of the studentized deleted residuals. Residual plots against fitted values, residual dot plots, box plots, and stem-and-leaf plots are particularly helpful. These plots easily reveal outlying observations, that is, observations that differ from the fitted value by far more than do other observations. As noted in Chapter 3, it is wise practice to discard outlying observations only if they can be identified as being due to such specific causes as instrumentation malfunctioning, observer measurement blunder, or recording error.

FIGURE 18.5 Residual Plot against Fitted Values Illustrating Omission of Important Explanatory Variable. Axes: fitted value (horizontal; Treatments 1, 3, and 2 in order of fitted value), residual (vertical). Legend: open circles denote male subjects, dots denote female subjects.

The test for outliers in regression discussed in Chapter 10 is applicable to analysis of variance as well. The appropriate Bonferroni critical value here is $t(1 - \alpha/2n_T;\ n_T - r - 1)$. If the largest absolute studentized deleted residual exceeds this critical value, that case should be considered an outlier. Note that the implicit family of tests here consists of the tests on all $n_T$ residuals for the study since we do not know in advance which case will have the largest absolute studentized deleted residual.

Occasionally, a test for an outlier is suggested in advance of the analysis, as when a substitute operator is used for one of the production runs in a manufacturing experiment. Concern about the validity of this response observation might lead to an outlier test. In this case, the Bonferroni critical value would be $t(1 - \alpha/2;\ n_T - r - 1)$.




Omission of Important Explanatory Variables. Residual analysis may also be used to study whether or not the single-factor ANOVA model is an adequate model. In a learning experiment involving three motivational treatments, the residuals shown in Figure 18.5 were obtained. The residual plot against the fitted values in Figure 18.5 shows no unusual overall pattern. The experimenter wondered, however, whether the treatment effects differ according to the gender of the subject. In Figure 18.5 the residuals for male subjects are shown by open circles, and those for females by dots. The results in Figure 18.5 suggest strongly that for each of the motivational treatments studied, the treatment effects do differ according to gender. Here, an analysis of covariance model, recognizing both motivational treatment and gender of subject as explanatory variables as mentioned in Chapter 15, might be more useful. Analysis of covariance models will be discussed in Chapter 22.

Note that residual analysis here does not invalidate the original single-factor model. 
Rather, the residual analysis points out that the original model overlooks differences in 
treatment effects that may be important to recognize. Since there are usually many ex- 
planatory variables that have some effect on the response, the analyst needs to identify for 
residual analysis those explanatory variables that most likely have an important effect on 
the response. 


Nonnormality of Error Terms. The normality of the error terms can be studied from 
histograms, dot plots, box plots, and normal probability plots of the residuals. In addition, 
comparisons can be made of observed frequencies with expected frequencies if normality 
holds, and formal chi-square goodness of fit or related tests can be utilized. The discussion in 
Chapter 3 about these methods for assessing the normality of the error terms for regression 
is entirely applicable to ANOVA models. 

When the factor level sample sizes are large, the study of normality can be made sepa- 
rately for each treatment. When the factor level sample sizes are small, one can combine the 
residuals $e_{ij}$ for all treatments into one group, provided that the evidence suggests that there
are no major differences in the error variances for the treatments studied. This combining 
was done in the rust inhibitor example in Figure 18.1c. This figure does not indicate any 
serious departures from normality. The pattern of the points is reasonably linear except 
possibly in the tails. The coefficient of correlation between the ordered residuals and their 
expected values under normality is .987, which also supports the reasonableness of the 
normality assumption. 
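
The correlation between the ordered residuals and their expected values under normality can be sketched as follows. This is an illustration under stated assumptions, not the text's own computation: the function name `normal_plot_correlation` and the two small samples are hypothetical, and the expected values are approximated by standard normal quantiles at $(k - .375)/(n + .25)$, an approximation assumed here. The scale factor $\sqrt{MSE}$ is omitted because the correlation coefficient does not depend on it.

```python
import math
from statistics import NormalDist

def normal_plot_correlation(residuals):
    """Correlation between ordered residuals and approximate normal scores."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Approximate expected standard normal order statistics (assumed formula).
    scores = [NormalDist().inv_cdf((k - 0.375) / (n + 0.25))
              for k in range(1, n + 1)]
    mean_e = sum(ordered) / n
    mean_z = sum(scores) / n
    s_ez = sum((e - mean_e) * (z - mean_z) for e, z in zip(ordered, scores))
    s_ee = sum((e - mean_e) ** 2 for e in ordered)
    s_zz = sum((z - mean_z) ** 2 for z in scores)
    return s_ez / math.sqrt(s_ee * s_zz)

# Hypothetical residuals: one symmetric sample, one strongly skewed sample.
r_symmetric = normal_plot_correlation([-2.0, -1.0, 0.0, 1.0, 2.0])
r_skewed = normal_plot_correlation([0.1, 0.2, 0.3, 0.4, 10.0])
print(r_symmetric, r_skewed)
```

A coefficient near 1 supports approximate normality; the skewed sample gives a clearly lower value. The coefficient is then compared with the critical value from the appropriate table.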

When unequal variances of the error terms for the different factor levels are indicated and normality must be examined for the combined data, studentized residuals (18.3) should be used, with MSE replaced by the sample variance $s_i^2$ in (16.39) for observations from the ith treatment. If ordinary residuals were used, nonnormality might be indicated solely because of the failure of the error terms to have equal variances.


Comment 


As for regression models, the ANOVA residuals $e_{ij}$ are not independent random variables. For ANOVA model (16.2), they are subject to the restrictions in (16.21). Consequently, statistical tests that require independent observations are not exactly appropriate for ANOVA residuals. If, however, the number of residuals for each factor level is not small, the effect of the correlations will only be modest. It has been noted that graphic plots of residuals are less subject to the effects of correlation than are statistical tests because graphic plots contain the individual residuals and not simply functions of them.


18.2 Tests for Constancy of Error Variance 


Several formal tests are available for studying the constancy of the error variance, as required by the ANOVA model. We shall consider two of these, the Hartley test (Ref. 18.1) and the Brown-Forsythe test (Ref. 18.2). Both tests assume that independent random samples are obtained from each population. The Hartley test is simple to carry out, but is applicable only if the sample sizes are equal and if the error terms are normally distributed. The test is designed to be sensitive to substantial differences between the largest and the smallest factor level variances. The Brown-Forsythe test, discussed in Chapter 3, is slightly more difficult to compute but is more generally applicable. The test has been shown to be robust to departures from normality, and sample sizes need not be equal.

Both the Hartley test and the Brown-Forsythe test are often conducted at low $\alpha$ levels when used for testing the constancy of the error variance in the analysis of variance. The reason is that, as we shall note in Section 18.6, the F test for equality of factor level means is robust against nonconstancy of the error variance when the factor level sample sizes are approximately equal, as long as the differences in the variances are not extremely large. Hence, the purpose of using the Hartley or Brown-Forsythe tests in ANOVA is often to determine whether extremely large differences in the error variances exist. For this purpose, a low $\alpha$ level may be employed since only large differences in the error variances need to be detected.


Hartley Test


We shall describe the Hartley test in general terms. The test considers r normal populations; the variance of the ith population is denoted by $\sigma_i^2$. Independent samples of equal size are selected from the r populations; the sample variance for the ith population is denoted by $s_i^2$ and the common number of degrees of freedom associated with each sample variance is denoted by df. The alternatives to be tested are:


$$\begin{aligned} H_0&: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_r^2 \\ H_a&: \text{not all } \sigma_i^2 \text{ are equal} \end{aligned} \tag{18.7}$$


The Hartley test statistic, denoted by $H^*$, is based solely on the largest sample variance, denoted by $\max(s_i^2)$, and the smallest sample variance, denoted by $\min(s_i^2)$:

$$H^* = \frac{\max(s_i^2)}{\min(s_i^2)} \tag{18.8}$$


Values of $H^*$ near 1 support $H_0$, and large values of $H^*$ support $H_a$. The distribution of $H^*$ when $H_0$ holds has been tabulated, and selected percentiles are presented in Table B.10. The distribution of $H^*$ depends on the number of populations r and the common number of degrees of freedom df.

The appropriate decision rule for controlling the risk of making a Type I error at $\alpha$ is:

$$\begin{aligned} &\text{If } H^* \le H(1 - \alpha;\ r, df), \text{ conclude } H_0 \\ &\text{If } H^* > H(1 - \alpha;\ r, df), \text{ conclude } H_a \end{aligned} \tag{18.9}$$

where $H(1 - \alpha;\ r, df)$ is the $(1 - \alpha)100$ percentile of the distribution of $H^*$ when $H_0$ holds, for r populations and df degrees of freedom for each sample variance.

When the Hartley test is used for the single-factor ANOVA model (16.2) with equal sample sizes, $n_i \equiv n$, we have $df = n - 1$. The r normal populations are the normal probability distributions of the Y observations for the r factor levels. The sample variance $s_i^2$ is the variance of the $n_i$ observations $Y_{ij}$ for the ith factor level or equivalently the variance of the $n_i$ residuals $e_{ij}$, defined in (16.39); for $n_i \equiv n$, $s_i^2$ becomes:

$$s_i^2 = \frac{\sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2}{n - 1} = \frac{\sum_j e_{ij}^2}{n - 1} \qquad \text{when } n_i \equiv n \tag{18.10}$$


Example


The ABT Electronics Corporation performed an experiment to evaluate five types of flux for use in soldering printed circuit boards. A major concern of the firm's reliability engineers was the strength of the soldered joints. To test the five types of flux, 40 printed circuit boards were selected at random. Each of the five flux types was randomly assigned to 8 of the 40 circuit boards and an electronic switch was soldered to each board using the designated flux type. Following a four-week storage period, the 40 circuit boards were tested by a hydraulically operated testing machine which exerted increasing pulling force on each switch. The force (in pounds) required to break a joint, termed the pull strength, is the response of interest. This design is a completely randomized design, with eight replicates of the five treatments corresponding to the five levels of the categorical factor, flux type.

A portion of the observed pull strengths in the experiment is shown in Table 18.2, along with the estimated treatment means $\bar{Y}_{i\cdot}$, medians $\tilde{Y}_i$, and sample variances $s_i^2$. A dot plot of these data is presented in Figure 18.6. Notice that the variability in pull strengths for the third solder type appears to be larger than for the others.

Since approximate normality is required by the Hartley test, normal probability plots of the residuals were first constructed for each treatment (not shown). The approximate normality of the residuals for each treatment was supported by the plots and by the correlation test (the correlations in the five plots are .982, .981, .977, .958, and .939; the critical value for $\alpha = .05$ from Table B.6 is .906).

The alternatives for the Hartley test here are:

$$\begin{aligned} H_0&: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_5^2 \\ H_a&: \text{not all } \sigma_i^2 \text{ are equal} \end{aligned}$$

TABLE 18.2 Solder Joint Pull Strengths: ABT Electronics Example.

 Joint                        Flux Type (i)
   j                 i=1       i=2       i=3       i=4       i=5
   1               14.87     18.43     16.95      8.59     11.55
   2               16.81     18.76     12.28     10.90     13.36
  ...                ...       ...       ...       ...       ...
   7               17.40     17.16     19.35      9.41     12.05
   8               14.62     16.40     15.52     10.04     11.95

 Mean $\bar{Y}_{i\cdot}$       15.420    18.528    15.004     9.741    12.340
 Median $\tilde{Y}_i$     15.170    18.595    15.255    10.010    12.105
 Variance $s_i^2$      1.531     1.570     6.183      .667      .592


FIGURE 18.6 Dot Plots of Pull Strengths: ABT Electronics Example. One dot plot per flux type (Type 1 through Type 5); horizontal axis: pull strength, 0 to 30.

For level of significance $\alpha = .05$, $r = 5$, and $df = 8 - 1 = 7$, we require $H(.95;\ 5, 7) = 9.70$. Hence the appropriate decision rule is:

If $H^* \le 9.70$, conclude $H_0$
If $H^* > 9.70$, conclude $H_a$

From Table 18.2 we see that $\max(s_i^2) = 6.183$ and $\min(s_i^2) = .592$. Hence the test statistic is:

$$H^* = \frac{6.183}{.592} = 10.44$$

Since $H^* = 10.44 > 9.70$, we conclude $H_a$, that the five treatment variances are not equal.
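
The arithmetic of the Hartley test for this example can be reproduced with a few lines. The sketch below is illustrative only: the function name `hartley_statistic` is hypothetical, the sample variances are those reported in Table 18.2, and the critical value 9.70 is the tabulated percentile quoted above rather than something the code computes.

```python
def hartley_statistic(variances):
    """Ratio of the largest to the smallest sample variance, as in (18.8)."""
    variances = list(variances)
    return max(variances) / min(variances)

# Sample variances s_i^2 for the five flux types, from Table 18.2.
sample_variances = {1: 1.531, 2: 1.570, 3: 6.183, 4: 0.667, 5: 0.592}

# Tabulated percentile H(.95; 5, 7) quoted in the text (not computed here).
critical_value = 9.70

h_star = hartley_statistic(sample_variances.values())
conclusion = "Ha" if h_star > critical_value else "H0"
print(round(h_star, 2), conclusion)
```

Because the statistic uses only the two extreme variances, it is sensitive to a single unusually large or small $s_i^2$; here flux type 3 drives the result.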


Comments 

1. The Hartley test strictly requires equal sample sizes. If the sample sizes are unequal but do not differ greatly, the Hartley test may still be used as an approximate test. For this purpose, the average number of degrees of freedom would be used for entering Table B.10.

2. The Hartley test is quite sensitive to departures from the assumption of normal populations and 
should not be used when substantial departures from normality exist. 


Brown-Forsythe Test 


The Brown-Forsythe test for constancy of the error variance in regression was discussed in Chapter 3. The test was originally developed for use in ANOVA applications and is more general than its use for regression described in Chapter 3. The Brown-Forsythe test, just like the Hartley test, can be used to study the equality of r population variances. Unlike the Hartley test, the Brown-Forsythe test is robust against departures from normality, which often occur together with unequal variances. Also, the Brown-Forsythe test does not require equal sample sizes.

To test the alternatives in (18.7) using the Brown-Forsythe test, we first compute the absolute deviations of the $Y_{ij}$ observations about their respective factor level medians $\tilde{Y}_i$:

$$d_{ij} = |Y_{ij} - \tilde{Y}_i| \tag{18.11}$$


The Brown-Forsythe test then determines whether or not the expected values of the absolute deviations for the r treatments are equal. If the r error variances $\sigma_i^2$ are equal, so will the expected values of the absolute deviations be equal. Unequal error variances imply differing expected values of the absolute deviations. The Brown-Forsythe test statistic is simply the ordinary $F^*$ statistic in (16.55) for testing differences in the treatment means, but now based on the absolute deviations $d_{ij}$ in (18.11):

$$F_{BF}^* = \frac{MSTR}{MSE} \tag{18.12}$$

where:

$$MSTR = \frac{\sum_i n_i (\bar{d}_{i\cdot} - \bar{d}_{\cdot\cdot})^2}{r - 1} \tag{18.12a}$$

$$MSE = \frac{\sum_i \sum_j (d_{ij} - \bar{d}_{i\cdot})^2}{n_T - r} \tag{18.12b}$$

$$\bar{d}_{i\cdot} = \frac{\sum_j d_{ij}}{n_i} \tag{18.12c}$$

$$\bar{d}_{\cdot\cdot} = \frac{\sum_i \sum_j d_{ij}}{n_T} \tag{18.12d}$$

If the error terms have constant variance and the factor level sample sizes are not extremely small, $F_{BF}^*$ follows approximately an F distribution with $r - 1$ and $n_T - r$ degrees of freedom. Large $F_{BF}^*$ values indicate that the error terms do not have constant variance.


Example


Table 18.2 for the ABT Electronics Corporation example provides the sample medians $\tilde{Y}_i$ for the five treatments. The absolute deviations $d_{ij}$ in (18.11) are shown in Table 18.3. We illustrate their calculation for $d_{11}$:

$$d_{11} = |Y_{11} - \tilde{Y}_1| = |14.87 - 15.170| = .300$$

The $F_{BF}^*$ test statistic (18.12) based on the absolute deviations is obtained in the usual manner; it is $F_{BF}^* = 2.94$. For $\alpha = .05$, we require $F(.95;\ 4, 35) = 2.64$. Since $F_{BF}^* = 2.94 > 2.64$, we conclude $H_a$, that the error terms do not have constant variance. The P-value for this test is .034.

TABLE 18.3 Absolute Deviations of Responses from Treatment Medians: ABT Electronics Example.

 Joint                 Flux Type (i)
   j       i=1      i=2      i=3      i=4      i=5
   1      .300     .165    1.695    1.420     .555
   2     1.640     .165    2.975     .890    1.255
  ...      ...      ...      ...      ...      ...
   7     2.230    1.435    4.095     .600     .055
   8      .550    2.195     .265     .030     .155
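
The Brown-Forsythe computation in (18.11) and (18.12) can be sketched in a few lines. This is a minimal illustration, not the text's own program: the function name `brown_forsythe` and the toy data are hypothetical, since only a portion of the ABT Electronics data is reproduced in Table 18.2, so the value 2.94 cannot be recomputed from the rows shown.

```python
from statistics import mean, median

def brown_forsythe(groups):
    """Brown-Forsythe statistic (18.12) for a list of response lists."""
    r = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations about each factor level median, as in (18.11).
    deviations = [[abs(y - median(g)) for y in g] for g in groups]
    d_bar_i = [mean(d) for d in deviations]
    d_bar = sum(sum(d) for d in deviations) / n_total
    mstr = sum(len(d) * (m - d_bar) ** 2
               for d, m in zip(deviations, d_bar_i)) / (r - 1)
    mse = sum((x - m) ** 2
              for d, m in zip(deviations, d_bar_i) for x in d) / (n_total - r)
    # Note: mse is zero if every deviation equals its level mean.
    return mstr / mse

# Hypothetical toy data: the second level is twice as spread out as the first.
f_spread = brown_forsythe([[1.0, 2.0, 3.0], [0.0, 2.0, 4.0]])

# Hypothetical toy data: same spread, different location.
f_shift = brown_forsythe([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
print(f_spread, f_shift)
```

The second call illustrates that the statistic responds to differences in spread and not to differences in the factor level means: shifting one level leaves the absolute deviations unchanged, so the statistic is zero.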




18.3 Overview of Remedial Measures 


In the remainder of this chapter, we consider three remedial measures for two common departures from ANOVA model (16.2): nonconstancy of the error variance and nonnormality of the distribution of the error terms.


1. If the error terms are normally distributed but the variance of the error terms is not constant, a standard remedial measure is to use weighted least squares. We have already considered weighted least squares for nonconstancy of the error variance in regression models. These weighted least squares procedures for regression carry over directly to analysis of variance models.

2. Often, nonconstancy of the error variance is accompanied by nonnormality of the error term distribution. A standard remedial measure here is to transform the response variable Y. We shall present two approaches to finding an appropriate transformation to make the error distribution more nearly normal and to help stabilize the variance of the error terms: some simple guides and the Box-Cox procedure. The latter was considered in Chapter 3 for regression models and is directly applicable to analysis of variance models.

3. When there are major departures from ANOVA model (16.2) and transformations are 
not successful in stabilizing the error variance and bringing the error distribution close to 
normal, a nonparametric test for the equality of the factor level means may be used instead 
of the standard F test. We shall consider a nonparametric test that is based on the ranks of 
the Y observations. 


We begin our discussion of remedial measures with weighted least squares. 


18.4 Weighted Least Squares 


When the errors $\varepsilon_{ij}$ are normally distributed but their variances are not the same for the different factor levels, cell means model (16.2) becomes:

$$Y_{ij} = \mu_i + \varepsilon_{ij} \tag{18.13}$$

where $\varepsilon_{ij}$ are independent $N(0, \sigma_i^2)$.

Weighted least squares is a standard remedial measure here, just as for the comparable situation in regression. In fact, we shall use the regression approach to the analysis of variance for implementing weighted least squares. All of the earlier discussion on weighted least squares for regression is applicable to the analysis of variance.

Since the factor level variances $\sigma_i^2$ are usually unknown, they must be estimated. This is ordinarily done by means of the sample variances $s_i^2$ in (16.39), in which case the weight $w_{ij}$ for the jth case of the ith factor level is:

$$w_{ij} = \frac{1}{s_i^2} \tag{18.14}$$


The test for the equality of the factor level means in (16.54) is now conducted by the general linear test approach described in Chapter 2. The full model is fitted, using the weights in (18.14), and the error sum of squares is obtained, now denoted by $SSE_w(F)$. Next the reduced model under $H_0$ is fitted and the error sum of squares $SSE_w(R)$ is obtained. Test statistic (2.70) is employed, as usual. We shall see that $df_F = n_T - r$ and $df_R = n_T - 1$. Hence, the general linear test statistic here is:

$$F_w^* = \frac{SSE_w(R) - SSE_w(F)}{r - 1} \div \frac{SSE_w(F)}{n_T - r} \tag{18.15}$$

Since the weights are based on the estimated variances $s_i^2$, the distribution of $F_w^*$ under $H_0$ is only approximately an F distribution with $r - 1$ and $n_T - r$ degrees of freedom. When the factor level sample sizes are reasonably large, the approximation generally is satisfactory. As explained in Chapter 11, bootstrapping can be employed to examine the effect of using estimated weights.


Example


Recall in the ABT Electronics example that the normality assumption appears to be reasonably well supported by the data, but the error variance is not constant. Weighted least squares will now be used to test the alternatives:

$$\begin{aligned} H_0&: \mu_1 = \mu_2 = \cdots = \mu_5 \\ H_a&: \text{not all } \mu_i \text{ are equal} \end{aligned} \tag{18.16}$$

The weights will be based on the sample variances in Table 18.2:

$$w_{1j} = \frac{1}{1.531} = .653 \qquad w_{2j} = \frac{1}{1.570} = .637 \qquad w_{3j} = \frac{1}{6.183} = .162$$

$$w_{4j} = \frac{1}{.667} = 1.499 \qquad w_{5j} = \frac{1}{.592} = 1.689$$

We shall use regression model (16.85) to represent cell means model (18.13):

$$Y_{ij} = \mu_1 X_{ij1} + \mu_2 X_{ij2} + \cdots + \mu_5 X_{ij5} + \varepsilon_{ij} \qquad \text{Full model} \tag{18.17}$$

where:

$$X_{ij1} = \begin{cases} 1 & \text{if case from factor level 1} \\ 0 & \text{otherwise} \end{cases} \qquad \cdots \qquad X_{ij5} = \begin{cases} 1 & \text{if case from factor level 5} \\ 0 & \text{otherwise} \end{cases}$$

Note that the factor level means $\mu_i$ play the role of regression coefficients and that the regression model has no intercept.

Table 18.4 repeats from Table 18.2 a portion of the experimental data in column 1 and contains the coding of the indicator variables in columns 2-6 and the weights in column 7. Note, for instance, that the coding for cases from the first treatment is $X_1 = 1$, $X_2 = 0$, $X_3 = 0$, $X_4 = 0$, and $X_5 = 0$, and similarly for cases from the other treatments.

Figure 18.7a contains the MINITAB output when Y in column 1 of Table 18.4 is regressed on $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$ in columns 2-6, using the weights in column 7 and specifying no intercept. We see that $SSE_w(F) = 35.0$.

The reduced model under $H_0$ is given by (16.86):

$$Y_{ij} = \mu_c + \varepsilon_{ij} \qquad \text{Reduced model} \tag{18.18}$$


TABLE 18.4 Data for Weighted Least Squares Regression: ABT Electronics Example.

                                 Full Model                          Reduced Model
            (1)     (2)     (3)     (4)     (5)     (6)      (7)          (8)
                                                           Weights
  i   j    Y_ij    X_ij1   X_ij2   X_ij3   X_ij4   X_ij5     w_ij         X_ij
  1   1    14.87     1       0       0       0       0       .653          1
  1   2    16.81     1       0       0       0       0       .653          1
 ...  ...    ...    ...     ...     ...     ...     ...       ...         ...
  1   7    17.40     1       0       0       0       0       .653          1
  1   8    14.62     1       0       0       0       0       .653          1
  2   1    18.43     0       1       0       0       0       .637          1
  2   2    18.76     0       1       0       0       0       .637          1
 ...  ...    ...    ...     ...     ...     ...     ...       ...         ...
  5   7    12.05     0       0       0       0       1      1.689          1
  5   8    11.95     0       0       0       0       1      1.689          1

FIGURE 18.7 MINITAB Weighted Regression Output for Full and Reduced Models: ABT Electronics Example.

(a) Full Model

The regression equation is
Y = 15.4 X1 + 18.5 X2 + 15.0 X3 + 9.74 X4 + 12.3 X5

Predictor        Coef      Stdev    t-ratio
Noconstant
X1            15.4200     0.4375      35.24
X2            18.5275     0.4430      41.82
X3            15.0037     0.8785      17.08
X4             9.7413     0.2888      33.73
X5            12.3400     0.2721      45.36

Analysis of Variance

SOURCE         DF        SS         MS          F        p
Regression      5    6478.7     1295.7    1295.56    0.000
Error          35      35.0        1.0
Total          40    6513.7

(b) Reduced Model

The regression equation is
Y = 12.9 X

Predictor        Coef      Stdev
Noconstant
X             12.8764     0.4981

Analysis of Variance

SOURCE         DF        SS         MS          F        p
Regression      1    6154.5     6154.5               0.000
Error          39     359.2
Total          40    6513.7


where $\mu_c$ is the common mean response under $H_0$. The corresponding regression model is:

$$Y_{ij} = \mu_c X_{ij} + \varepsilon_{ij} \tag{18.19}$$

where $X_{ij} \equiv 1$. Note that regression model (18.19) has no intercept.

The new X variable is shown in Table 18.4, column 8. Regressing Y in column 1 on X in column 8, using the weights in column 7 and specifying no intercept, leads to the MINITAB output in Figure 18.7b. We see that $SSE_w(R) = 359.2$. We have $n_T - 1 = 40 - 1 = 39$ and $n_T - r = 40 - 5 = 35$. Hence, test statistic (18.15) is:

$$F_w^* = \frac{359.2 - 35.0}{39 - 35} \div \frac{35.0}{35} = 81.05$$

For $\alpha = .01$, we require $F(.99;\ 4, 35) = 3.908$. Since $F_w^* = 81.05 > 3.908$, the approximate F test leads to conclusion $H_a$, that the factor level means differ. The approximate P-value of the test is 0+.
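
The weighted general linear test can be sketched without a regression package, because for the cell means model the weighted fits have closed forms: the full-model fitted values are the factor level means, and the reduced-model estimate of $\mu_c$ is the weighted mean of all observations. The code below is a minimal illustration under those assumptions; the function names `general_linear_f` and `weighted_anova` and the toy data are hypothetical.

```python
from statistics import mean, variance

def general_linear_f(sse_r, sse_f, df_r, df_f):
    """General linear test statistic from reduced and full error sums of squares."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

def weighted_anova(groups):
    """Weighted least squares test of equal means with weights 1 / s_i^2."""
    r = len(groups)
    n_total = sum(len(g) for g in groups)
    weights = [1.0 / variance(g) for g in groups]   # w_ij = 1 / s_i^2, (18.14)
    means = [mean(g) for g in groups]
    # Full model: each case is fitted by its own factor level mean.
    sse_f = sum(w * (y - m) ** 2
                for g, w, m in zip(groups, weights, means) for y in g)
    # Reduced model: common mean estimated by the weighted mean of all cases.
    mu_c = (sum(w * y for g, w in zip(groups, weights) for y in g)
            / sum(w * len(g) for g, w in zip(groups, weights)))
    sse_r = sum(w * (y - mu_c) ** 2
                for g, w in zip(groups, weights) for y in g)
    f_w = general_linear_f(sse_r, sse_f, n_total - 1, n_total - r)
    return sse_f, sse_r, f_w

# Arithmetic of the ABT Electronics example, using the reported sums of squares.
f_example = general_linear_f(359.2, 35.0, 39, 35)

# Hypothetical toy data, to show the computation from raw responses.
sse_f, sse_r, f_toy = weighted_anova([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]])
print(round(f_example, 2), sse_f, sse_r, f_toy)
```

In the toy data the full-model weighted error sum of squares equals $n_T - r = 4$, which is a consequence of using the reciprocals of the sample variances as weights.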


Comments


1. The weighted least squares estimates of the factor level means $\mu_i$ are always the estimated factor level means $\bar{Y}_{i\cdot}$, as may be seen by comparing the estimated regression coefficients in Figure 18.7a with the estimated factor level means in Table 18.2. Hence, for ANOVA model (18.13), the weighted and ordinary least squares estimates of the factor level means $\mu_i$ are the same.

2. When the reciprocals of the sample variances $s_i^2$ are used as weights, the error sum of squares for the fit of full model (18.17) will always be $SSE_w(F) = n_T - r$. Note that in our example $SSE_w(F) = 35.0$ and $n_T - r = 40 - 5 = 35$.

3. Some analysis of variance computer packages have an option for weighted least squares, with 
the user specifying the weights. 


18.5 Transformations of Response Variable 


When both the model assumptions of constancy of the error variance and normality of the 
error distributions are violated, a transformation of the response variable is often useful. 
We describe now two approaches to finding a useful transformation—some simple guides 
and the Box-Cox procedure. 


Simple Guides to Finding a Transformation 
The following are four simple guides to finding a useful transformation. The guides were 
developed from theoretical considerations to stabilize the error variances, but these trans- 
formations often also are helpful in bringing the distribution of the error terms more closely 
to a normal distribution. 


Variance Proportional to $\mu_i$. When the variance of the error terms for each factor level (denoted by $\sigma_i^2$) is proportional to the factor level mean $\mu_i$, a square root transformation is helpful:

$$\text{If } \sigma_i^2 \text{ proportional to } \mu_i: \qquad Y' = \sqrt{Y} \quad \text{or} \quad Y' = \sqrt{Y} + \sqrt{Y + 1} \tag{18.20}$$


This type of situation is often found when the observed variable Y is a count, such as the number of attempts by a subject before the correct solution is found.


Standard Deviation Proportional to $\mu_i$. When the standard deviation of the error terms for each factor level is proportional to the factor level mean, a helpful transformation is the logarithmic transformation:

$$\text{If } \sigma_i \text{ proportional to } \mu_i: \qquad Y' = \log Y \tag{18.21}$$


Standard Deviation Proportional to $\mu_i^2$. When the error term standard deviation is proportional to the square of the factor level mean for the different factor levels, an appropriate transformation is the reciprocal transformation:

$$\text{If } \sigma_i \text{ proportional to } \mu_i^2: \qquad Y' = \frac{1}{Y} \tag{18.22}$$


Response Is a Proportion. At times, the observed variable $Y_{ij}$ is a proportion $p_{ij}$. For instance, the treatments may be different training procedures, the unit of observation is a company training class, and the observed variable $Y_{ij}$ is the proportion of employees in the jth class for the ith training procedure who benefited substantially by the training. Note that $n_i$ here refers to the number of classes receiving the ith training procedure, not to the number of students.

It is well known that for the binomial distribution the variance of the sample proportion depends on the true proportion. When the number of cases on which each sample proportion is based is the same, this variance is:

$$\sigma^2\{p_{ij}\} = \frac{\pi_i (1 - \pi_i)}{m} \tag{18.23}$$

Here $\pi_i$ denotes the population proportion for the ith treatment and m is the common number of cases on which each sample proportion is based. Since $\sigma^2\{p_{ij}\}$ depends on the treatment proportion $\pi_i$, the variances of the error terms will not be stable if the treatment proportions $\pi_i$ differ. An appropriate transformation for this case is the arcsine transformation:

$$\text{If response is a proportion:} \qquad Y' = 2 \arcsin \sqrt{Y} \tag{18.24}$$


When the proportions $p_{ij}$ are based on different numbers of cases (for instance, in our earlier illustration there may be different numbers of employees in each training class), transformation (18.24) should be employed together with a weighted least squares analysis as described in Section 18.4. The use of the arcsine transformation when the response is a proportion can be an effective, yet simple, remedial measure. A more rigorous approach would involve the use of logistic regression as discussed in Chapter 14.
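
The variance-stabilizing effect of the arcsine transformation can be checked by direct enumeration of the binomial distribution. The sketch below is an illustration only: the function names `arcsine_transform` and `binomial_variance`, the class size m = 50, and the proportions .2 and .5 are all hypothetical choices, not values from the text.

```python
import math

def arcsine_transform(p):
    """Transformation (18.24): Y' = 2 arcsin(sqrt(Y)) for a proportion Y."""
    return 2.0 * math.asin(math.sqrt(p))

def binomial_variance(pi, m, transform=lambda p: p):
    """Exact variance of transform(X / m) when X is Binomial(m, pi)."""
    probs = [math.comb(m, x) * pi ** x * (1 - pi) ** (m - x)
             for x in range(m + 1)]
    values = [transform(x / m) for x in range(m + 1)]
    expected = sum(pr * v for pr, v in zip(probs, values))
    return sum(pr * (v - expected) ** 2 for pr, v in zip(probs, values))

m = 50  # hypothetical common number of cases per sample proportion

# Ratio of variances at pi = .5 versus pi = .2, before and after transforming.
raw_ratio = binomial_variance(0.5, m) / binomial_variance(0.2, m)
transformed_ratio = (binomial_variance(0.5, m, arcsine_transform)
                     / binomial_variance(0.2, m, arcsine_transform))
print(raw_ratio, transformed_ratio)
```

The untransformed variances follow (18.23), so their ratio is $.25/.16 = 1.5625$; after the transformation the ratio is much closer to 1, which is the sense in which the error variance is stabilized.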


Use of Simple Guides. To examine whether one of the simple transformation guides is applicable, the statistics $s_i^2/\bar{Y}_{i\cdot}$, $s_i/\bar{Y}_{i\cdot}$, and $s_i/\bar{Y}_{i\cdot}^2$ should be calculated for each factor level, where $s_i^2$ is the sample variance of the Y observations for the ith factor level, defined in (16.39). Approximate constancy of one of the three statistics over all factor levels would suggest the corresponding transformation as useful for stabilizing the error variance and making the error distributions more nearly normal.


Servo-Data, Inc., operates mainframe computers at three different locations. The computers
are identical as to make and model, but are subject to different degrees of voltage fluctuation
in the power lines serving the respective installations. Table 18.5 contains the lengths of
time between computer failures $Y_{ij}$ for the three locations, for five failure intervals each.
The table also contains the ranks $R_{ij}$ (from 1 to 15) for $Y_{ij}$, which we shall use in Section
18.7 for nonparametric analysis. Even though the sample sizes are small, the data suggest
highly skewed distributions having nonconstant error variance. This is an observational
study because no randomization of treatments to experimental units occurred.


TABLE 18.5 Time between Computer Failures at Three Locations, Servo-Data Example.

Failure          Location 1
Interval j    $Y_{1j}$    $R_{1j}$
1               4.41        2
2             100.65       13
3              14.45        6
4              47.13        9
5              85.21       12

i    $\bar{Y}_{i\cdot}$    $s_i^2$    $\bar{R}_{i\cdot}$    $s_i^2$ (ranks)
1      50.4     1,789      8.4     20.3
2      22.1     1,103      4.8     14.2
3     121.2    16,167     10.8     12.7

$\bar{Y}_{\cdot\cdot} = 64.6$    $\bar{R}_{\cdot\cdot} = 8.00$

To study whether one of the simple guides is helpful here, we have calculated the 
following statistics based on the results in Table 18.5. 


i    $s_i^2/\bar{Y}_{i\cdot}$    $s_i/\bar{Y}_{i\cdot}$    $s_i/\bar{Y}_{i\cdot}^2$
1      35.5      .84     .017
2      49.9     1.50     .068
3     133.4     1.05     .009


The relation $s_i/\bar{Y}_{i\cdot}$ is the most stable, hence the logarithmic transformation (18.21) may
be helpful here. We shall continue this example after discussing the use of the Box-Cox 
procedure for finding an appropriate transformation in the analysis of variance. 
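The three guide statistics can be recomputed directly from the factor level means and variances reported in Table 18.5. The sketch below does so; the helper names and the use of the ratio of the largest to the smallest value as a rough gauge of constancy are our own assumptions, introduced only for illustration.

```python
import math

# Factor level means and sample variances reported in Table 18.5
summary = {1: (50.4, 1789.0), 2: (22.1, 1103.0), 3: (121.2, 16167.0)}

def simple_guides(mean, variance):
    """Return (s^2/Ybar, s/Ybar, s/Ybar^2) for one factor level."""
    s = math.sqrt(variance)
    return variance / mean, s / mean, s / mean ** 2

def relative_spread(values):
    """Largest value over smallest value: an assumed, crude gauge of constancy."""
    return max(values) / min(values)

guides = {i: simple_guides(mean, var) for i, (mean, var) in summary.items()}
for i, (g1, g2, g3) in guides.items():
    print(i, round(g1, 1), round(g2, 2), round(g3, 3))

names = ["s^2/Ybar", "s/Ybar", "s/Ybar^2"]
spreads = [relative_spread([guides[i][k] for i in guides]) for k in range(3)]
most_stable = names[spreads.index(min(spreads))]
print("most stable statistic:", most_stable)
```

With these inputs the statistic s/Ybar varies least across the three locations, which points to the logarithmic transformation.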


Box-Cox Procedure 


The Box-Cox transformation procedure was described in Chapter 3 for regression. As noted
there, the Box-Cox procedure identifies a power transformation of the type $Y^\lambda$ to correct
for both lack of normality and nonconstancy of the error variance. The procedure is entirely
applicable to the analysis of variance. As for regression, the numerical search procedure for
ANOVA models considers different values of the parameter $\lambda$. For each value of $\lambda$, the
observations are transformed according to (3.36) and ANOVA model (16.2) is fitted and the
error sum of squares SSE is obtained. The value of $\lambda$ that minimizes SSE is the maximum
likelihood estimate of $\lambda$. As we saw in regression, SSE as a function of $\lambda$ is often flat in
the neighborhood of the maximum likelihood estimate $\hat{\lambda}$, so that a meaningful value of $\lambda$
in the neighborhood may be chosen for the transformation in preference to the maximum
likelihood value.


Example


The Box-Cox procedure was applied in the Servo-Data example of Table 18.5 by using 21
equally spaced values of $\lambda$ between -1 and 1. For each value of $\lambda$, the Y observations
were transformed according to (3.36) and SSE for ANOVA model (16.2) was calculated.
A portion of the results is shown in Table 18.6. The smallest SSE is obtained with $\lambda = .10$.
However, note that SSE does not change much between -.10 and .20. Hence, the parameter
$\lambda = 0$ may be preferred because it leads to the meaningful logarithmic transformation. This
is also the transformation selected according to the simple guides. Normal probability plots
of the residuals for the original and transformed data ($Y' = \log_e Y$) are shown in Figure 18.8.
The normality assumption appears to be much more reasonable for the transformed data
(r = .991). Also, the variances of the transformed data are much more stable now
($s_1^2 = 1.742$, $s_2^2 = 1.974$, $s_3^2 = .817$) as compared to the variances for the original data in
Table 18.5.


TABLE 18.6 Calculations for Box-Cox Procedure, Servo-Data Example.

                SSE                             SSE
$\lambda$    (in thousands)    $\lambda$    (in thousands)
-1.0           203.7             .10           15.3
-.80            95.1             .20           15.6
-.60            48.7             .40           18.7
-.40            28.3             .60           26.4
-.20            19.2             .80           42.6
-.10            17.0            1.0            76.2
 .00            15.7


FIGURE 18.8 Normal Probability Plots for Original and Transformed Data, Servo-Data Example.
(a) Original Data. (b) Transformed Data ($Y' = \log_e Y$). Horizontal axis in both panels: Expected Value.
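The grid search just described can be sketched as follows. Two points are assumptions and should be checked against the original sources. First, only the location 1 observations are legible in Table 18.5 as reproduced here; the values used for locations 2 and 3 are supplied from the published data set and are consistent with the means and variances in the table. Second, the standardized transformation is written in the form commonly given for (3.36): $K_1(Y^\lambda - 1)$ for $\lambda \ne 0$ and $K_2 \log_e Y$ for $\lambda = 0$, where $K_2$ is the geometric mean of the observations and $K_1 = 1/(\lambda K_2^{\lambda - 1})$.

```python
import math

# Times between computer failures by location.
# ASSUMPTION: the location 2 and 3 values are not legible in Table 18.5 here;
# they are taken from the published data set for this example.
data = {
    1: [4.41, 100.65, 14.45, 47.13, 85.21],
    2: [8.24, 81.16, 7.35, 12.29, 1.61],
    3: [106.19, 33.83, 78.88, 342.81, 44.33],
}

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def box_cox(y, lam, k2):
    """Standardized Box-Cox transform (assumed form of (3.36))."""
    if abs(lam) < 1e-12:
        return k2 * math.log(y)
    k1 = 1.0 / (lam * k2 ** (lam - 1.0))
    return k1 * (y ** lam - 1.0)

def sse_for_lambda(groups, lam):
    """SSE of the single-factor ANOVA model fitted to the transformed data."""
    all_y = [y for ys in groups.values() for y in ys]
    k2 = geometric_mean(all_y)
    sse = 0.0
    for ys in groups.values():
        w = [box_cox(y, lam, k2) for y in ys]
        mean_w = sum(w) / len(w)
        sse += sum((v - mean_w) ** 2 for v in w)
    return sse

# 21 equally spaced values of lambda between -1 and 1
grid = [round(-1.0 + 0.1 * k, 1) for k in range(21)]
sse_table = {lam: sse_for_lambda(data, lam) for lam in grid}
best_lambda = min(sse_table, key=sse_table.get)

for lam in grid:
    print(f"{lam:5.1f}  {sse_table[lam] / 1000:8.1f}")
print("lambda with smallest SSE on the grid:", best_lambda)
```

Because the standardized transformation puts SSE on a comparable scale for every $\lambda$, the values in the printed column can be compared directly, as in Table 18.6.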


A single factor ANOVA was performed on $Y'$, the logarithm of the Y observations. The
resulting F test for equality of treatment means was:


$F^* = \frac{MSTR}{MSE} = \frac{5.7264}{1.5112} = 3.789$


For $\alpha = .10$, we require F(.90; 2, 12) = 2.81. Since $F^* = 3.789 > 2.81$, we conclude $H_a$,
that the three means are not equal. The P-value of the test is .053. The transformed means
for the three groups are 3.413, 2.297, and 4.437, respectively. The Bonferroni pairwise
comparison procedure was then conducted at the .10 level, with $s^2\{\hat{D}\} = .6045$, $s\{\hat{D}\} = .7775$,
B = t(.9833; 12) = 2.402, and $Bs\{\hat{D}\} = 1.868$. The resulting 90 percent Bonferroni
pairwise confidence intervals are:


$-2.984 \le \mu_2 - \mu_1 \le .752$
$-.844 \le \mu_3 - \mu_1 \le 2.892$
$.272 \le \mu_3 - \mu_2 \le 4.008$


Therefore, we conclude that location 3 has a longer mean time between computer failures than
location 2.
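The F test and the Bonferroni intervals for the log-transformed data can be retraced with a short sketch. As an assumption, the observations for locations 2 and 3 are supplied from the published data set, since only location 1 is legible in Table 18.5 as reproduced here. The multiple B = 2.402 is taken from the text rather than computed, and the variable names are ours.

```python
import math
from itertools import combinations

# ASSUMPTION: location 2 and 3 values come from the published data set;
# only location 1 is legible in Table 18.5 here.
data = {
    1: [4.41, 100.65, 14.45, 47.13, 85.21],
    2: [8.24, 81.16, 7.35, 12.29, 1.61],
    3: [106.19, 33.83, 78.88, 342.81, 44.33],
}

# Transformed observations Y' = log_e Y
logs = {i: [math.log(y) for y in ys] for i, ys in data.items()}
n_total = sum(len(v) for v in logs.values())
r = len(logs)

means = {i: sum(v) / len(v) for i, v in logs.items()}
grand_mean = sum(sum(v) for v in logs.values()) / n_total

sstr = sum(len(v) * (means[i] - grand_mean) ** 2 for i, v in logs.items())
sse = sum((w - means[i]) ** 2 for i, v in logs.items() for w in v)
mstr = sstr / (r - 1)
mse = sse / (n_total - r)
f_star = mstr / mse

B = 2.402  # t(.9833; 12), the Bonferroni multiple quoted in the text
intervals = {}
for i, k in combinations(sorted(logs), 2):
    diff = means[k] - means[i]
    half_width = B * math.sqrt(mse * (1 / len(logs[i]) + 1 / len(logs[k])))
    intervals[(k, i)] = (diff - half_width, diff + half_width)

print("F* =", round(f_star, 3))
for (k, i), (lower, upper) in intervals.items():
    print(f"mu{k} - mu{i}: {lower:.3f} to {upper:.3f}")
```

Only the interval for $\mu_3 - \mu_2$ excludes zero, which is the basis for the conclusion about locations 2 and 3.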


Comments 


1. It is wise policy, as mentioned for regression, to check the residuals after a transformation has 
been applied to make sure that the transformation has been effective in both stabilizing the variances 
and making the distribution of the error terms reasonably normal. 

2. When a transformation of the observations is required, one can work completely with the 
transformed data for testing the equality of factor level means. On the other hand, it is often desirable 
when making estimates of factor level effects to change a confidence interval based on the transformed 
variable back to an interval in the original variable for easier understanding of the significance of the 
results. 

3. The variance stabilizing transformations (18.20), (18.21), (18.22), and (18.24) are obtained by 
using a Taylor series expansion for the variance of Y. An explanation of the approach may be found 
in Reference 18.3.


18.6 Effects of Departures from Model 


In preceding sections, we considered how residual analysis and other diagnostic techniques 
can be helpful in assessing the appropriateness of the ANOVA model for the data at hand. 
We also discussed the use of transformations for both stabilizing the variance and obtaining 
an error distribution more nearly normal. The question now arises: what are the effects of 
any remaining departures from the model on the inferences made? A thorough review of 
the many studies investigating these effects has been made by Scheffé (Ref. 18.4). Here, 
we summarize the findings. 


Nonnormality 


For the fixed ANOVA model I, lack of normality is not an important matter, provided the
departure from normality is not extreme. It may be noted in this connection that kurtosis
of the error distribution (either more or less peaked than a normal distribution) is more
important than skewness of the distribution in terms of the effects on inferences.

The point estimators of factor level means and contrasts are unbiased whether or not the
populations are normal. The F test for the equality of factor level means is but little affected
by lack of normality, either in terms of the level of significance or power of the test. Hence,
the F test is a robust test against departures from normality. For instance, while the specified
level of significance might be .05, the actual level for a nonnormal error distribution might
be .04 or .065. Typically, the achieved level of significance in the presence of nonnormality
is slightly higher than the specified one, and the achieved power of the test is slightly less
than the calculated one. Single interval estimates of factor level means and contrasts and
the Scheffé multiple comparison procedure also are not much affected by lack of normality,
provided that the sample sizes are not extremely small.

For the random ANOVA model II (to be discussed in Chapter 25), lack of normality has
more serious implications. The estimators of the variance components are still unbiased,
but the actual confidence coefficient for interval estimates may be substantially different
from the specified one.


Unequal Error Variances 


When the error variances are unequal, the F test for the equality of means with the fixed 
ANOVA model is only slightly affected if all factor level sample sizes are equal or do not 
differ greatly. Specifically, unequal error variances then raise the actual level of signifi- 
cance slightly higher than the specified level. Similarly, the Scheffé multiple comparison 
procedure based on the F distribution is not affected to any substantial extent by unequal 
variances when the sample sizes are equal or are approximately the same. Thus, the F test 
and related analyses are robust against unequal variances when the sample sizes are ap- 
proximately equal. Single comparisons between factor level means, on the other hand, can 
be substantially affected by unequal variances, so that the actual and specified confidence 
coefficients may differ markedly in these cases. 

The use of equal sample sizes for all factor levels not only tends to minimize the effects 
of unequal variances on inferences with the F distribution but also simplifies calculational 
procedures. Thus, here at least, simplicity and robustness go hand in hand. 

For the random ANOVA model II, unequal error variances can have pronounced effects
on inferences about the variance components, even with equal sample sizes.


Nonindependence of Error Terms 


Lack of independence of the error terms can have serious effects on inferences in the
analysis of variance, for both fixed and random ANOVA models. Since this defect is often
difficult to correct, it is important to prevent it in the first place whenever feasible. The
use of randomization in those stages of a study that are likely to lead to correlated error
terms can be a most important insurance policy. In the case of observational data, however,
randomization may not be possible. Here, in the presence of correlated error terms, it may
be possible to modify the model. For instance, in the earlier discussion based on Figure 18.3,
we noted that inclusion in the model of a linear term for the learning effect of the analyst
might remove the correlation of the error terms.

Modification of the model because of correlated error terms may also be necessary in
experimental studies. In one case, the experimenter asked each of 10 subjects to give ratings
to four new flavors of a fruit syrup and to the standard flavor, on a scale from 0 to 100.
When the single-factor analysis of variance model was applied, the experimenter found
high degrees of correlation in the residuals for each subject. The experimenter thereupon
modified the model to a repeated measures design model (Chapter 27). As described in
Chapter 15, this latter type of model is intended for situations where the same subject is
given each of the different treatments and differences between subjects are expected.


18.7 Nonparametric Rank F Test


When transformations are not successful in bringing the distributions of the error terms close
enough to normality to meet the robustness properties of the standard inference procedures, a
nonparametric inference procedure can be useful. Nonparametric procedures do not depend
on the distribution of the error terms; often the only requirement is that the distribution is
continuous. The nonparametric procedure considered here assumes that the r populations
under study are continuous distributions that differ only with respect to location. Thus it
provides a test for differences in population means or medians, assuming that the shapes of
the populations (i.e., variances, skewness, kurtosis, etc.) are identical.

The test procedure is very simple. All $n_T$ observations are ranked from 1 to $n_T$ in
ascending order. Then, the usual $F^*$ test statistic in (16.55) is calculated, but now based on
the ranks, and the F test is carried out in the ordinary manner.


Test Procedure 


The $Y_{ij}$ observations first need to be ranked in ascending order from 1 to $n_T$. We shall let
$R_{ij}$ denote the rank of $Y_{ij}$. In the case of ties among some observations, each of the tied
observations is given the mean of the ranks involved. For instance, if two observations are 
tied for what would otherwise have been the third- and fourth-ranked positions, each would 
be given the mean value 3.5. 
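The ranking rule, including the treatment of ties, can be sketched as a small function. The function name and the sample values are hypothetical and serve only to illustrate the rule.

```python
def midranks(values):
    """Rank values from 1 to n in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2.0  # mean of ranks pos+1, ..., end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

# Hypothetical observations with a tie for the third and fourth positions
print(midranks([12.0, 7.5, 20.1, 20.1, 31.4]))  # [2.0, 1.0, 3.5, 3.5, 5.0]
```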

To test whether the treatment means are equal, the usual $F^*$ test statistic is obtained
based on the ranks $R_{ij}$. This test statistic is now denoted by $F_R^*$:


$F_R^* = \frac{MSTR}{MSE}$ (18.25)

where:

$MSTR = \frac{\sum n_i (\bar{R}_{i\cdot} - \bar{R}_{\cdot\cdot})^2}{r - 1}$ (18.25a)

$MSE = \frac{\sum\sum (R_{ij} - \bar{R}_{i\cdot})^2}{n_T - r}$ (18.25b)

$\bar{R}_{i\cdot} = \frac{\sum_j R_{ij}}{n_i}$ (18.25c)

$\bar{R}_{\cdot\cdot} = \frac{\sum\sum R_{ij}}{n_T} = \frac{n_T + 1}{2}$ (18.25d)


Note that $\bar{R}_{\cdot\cdot}$, the overall mean of the ranks, is a constant for any given total number of
cases $n_T$.

When the treatment means are the same, test statistic $F_R^*$ follows approximately the
$F(r - 1, n_T - r)$ distribution provided that the sample sizes $n_i$ are not very small. To test
the alternatives:


$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_r$
$H_a$: not all $\mu_i$ are equal (18.26a)

the appropriate decision rule to control the Type I error at $\alpha$ is:

If $F_R^* \le F(1 - \alpha; r - 1, n_T - r)$, conclude $H_0$
If $F_R^* > F(1 - \alpha; r - 1, n_T - r)$, conclude $H_a$ (18.26b)


Example



In the Servo-Data example of Table 18.5, we noted earlier that the logarithmic transformation
of Y improves considerably the appropriateness of the assumptions of normality and
constancy of the error variance. If the search for a transformation of Y had not been
successful, or as an alternative to the transformation approach, we could use the nonparametric
rank F test. To use this test, we first rank the data in Table 18.5 from 1 to 15. The ranks
are shown in Table 18.5. Note, incidentally, from Table 18.5 that the rank transformation
has helped to stabilize the variances of the transformed observations (i.e., the ranks) for the
three treatments. We now calculate SSTR and SSE as follows:


$SSTR = 5[(8.4 - 8.0)^2 + (4.8 - 8.0)^2 + (10.8 - 8.0)^2] = 91.20$
$SSE = (2 - 8.4)^2 + (13 - 8.4)^2 + \cdots + (8 - 10.8)^2 = 188.80$


Note that the overall mean $\bar{R}_{\cdot\cdot}$ here is $(n_T + 1)/2 = (15 + 1)/2 = 8.0$. The $F_R^*$ test
statistic is therefore:

$F_R^* = \frac{91.20}{3 - 1} \div \frac{188.8}{15 - 3} = 2.90$

For $\alpha = .10$, we require F(.90; 2, 12) = 2.81. Since $F_R^* = 2.90 > 2.81$, we conclude $H_a$.
The P-value of the test is .094.

Recall that when we conducted the standard F test based on the logarithmic transfor- 
mation of Y, which was suggested both by the simple guides and the Box-Cox procedure, 
we found that it leads to the same conclusion here; but its P-value—.053—is considerably 
smaller. Thus, both tests show that the mean time between computer failures differs for the 
three locations. 
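The rank F test calculations above can be retraced in a few lines. As an assumption, the observations for locations 2 and 3 are supplied from the published data set, since only location 1 is legible in Table 18.5 as reproduced here. Because these data contain no ties, each rank is simply the position in the sorted pooled sample.

```python
# ASSUMPTION: location 2 and 3 values come from the published data set;
# only location 1 is legible in Table 18.5 here.
data = {
    1: [4.41, 100.65, 14.45, 47.13, 85.21],
    2: [8.24, 81.16, 7.35, 12.29, 1.61],
    3: [106.19, 33.83, 78.88, 342.81, 44.33],
}

pooled = sorted(y for ys in data.values() for y in ys)
# No ties occur here, so the rank is the 1-based position in the sorted list.
rank_of = {y: pos + 1 for pos, y in enumerate(pooled)}
ranks = {i: [rank_of[y] for y in ys] for i, ys in data.items()}

n_total = len(pooled)
r = len(ranks)
overall_mean = (n_total + 1) / 2
rank_means = {i: sum(v) / len(v) for i, v in ranks.items()}

sstr = sum(len(v) * (rank_means[i] - overall_mean) ** 2 for i, v in ranks.items())
sse = sum((x - rank_means[i]) ** 2 for i, v in ranks.items() for x in v)
f_rank = (sstr / (r - 1)) / (sse / (n_total - r))

print(ranks[1], rank_means)
print(round(sstr, 2), round(sse, 2), round(f_rank, 2))
```

For data with ties, the dictionary lookup would have to be replaced by a procedure that assigns tied observations the mean of the ranks involved.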


Comment


The Kruskal-Wallis test (Ref. 18.5), a widely used nonparametric test for testing the equality of
treatment means, is based on a test statistic that is equivalent to the rank F test statistic. The
Kruskal-Wallis test statistic, denoted by $X_{KW}^2$, is also based on the ranks $R_{ij}$ from 1 to $n_T$ and is
defined as follows:


$X_{KW}^2 = \frac{SSTR}{SSTO / (n_T - 1)}$ (18.27)

where:

$SSTO = \sum\sum (R_{ij} - \bar{R}_{\cdot\cdot})^2$ (18.27a)


Instead of using the F distribution approximation, the Kruskal-Wallis test uses a chi-square distribution
approximation. If the $n_i$ are reasonably large (five or more is the usual advice), $X_{KW}^2$ is approximately
a $\chi^2$ random variable with r - 1 degrees of freedom when all treatment means are equal. The decision
rule therefore is:

If $X_{KW}^2 \le \chi^2(1 - \alpha; r - 1)$, conclude $H_0$
If $X_{KW}^2 > \chi^2(1 - \alpha; r - 1)$, conclude $H_a$ (18.28)

The $F_R^*$ and $X_{KW}^2$ test statistics are equivalent, being related as follows:

$F_R^* = \frac{(n_T - r) X_{KW}^2}{(r - 1)(n_T - 1 - X_{KW}^2)}$ (18.29)

Multiple Pairwise Testing Procedure


If the rank F test (or the Kruskal-Wallis test) leads to the conclusion that the factor level
means $\mu_i$ are not equal, it is frequently desired to obtain information about the comparative
magnitudes of these means based on the ranked data. A large-sample testing analogue of the
Bonferroni pairwise comparison procedure discussed in Section 17.7, based on the ranks
of the observations, may be employed for this purpose, provided that the sample sizes are
not too small. Testing limits for all g = r(r - 1)/2 pairwise tests using the mean ranks $\bar{R}_{i\cdot}$
are set up as follows for family level of significance $\alpha$:


$(\bar{R}_{i\cdot} - \bar{R}_{i'\cdot}) \pm B \left[ \frac{n_T(n_T + 1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_{i'}} \right) \right]^{1/2}$ (18.30)

where:

$B = z(1 - \alpha/2g)$ (18.30a)
$g = \frac{r(r - 1)}{2}$ (18.30b)


If the testing limits include zero, we conclude that the corresponding treatment means $\mu_i$
and $\mu_{i'}$ do not differ. If the testing limits do not include zero, we conclude that the two
corresponding treatment means differ. On the basis of all pairwise tests, we then set up
groups of treatment means whose members do not differ according to the simultaneous
testing procedure. In this way, we obtain information about the comparative magnitudes of
the treatment means $\mu_i$.


Example


For the Servo-Data example in Table 18.5, we wish to ascertain, if possible, which location
has the longest mean time between computer failures based on the rank data. For a family
level of significance of $\alpha = .10$ and g = r(r - 1)/2 = 3(2)/2 = 3 pairwise tests, we require
B = z(.9833) = 2.13. Since all treatment sample sizes are equal, we need to calculate the
$\pm$ term in (18.30) only once:


$B \left[ \frac{n_T(n_T + 1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_{i'}} \right) \right]^{1/2} = 2.13 \left[ \frac{15(16)}{12} \left( \frac{1}{5} + \frac{1}{5} \right) \right]^{1/2} = 6.02$


Hence, the testing limits for the three pairwise tests are:

Locations 1 and 2: (8.4 - 4.8) $\pm$ 6.02 or -2.4 and 9.6
Locations 3 and 2: (10.8 - 4.8) $\pm$ 6.02 or -.02 and 12.0
Locations 3 and 1: (10.8 - 8.4) $\pm$ 6.02 or -3.6 and 8.4


Since no test shows a significant difference, we obtain only one grouping: 


Group 1 


Location 1 
Location 2 
Location 3 


Note that zero is just inside the lower boundary of the testing limits for locations 2 and 3. 

Recall that when the Bonferroni pairwise comparison procedure was conducted on the 
logarithm of the responses, we concluded that a significant difference existed between the
means of locations 2 and 3. Thus here, and in general for small sample sizes, the simple
transformations discussed in Section 18.5 are often preferred to the rank transformation
because the resulting ANOVA tests are less conservative and tend to have greater statistical
power than those associated with the rank transformation.


18.8 Case Example: Heart Transplant


In heart transplant surgery, the similarity of the donor's tissue type and that of the recipient
is of importance because large differences may increase the probability that the transplanted
heart is rejected. Table 18.7 shows a portion of the survival times (in days) obtained from an
observational study of 39 patients following heart transplant surgery. The data are grouped
into three categories, according to the degree of mismatch between the donor tissue and the
recipient tissue. Investigators would like to determine if the mean survival time changes
with the degree of mismatch. The alternatives to be tested are:


$H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_a$: not all $\mu_i$ are equal


TABLE 18.7 Survival Times of Patients Following Heart Transplant Surgery, Heart Transplant Example.

Degree of Tissue Mismatch (i)

Case   Low   Medium   High
j   i=1   i=2   i=3
1   44   15   3
2   551   280   136
3   127   1,024   65
...
12   47   836   48
13   994   51
14   26

Source: M. L. Puri and P. K. Sen, Nonparametric Methods in General Linear
Models (New York: John Wiley & Sons, 1985).


A SYSTAT dot plot of the data by mismatch category is provided in Figure 18.9a. The
plot suggests that average survival time may decrease with higher degree of mismatch. An
initial fit of analysis of variance model (16.2) was made and the studentized residuals were
obtained for diagnostic purposes. Two residual plots are presented in Figures 18.9b and
18.9c. The dot plot of the studentized residuals in Figure 18.9b shows that the distribution
of the residuals is positively skewed. It also suggests that the error variance may be smaller
in the high mismatch group. The Brown-Forsythe test in (18.12) was conducted to examine
the constancy of the error variance. The Brown-Forsythe test statistic is $F_{BF}^* = 1.91$ and
the P-value is .163, supporting constancy of the error variance. On the other hand, the
positive skewness of the residuals is confirmed by the upward-curving shape of the normal
probability plot in Figure 18.9c and the correlation test for normality (r = .895; for $\alpha = .05$,
the interpolated critical value in Table B.6 is .971).


FIGURE 18.9 SYSTAT Diagnostic Plots, Heart Transplant Example.
(a) Dot Plots of Survival Times, by mismatch category (Low, Medium, High).
(b) Dot Plots of Studentized Residuals.
(c) Normal Probability Plot of Studentized Residuals (Studentized Residual versus Expected Value).

A transformation of the response variable was therefore investigated. The Box-Cox
procedure led to the maximum likelihood estimate $\hat{\lambda} = .06$, which suggested the logarithmic
transformation ($\lambda = 0$). The new response variable $Y' = \log_e Y$ was therefore obtained
and ANOVA model (16.2) was fitted to this transformed variable. Two plots of studentized
residuals are shown in Figure 18.10. A dot plot of the studentized residuals is presented in
Figure 18.10a. Notice that the distribution of the residuals now appears to be symmetric,
with constant variance. The normality of the distribution of the error terms is supported
by the normal probability plot in Figure 18.10b and the correlation test for normality
(r = .982 > .971).


FIGURE 18.10 Diagnostic Plots and ANOVA Table for Transformed Data, Heart Transplant Example.
(a) Dot Plots of Studentized Residuals, by mismatch category (Low, Medium, High).
(b) Normal Probability Plot of Studentized Residuals (Studentized Residual versus Expected Value).
(c) ANOVA Table

DEP VAR: LOGTIME   N: 39   MULTIPLE R: 0.282   SQUARED MULTIPLE R: 0.080

ANALYSIS OF VARIANCE
SOURCE      SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO   P
CATEGORY         7.611        2      3.806       1.560   0.224
ERROR           87.834       36      2.440


The residual dot plot in Figure 18.10a shows the possible presence of an outlier in the
low tissue mismatch category (studentized residual = -2.99). For this case the studentized
deleted residual is -3.40. The Bonferroni critical value for the outlier test is t(1 - .05/2(39);
36) = t(.999359; 36) = 3.49. Since |-3.40| = 3.40 < 3.49, we conclude that this case
is not an outlier.

It therefore appears that the logarithmic transformation was successful so that ANOVA
model (16.2) is appropriate for the transformed survival times. The ANOVA table for the
transformed data is shown in Figure 18.10c. We see that $F^* = 1.56$ and that the P-value
for the test is .224. For $\alpha = .10$, we therefore conclude $H_0$, that the mean survival time
for heart transplant patients with the characteristics of those included in the study does not
depend on the degree of tissue mismatch.
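The entries of the ANOVA table in Figure 18.10c, and the percentile used for the Bonferroni outlier test, can be checked with simple arithmetic. The sketch uses only the sums of squares and degrees of freedom printed in the table; the variable names are ours.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table in Figure 18.10c
sstr, df_treatment = 7.611, 2   # CATEGORY line
sse, df_error = 87.834, 36      # ERROR line
n_cases = 39

mstr = sstr / df_treatment
mse = sse / df_error
f_star = mstr / mse

# Squared multiple R is the share of the total sum of squares due to treatments
r_squared = sstr / (sstr + sse)
multiple_r = math.sqrt(r_squared)

# Percentile of the t distribution for the Bonferroni outlier test at alpha = .05
outlier_percentile = 1 - 0.05 / (2 * n_cases)

print(round(mse, 3), round(f_star, 3), round(r_squared, 3), round(multiple_r, 3))
print(round(outlier_percentile, 6))
```

The small squared multiple R indicates that the mismatch categories account for little of the variation in log survival time, in line with the nonsignificant F test.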


Cited References


18.1. Hartley, H. O. "Testing the Homogeneity of a Set of Variances," Biometrika 31 (1940),
pp. 249-255.

18.2. Brown, M. B., and A. B. Forsythe. "Robust Tests for Equality of Variances," Journal of the
American Statistical Association 69 (1974), pp. 364-67.

18.3. Snedecor, G. W., and W. G. Cochran. Statistical Methods. 8th ed. Ames, Iowa: Iowa State
University Press, 1989.

18.4. Scheffé, H. The Analysis of Variance. New York: John Wiley & Sons, 1959.

18.5. Kruskal, W. H., and W. A. Wallis. "Use of Ranks in One-Criterion Variance Analysis," Journal
of the American Statistical Association 47 (1952), pp. 583-621 (corrections appear in Vol. 48,
pp. 907-11).


Problems


18.1. Refer to Figures 18.3 and 18.4. What feature of the residual sequence plots enables you to
diagnose that in one case the error variance changes over time whereas in the other case the
effect is of a different nature? Could you make a diagnosis about time effects from a residual
dot plot?

18.2. A student proposed in class that deviations of the observations $Y_{ij}$ around the estimated overall
mean $\bar{Y}_{\cdot\cdot}$ be plotted to assist in evaluating the appropriateness of ANOVA model (16.2). Would
these deviations be helpful in studying the independence of the error terms? The constancy of
the variance of the error terms? The normality of the error terms? Discuss.

18.3. A consultant discussing ANOVA applications in a seminar stated: "Sometimes I find that
treatment effects in an experiment do not show up through differences in the treatment means.
Hence, it is important to compare the residual plots for the treatments." A member of the
audience asked: "I don't think I understood your point regarding differences in treatment
means being explored using residual plots." Discuss.


*18.4. Refer to Productivity improvement Problem 16.7. 


a. Prepare aligned residual dot plots by factor level. What departures from ANOVA model 
(16.2) can be studied from these plots? What are your findings? 

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correla- 
tion between the ordered residuals and their expected values under normality. Does the 
normality assumption appear to be reasonable here? 

c. Obtain the studentized deleted residuals and conduct the Bonferroni outlier test; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion.

d. The economist wishes to investigate whether location of the firm's home office is related
to productivity improvement. The home office locations are as follows (U: U.S.;
E: Europe):


~ 


wna 
mmc l= 
С тт № 
momom jw 
Cmmjè 
с ст |л 
mccie 


Prepare aligned residual dot plots by factor level in which the location of the home office is 
identified. Does it appear that ANOVA model (16.2) could be improved by adding location 
of home office as a second factor? Explain. 




18.5. Refer to Questionnaire color Problem 16.8.


a. Prepare aligned residual dot plots by color. What departures from ANOVA model (16.2)
can be studied from these plots? What are your findings?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

c. The observations within each factor level are in geographic sequence. Prepare residual
sequence plots. What can be studied from these plots? What are your findings?

d. Obtain the studentized deleted residuals and conduct the Bonferroni outlier test; use
$\alpha = .025$. State the alternatives, decision rule, and conclusion.


18.6. Refer to Rehabilitation therapy Problem 16.9.


a. Obtain the residuals and prepare aligned residual dot plots by factor level. What departures
from ANOVA model (16.2) can be studied from these plots? What are your findings?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

c. The observations within each factor level are in time order. Prepare residual sequence plots
and analyze them. What are your findings?

d. Obtain the studentized deleted residuals and conduct the Bonferroni outlier test; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion.


*18.7. Refer to Cash offers Problem 16.10.


a. Obtain the residuals and prepare aligned residual dot plots by factor level. What departures
from ANOVA model (16.2) can be studied from these plots? What are your findings?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

c. The observations within each factor level are in time order. Prepare residual sequence plots
and interpret them. What are your findings?

d. Obtain the studentized deleted residuals and conduct the Bonferroni outlier test; use
$\alpha = .025$. State the alternatives, decision rule, and conclusion.

e. An executive in the consumer organization has been told that used-car dealers in the region
tend to make lower cash offers during weekends (Friday evening through Sunday) than at
other times. The times when offers were obtained are as follows (W: weekend; O: other
time):


     j
i    1   2   3   4   5   6   7   8   9   10   11   12
1    O   O   W   O   W   O   W   O   W   O    W    W
2    O   W   W   O   W   O   W   O   O   W    W    O
3    O   W   O   W   O   O   O   W   W   W    O    W


Prepare aligned residual dot plots by factor level in which the time of the offer is identified.
Does it appear that ANOVA model (16.2) could be improved by adding time of offer as a
second factor? Explain.


СИ 


318.8. 


18.9. 


18.10. 


x18.11. 


18.12. 


ж18.13. 


Chapter 18 ANOVA Diagnostics and Remedial Measures 803 


Refer to Filling machines Problem 16.11. 


a. Obtain the residuals and prepare aligned residual dot plots by machine. What departures 
from ANOVA model (16.2) can be studied from these plots? What are your findings? 

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation 
between the ordered residuals and their expected values under normality. Does the normality 
assumption appear to be reasonable here? 

c. The observations within each factor level are in time order. Prepare residual sequence plots 
and interpret them. What are your findings? 

d. Obtain the studentized deleted residuals and conduct the Bonferroni outlier test; use œ = 
.01. State the alternatives, decision rule, and conclusion. 

Refer to Premium distribution Problem 16.12. 


a. Obtain the residuals and prepare aligned residual dot plots by agent. What departures from 
ANOVA model (16.2) can be studied from these plots? What are your findings? 

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correla- 
tion between the ordered residuals and their expected values under normality. Does the 
normality assumption appear to be reasonable here? 

c. The observations within each factor level are in time order. Prepare residual sequence plots 
and interpret them. What are your findings? 

d. Obtain the studentized deleted residuals and conduct the Bonferroni outlier test; use 
a = .025. State the alternatives, decision rule, and conclusion. 


Computerized game. Four teams competed in 20 trials of a computerized business game. 
Each trial involved a new game, the objective for each team being to maximize profits in the 
given trial. A researcher fitted ANOVA model (16.2) to determine whether or not the mean 
profits for the four teams are the same and obtained the following residuals: 


                         j
i       1      2      3    ...     18     19     20
1     .10    .28    .10    ...    .10    .28    .28
2    1.44   1.44   1.32    ...   1.02   1.18   1.51
3     .93    .70    .81    ...    .54    .43    .65
4    -.15    .11    .25    ...    .11    .25    .38


The residuals for each team are given in time order. Construct appropriate residual plots to
study whether the error terms are independent from trial to trial for each team. What are your 
findings? 

Refer to Productivity improvement Problem 16.7. Examine by means of the Brown-Forsythe 
test whether or not the treatment error variances are equal; use $\alpha = .05$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test? 

Refer to Rehabilitation therapy Problem 16.9. Examine by means of the Brown-Forsythe 
test whether or not the treatment error variances are equal; use $\alpha = .10$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test? 
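The Brown-Forsythe problems above all rest on the same computation, which the following minimal sketch illustrates. It assumes the test statistic is the ordinary one-way F statistic applied to the absolute deviations of the observations from their factor level medians. The function names and the data are hypothetical and are not taken from any of the referenced problems. The decision rule and P-value would still come from the F distribution with $r - 1$ and $n_T - r$ degrees of freedom.

```python
from statistics import median


def one_way_f(groups):
    """Return the one-way ANOVA statistic MSTR / MSE for a list of samples."""
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (sstr / (r - 1)) / (sse / (n_total - r))


def brown_forsythe_f(groups):
    """F statistic computed on absolute deviations from each group's median."""
    deviations = [[abs(y - median(g)) for y in g] for g in groups]
    return one_way_f(deviations)


# Hypothetical responses for three factor levels (illustration only).
samples = [
    [7.6, 8.2, 6.8, 5.8, 6.9],
    [6.7, 8.1, 9.4, 8.6, 7.8],
    [8.5, 9.7, 10.1, 7.8, 9.6],
]
print(round(brown_forsythe_f(samples), 3))
```

A large value of the statistic relative to the appropriate F percentile would point toward unequal error variances.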

Refer to Cash offers Problem 16.10. Assume that the error terms are approximately normally 
distributed. 


804 Part Four Design and Analysis of Single-Factor Studies


a. Examine by means of the Hartley test whether or not the treatment error variances are equal;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

b. Would you reach the same conclusion as in part (a) with the Brown-Forsythe test?

*18.14. Refer to Filling machines Problem 16.11. Assume that the error terms are approximately
normally distributed.

a. Examine by means of the Hartley test whether or not the treatment error variances are equal;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

b. Would you reach the same conclusion as in part (a) with the Brown-Forsythe test statistic?

18.15. Helicopter service. An operations analyst in a sheriff's department studied how frequently
their emergency helicopter was used during the past year, by time of day (shift 1: 2 A.M.-8 A.M.;
shift 2: 8 A.M.-2 P.M.; shift 3: 2 P.M.-8 P.M.; shift 4: 8 P.M.-2 A.M.). Random
samples of size 20 for each shift were obtained. The data follow (in time order):

                   j
i     1    2    3   ...   18   19   20
1     4    3    5   ...    4    1    6
2     0    2    0   ...    2    2    0
3     2    1    0   ...    0    2    4
4     5    2    4   ...    5    2    3

Since the data are counts, the analyst was concerned about the normality and equal variances
assumptions of ANOVA model (16.2).

a. Obtain the fitted values and residuals for ANOVA model (16.2).

b. Prepare suitable residual plots to study whether or not the error variances are equal for the
four shifts. What are your findings?

c. Test by means of the Brown-Forsythe test whether or not the treatment error variances are
equal; use $\alpha = .10$. What is the P-value of the test? Are your results consistent with the
diagnosis in part (b)?

d. For each shift, calculate $\bar{Y}_{i\cdot}$ and $s_i$. Examine the three relations found in the table on
page 791 and determine the transformation that is most appropriate here. What do you
conclude?

e. Use the Box-Cox procedure to find an appropriate power transformation of Y, first adding
the constant 1 to each Y observation. Evaluate SSE for the values of $\lambda$ given in Table 18.6.
Does $\lambda = .5$, a square-root transformation, appear to be reasonable, based on the Box-Cox
procedure?

18.16. Refer to Helicopter service Problem 18.15. The analyst decided to apply the square root
transformation $Y' = \sqrt{Y}$ and examine its effectiveness.

a. Obtain the transformed response data, fit ANOVA model (16.2), and obtain the residuals.

b. Prepare suitable plots of the residuals to study the equality of the error variances of the
transformed response variable for the four shifts. Also obtain a normal probability plot
and the coefficient of correlation between the ordered residuals and their expected values
under normality. What are your findings? Does the transformation appear to have been
effective?

c. Test by means of the Brown-Forsythe test whether or not the treatment error variances
for the transformed response variable are equal; use $\alpha = .10$. State the alternatives,
decision rule, and conclusion. Are your findings in part (b) consistent with your conclusion
here?

*18.17. Winding speeds. In a completely randomized design to study the effect of the speed of winding
thread (1: slow; 2: normal; 3: fast; 4: maximum) onto 75-yard spools, 16 runs of 10,000 spools
each were made at each of the four winding speeds. The response variable is the number of
thread breaks during the production run. The results (in time order) are as follows:

                   j
i     1    2    3   ...   14   15   16
1     4    3    2   ...    2    3    4
2     7    6    4   ...    4    7    6
3    12    6   14   ...   13   10   14
4    17   15    7   ...   19    9   23

Since the responses are counts, the researcher was concerned about the normality and equal
variances assumptions of ANOVA model (16.2).

a. Obtain the fitted values and residuals for ANOVA model (16.2).

b. Prepare suitable residual plots to study whether or not the error variances are equal for the
four winding speeds. What are your findings?

c. Test by means of the Brown-Forsythe test whether or not the treatment error variances are
equal; use $\alpha = .05$. What is the P-value of the test? Are your results consistent with the
diagnosis in part (b)?

d. For each winding speed, calculate $\bar{Y}_{i\cdot}$ and $s_i$. Examine the three relations found in the table
on page 791 and determine the transformation that is most appropriate here. What do you
conclude?

e. Use the Box-Cox procedure to find an appropriate power transformation of Y. Evaluate
SSE for the values of $\lambda$ given in Table 18.6. Does $\lambda = 0$, a logarithmic transformation,
appear to be reasonable, based on the Box-Cox procedure?

*18.18. Refer to Winding speeds Problem 18.17. The researcher decided to apply the logarithmic
transformation $Y' = \log_{10} Y$ and investigate its effectiveness.

a. Obtain the transformed response data, fit ANOVA model (16.2), and obtain the residuals.

b. Prepare suitable plots of the residuals to study the equality of the error variances of the
transformed response variable for the four winding speeds. Also obtain a normal probability
plot and the coefficient of correlation between the ordered residuals and their
expected values under normality. What are your findings about the effectiveness of the
transformation?

c. Test by means of the Brown-Forsythe test whether or not the treatment error variances
for the transformed response variable are equal; use $\alpha = .05$. State the alternatives,
decision rule, and conclusion. Are your findings in part (b) consistent with your conclusion
here?

18.19. Refer to Helicopter service Problem 18.15. Assume that ANOVA model (18.13) is appropriate.
Use weighted least squares with the untransformed data to test for the equality of the shift
means; control the $\alpha$ risk at .05. State the alternatives, full and reduced regression models,
decision rule, and conclusion.

*18.20. Refer to Winding speeds Problem 18.17. Assume that ANOVA model (18.13) is appropriate.
Use weighted least squares with the untransformed data to test for the equality of the winding


thread speed means; use $\alpha = .01$. State the alternatives, full and reduced regression models,
decision rule, and conclusion.

18.21. Why is the nonparametric rank F test a nonparametric test?

18.22. Explain why the limits in (18.30) are testing limits and not confidence limits.

*18.23. Refer to Productivity improvement Problem 16.7.

a. Conduct the nonparametric rank F test; use $\alpha = .05$. State the alternatives, decision rule,
and conclusion.

b. What is the P-value of the test in part (a)?

c. Does the conclusion in part (a) differ from the one in Problem 16.7e?

d. Do the data suggest that a nonparametric test is needed here?

e. Conduct multiple pairwise tests based on the ranked data to group the three types of firms
according to mean productivity improvement. Use family level of significance $\alpha = .10$.
Describe your findings.

*18.24. Refer to Cash offers Problem 16.10.

a. Conduct the nonparametric rank F test; use $\alpha = .01$. State the alternatives, decision rule,
and conclusion.

b. What is the P-value of the test in part (a)?

c. Does the conclusion in part (a) differ from the one in Problem 16.10e?

d. Do the data suggest that a nonparametric test is needed here?

e. Conduct multiple pairwise tests based on the ranked data to group the three age categories
according to mean cash offer. Use family level of significance $\alpha = .10$. Describe your
findings.

18.25. Telephone communications. A management consultant was engaged by a firm to improve
the cost-effectiveness of its communications. As part of the study, the consultant selected 10
home-office executives at random from each of the (1) sales, (2) production, and (3) research
and development divisions, and studied the communications of these executives during the
past 10 weeks in great detail. Among other data, the consultant obtained the following
information on weekly dollar costs of long-distance telephone calls to branch offices by the
executives:

                                  j
i       1      2      3      4      5      6      7      8      9     10
1     666    920    495    602  1,499    960    796    343    894    813
2     488    362    156    546    216    542    345    291    516    126
3     391    450    609    910    705    472    645    496    763  1,309

The consultant decided to employ a nonparametric approach to test whether or not the mean
telephone expenses for the three divisions are equal.

a. What feature of the data may have suggested the use of a nonparametric test?

b. Conduct the nonparametric rank F test, controlling the risk of Type I error at $\alpha = .05$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

c. Conduct multiple pairwise tests based on the ranked data to group the three divisions
according to mean telephone expenditures; use family level of significance $\alpha$ = [...].
Describe your findings.
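The nonparametric rank F test called for in the problems above can be sketched as follows. The sketch assumes that the statistic in (18.25) is the ordinary one-way F statistic computed on the ranks of the pooled observations, with tied observations given the mean of the ranks involved. The function names and the small data set are hypothetical.

```python
def midranks(values):
    """Rank the values from 1 to n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def one_way_f(groups):
    """Return the one-way ANOVA statistic MSTR / MSE for a list of samples."""
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (sstr / (r - 1)) / (sse / (n_total - r))


def rank_f(groups):
    """Pool the samples, rank them, and compute F on the ranks by group."""
    pooled = [y for g in groups for y in g]
    ranks = midranks(pooled)
    ranked_groups, start = [], 0
    for g in groups:
        ranked_groups.append(ranks[start:start + len(g)])
        start += len(g)
    return one_way_f(ranked_groups)


# Hypothetical data with one extreme observation (illustration only).
data = [[12, 15, 11, 90], [20, 22, 19, 25], [30, 28, 35, 31]]
print(round(rank_f(data), 3))
```

Because only the ranks enter the computation, the extreme value 90 has no more influence than any other largest observation would.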


Exercises

18.26. Refer to Figure 18.3. Modify ANOVA model (16.2) to include a linear trend term for the time
effect. Is this modified model still an ANOVA model? A linear model?

18.27. Show that $n_T(n_T + 1)/12$ in (18.30) is the sample variance of the consecutive integers 1 to $n_T$.

18.28. Show that test statistics (18.25) and (18.27) are related according to (18.29).
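Before attempting the algebra in Exercise 18.27, it may help to confirm the claim numerically. The sketch below is only a check for a range of sample sizes, not a proof. It assumes the sample variance uses the divisor $n_T - 1$, and the function name is hypothetical.

```python
from fractions import Fraction


def sample_variance(values):
    """Exact sample variance (divisor n - 1) using rational arithmetic."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Compare the sample variance of 1, ..., n_total with n_total(n_total + 1)/12.
for n_total in range(2, 51):
    integers = list(range(1, n_total + 1))
    assert sample_variance(integers) == Fraction(n_total * (n_total + 1), 12)

print(sample_variance(list(range(1, 31))))
```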


Projects

18.29. Refer to the SENIC data set in Appendix C.1 and Project 16.42.

a. Obtain the residuals and prepare aligned residual dot plots by region. Are any serious
departures from ANOVA model (16.2) suggested by your plots?

b. Obtain a normal probability plot of the residuals and calculate the coefficient of correlation
between the ordered residuals and their expected values under normality. Is the normality
assumption reasonable here?

c. Examine by means of the Brown-Forsythe test whether or not the geographic region error
variances are equal; use $\alpha = .05$. State the alternatives, decision rule, and conclusion.
What is the P-value of the test?

18.30. Refer to the SENIC data set in Appendix C.1. A test of whether or not mean length of stay
(variable 2) is the same in the four geographic regions (variable 9) is desired, but concern
exists about the normality and equal variances assumptions of ANOVA model (16.2).

a. Obtain the residuals and plot them against the fitted values to study whether or not the error
variances are equal for the four geographic regions. What are your findings?

b. For each geographic region, calculate $\bar{Y}_{i\cdot}$ and $s_i$. Examine the three relations found in the
table on page 791 and determine the transformation that is the most appropriate one here.
What do you conclude?

c. Use the Box-Cox procedure to find an appropriate power transformation of Y. Evaluate
SSE for the values of $\lambda$ given in Table 18.6. Does $\lambda = -1$, a reciprocal transformation,
appear to be reasonable, based on the Box-Cox procedure?

d. Use the reciprocal transformation $Y' = 1/Y$ to obtain transformed response data.

e. Fit ANOVA model (16.2) to the transformed data and obtain the residuals. Plot these
residuals against the fitted values to study the equality of the error variances of the transformed
response variable for the four regions. Also obtain a normal probability plot of the residuals
and the coefficient of correlation between the ordered residuals and their expected values
under normality. What are your findings?

f. Examine by means of the Brown-Forsythe test whether or not the geographic region
variances for the transformed response variable are equal; use $\alpha = .01$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

g. Assume that ANOVA model (16.2) is appropriate for the transformed response variable.
Test whether or not the mean length of stay in the transformed units is the same in the
four geographic regions. Control the $\alpha$ risk at .01. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

18.31. Refer to the CDI data set in Appendix C.2 and Project 16.44.

a. Obtain the residuals and prepare aligned residual dot plots by region. Are any serious
departures from ANOVA model (16.2) suggested by your plots?

b. Obtain a normal probability plot of the residuals and calculate the coefficient of correlation
between the ordered residuals and their expected values under normality. Is the normality
assumption reasonable here?

c. Examine by means of the Brown-Forsythe test whether or not the geographic region error
variances are equal; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.
What is the P-value of the test?

18.32. Refer to the Market share data set in Appendix C.3 and Project 16.45.

a. Obtain the residuals and prepare aligned residual dot plots by factor-level combinations.
Are any serious departures from ANOVA model (16.2) suggested by your plots?

b. Obtain a normal probability plot of the residuals and calculate the coefficient of correlation
between the ordered residuals and their expected values under normality. Is the normality
assumption reasonable here?

c. Examine by means of the Brown-Forsythe test whether or not the factor level error variances
are equal; use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the
P-value of the test?

18.33. Refer to the SENIC data set in Appendix C.1 and Project 16.42.

a. Use the nonparametric rank F test to determine whether or not the mean infection risk is the
same in the four regions; control the level of significance at $\alpha = .05$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

b. Is your conclusion in part (a) the same as that obtained in Project 16.42? Is the nonparametric
test more reasonable here?

c. Use the multiple pairwise testing procedure (18.30) to group the regions; employ family
significance level $\alpha = .10$. What are your findings?

18.34. Refer to the CDI data set in Appendix C.2 and Project 16.44.

a. Use the nonparametric rank F test to determine whether or not the mean crime rate is the
same in the four regions; control the level of significance at $\alpha = .05$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

b. Is your conclusion in part (a) the same as that obtained in Project 16.44? Is the nonparametric
test more reasonable here?

c. Use the multiple pairwise testing procedure (18.30) to group the regions; employ family
significance level $\alpha = .05$. What are your findings?

18.35. Refer to the Market share data set in Appendix C.3 and Project 16.45.

a. Use the nonparametric rank F test to determine whether or not the mean average monthly
share is the same for the four factor combinations; control the level of significance at
$\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

b. Is your conclusion in part (a) the same as that obtained in Project 16.45? Is the nonparametric
test more reasonable here?

c. Use the multiple pairwise testing procedure (18.30) to group the factor combinations;
employ family significance level $\alpha = .05$. What are your findings?

18.36. Obtain the exact sampling distribution of the nonparametric rank $F_R^*$ test statistic in (18.25)
when $H_0$ holds, for the case $r = 2$ and $n_i = 2$. [Hint: What does the equality of the treatment
means imply about the arrangement of the ranks 1, 2, 3, 4?]

18.37. Three populations are being studied; each is uniform between 300 and 800.

a. Generate 10 random observations from each of the three uniform populations and calculate
the $F_R^*$ test statistic (18.25).

b. Repeat part (a) 500 times.


c. Calculate the mean and standard deviation of the 500 test statistics. How do these values
compare with the characteristics of the relevant F distribution?

d. What proportion of the 500 test statistics obtained in part (b) is less than F(.90; 2, 27)?
What proportion is less than F(.99; 2, 27)? How do these proportions agree with theoretical
expectations?

Case Studies

18.38. Refer to the Prostate cancer data set in Appendix C.5 and Case Study 16.49. Check to
see whether concern exists about the assumption of normality and equal variances for the
ANOVA model that you decided upon in Case Study 16.49. Document the steps taken in your
assessment of these concerns. Is a transformation indicated here? If yes, what transformation
is recommended? Why?

18.39. Refer to the Real estate sales data set in Appendix C.7 and Case Study 16.50. Check to
see whether concern exists about the assumption of normality and equal variances for the
ANOVA model that you decided upon in Case Study 16.50. Document the steps taken in your
assessment of these concerns. Is a transformation indicated here? If yes, what transformation
is recommended? Why?

18.40. Refer to the Ischemic heart disease data set in Appendix C.9 and Case Study 16.51. Check
to see whether concern exists about the assumption of normality and equal variances for the
ANOVA model that you decided upon in Case Study 16.51. Document the steps taken in your
assessment of these concerns. Is a transformation indicated here? If yes, what transformation
is recommended? Why?


Part Five

Multi-Factor Studies


Chapter 19

Two-Factor Studies with
Equal Sample Sizes


In Part IV, we considered the design and analysis of experimental and observational studies
in which the effects of one factor are investigated. Now we are concerned with investigations
of the simultaneous effects of two or more factors. In this chapter, we take up the analysis
of variance for two-factor studies where the factors are crossed and all sample sizes are
equal. In Chapters 20, 21, 22, and 23, we continue the discussion of two-factor studies by
taking up the analysis of factor effects with one case per cell, randomized complete block
designs, the analysis of covariance, and two-factor studies with unequal sample sizes. In
Chapter 24, we extend the analysis of variance to studies with three or more factors. Finally,
in Chapter 25, we take up random and mixed effects models.


19.1 Two-Factor Observational and Experimental Studies 


Two-factor studies, like single-factor studies, can be based on experimental or observational
data. We begin with three examples of two-factor studies: the first is an experimental study,
the second is an observational study, and the third has aspects of both experimental and
observational studies.


Examples of Two-Factor Experiments and Observational Studies 


Example 1

A company investigated the effects of selling price and type of promotional campaign
on sales of one of its products. Three selling prices (55 cents, 60 cents, 65 cents) were
studied, as were two types of promotional campaigns (radio advertising, newspaper
advertising). Let us consider selling price to be factor A and promotional campaign to
be factor B. Factor A here was studied at three price levels; in general, we use the symbol
a to denote the number of levels of factor A investigated. Factor B was here studied
at two levels; we use the symbol b to denote the number of levels of factor B investigated.
Each combination of price and promotional campaign was studied, as shown in the
table below:

Treatment   Description
    1       55-cent price, radio advertising
    2       60-cent price, radio advertising
    3       65-cent price, radio advertising
    4       55-cent price, newspaper advertising
    5       60-cent price, newspaper advertising
    6       65-cent price, newspaper advertising

Each combination of a factor level of A and a factor level of B is a treatment. Thus, there
are $3 \times 2 = 6$ treatments here altogether. In general, the total number of possible treatments
in a two-factor study is ab.

Twelve communities throughout the United States, of approximately equal size and similar
socioeconomic characteristics, were selected and the treatments were assigned to them
at random, such that each treatment was given to two experimental units. The experiment
can be represented by the graph in Figure 19.1. The two experimental units for each treatment
combination are represented by the dot with circle circumscribed. Notice that four
experimental units are assigned to each price level, as shown by the dot plot along the price
(X) axis, and six experimental units are assigned to each mode of advertising, as shown by
the dot plot along the advertising (Y) axis.

[FIGURE 19.1 Experimental Layout, Example 1. Axes: Price (X) and Advertising (Y; Radio, Newspaper).]

As before, we use n for the number of units receiving a given treatment when all treatment
sample sizes are the same. For the n = 2 communities that were assigned treatment 1, for
instance, the product price was fixed at 55 cents and radio advertising was employed, and
so on for the other communities in the study.

This is an experimental study because control was exercised in assigning the factor A and
factor B levels to the experimental units by means of random assignments of the treatments
to the communities. The design used was a completely randomized design.
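The construction of the ab treatments and their random assignment in a completely randomized design can be sketched in a few lines. This is an illustrative sketch only: the community labels, the function name, and the seed are hypothetical, while the factor levels and n = 2 are those of Example 1.

```python
import random
from collections import Counter

prices = [55, 60, 65]               # factor A levels in cents, a = 3
campaigns = ["radio", "newspaper"]  # factor B levels, b = 2
n = 2                               # experimental units per treatment

# Treatments numbered as in the Example 1 table: radio first, then newspaper.
treatments = [(price, campaign) for campaign in campaigns for price in prices]


def assign_treatments(units, treatments, n, seed=None):
    """Randomly assign each treatment to exactly n of the experimental units."""
    slots = [t for t in treatments for _ in range(n)]
    if len(slots) != len(units):
        raise ValueError("number of units must equal n times number of treatments")
    rng = random.Random(seed)
    rng.shuffle(slots)
    return dict(zip(units, slots))


# Hypothetical labels for the twelve communities.
communities = [f"community_{k}" for k in range(1, 13)]
assignment = assign_treatments(communities, treatments, n, seed=1)

for unit, (price, campaign) in assignment.items():
    print(unit, price, campaign)

# Every treatment should appear exactly n times.
print(Counter(assignment.values()))
```

Shuffling a list that already contains each treatment n times guarantees equal sample sizes, which independent random draws per community would not.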


814 Part Five Multi-Factor Studies


Example 2

An analyst studied the effects of family income (under $15,000, $15,000-$29,999, $30,000-
$49,999, $50,000 and more) and stage in the life cycle of the family (stages 1, 2, 3, 4) on
appliance purchases. Here, $4 \times 4 = 16$ treatments are defined. These are in part:

Treatment   Description
    1       Under $15,000 income, stage 1
    2       Under $15,000 income, stage 2
   ...      ...
   16       $50,000 and more income, stage 4

The analyst selected 20 families with the required income and life-cycle characteristics for
each of the "treatment" classes for this study, yielding 320 families for the entire study.

This study is an observational one because the data were obtained without assigning
income and life-cycle stage to the families. Rather, the families were selected because they
had the specified characteristics.

Example 3

A medical investigator studied the relationship between the response to three blood pressure
lowering drug types for hypertensive males and females. Here, $3 \times 2 = 6$ treatments are
defined. These are:

Treatment   Description
    1       Drug type 1, males
    2       Drug type 1, females
    3       Drug type 2, males
    4       Drug type 2, females
    5       Drug type 3, males
    6       Drug type 3, females

The investigator selected 30 adult males and 30 adult females and randomly assigned
10 males and 10 females to each of the three drug types, yielding 60 total subjects.

This study has one observational factor, gender, and one experimental factor, drug type.
This design is referred to as a randomized complete block design where the gender factor
is called a block. This design will be discussed in Chapter 21.


Comments 


1. When we considered single-factor studies, we did not place any restrictions on the nature of the r
factor levels under study. Formally, the ab treatments in a two-factor investigation could be considered 
as the r factor levels in a single-factor investigation and analyzed according to the methods discussed 
in Part IV. The reason why new methods of analysis are required is that we wish to analyze the ab 
treatments in special ways that recognize two factors are involved and enable us to obtain information 
about the main effects of each of the two factors as well as about any special joint effects. 

2. When a completely randomized design is used in a multifactor study, the random assignments 
of treatments to the experimental units are made in the same manner as for a single-factor study. No 
new problems are encountered once the treatments are defined in terms of the factor levels of the 
various factors under study.


One-Factor-at-a-Time (OFAAT) Approach to Experimentation


FIGURE 19.2 One-Factor-at-a-Time Approach, Example 1.


It is not uncommon for investigators to vary only one factor at a time, holding all others 
constant, when attempting to understand the effect of a given set of factors on a particular 
outcome. For example, to maximize sales in Example 1, we might be tempted to first fix 
price at a particular value such as 60 cents, and then determine which mode of advertising 
(radio or newspaper) is most effective. If this test reveals that newspaper advertising leads 
to higher sales, we would then run a second test in which the advertising mode is fixed 
at “newspaper,” and the three price levels are tested. This one-factor-at-a-time (OFAAT) 
experimental approach is depicted in Figure 19.2. 
We note a number of deficiencies of the OFAAT approach: 


1. The OFAAT approach does not explore the entire space of treatment combinations,
and important treatment combinations may therefore be missed. In Figure 19.2, we see
that two treatment combinations, (radio, 55 cents) and (radio, 65 cents), were omitted, or
one-third of the total. The fraction of treatment combinations omitted can be much larger
for studies involving larger numbers of factors and/or larger numbers of factor levels.

2. Interactions cannot be estimated. As we have seen in regression, an interaction between
two predictors is present if the effect (slope) of one predictor changes with the level of the
other predictor. With the OFAAT approach, this is impossible to determine, because the
slope of one factor is obtained only for a fixed set of levels of the other factors.

3. A full randomization is not possible for the OFAAT approach, because the experiment
must be fielded in stages. Thus if certain variables that are not under control of the
experimenter change with the stages of the testing, the results may be adversely affected.

4. The OFAAT approach is often more difficult to field logistically, because of the sequence
of stages. At each stage, the experimental apparatus is set up, responses are obtained,
an analysis is carried out, and the next treatment combinations are determined. Setting up for
each experimental phase can be difficult. For example, it may be necessary in an industrial
experiment to reserve time on an assembly line or in a pilot plant well in advance. In a field
study involving a survey, it may be necessary to preschedule subjects and interviewers. In
addition, processing responses can be time-consuming (for example, if complicated laboratory
analyses are required), and the subsequent phase of experimentation may be delayed
significantly.
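The first deficiency can be checked by enumerating the treatment combinations directly. The sketch below uses the two phases described for Example 1 (price fixed at 60 cents in Phase I, advertising fixed at newspaper in Phase II). The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

prices = [55, 60, 65]
campaigns = ["radio", "newspaper"]

# Full crossed design: every price paired with every campaign.
full_design = set(product(prices, campaigns))

# Phase I: price fixed at 60 cents, both advertising modes compared.
phase_1 = {(60, campaign) for campaign in campaigns}
# Phase II: advertising fixed at newspaper, all three prices compared.
phase_2 = {(price, "newspaper") for price in prices}

visited = phase_1 | phase_2
omitted = full_design - visited
fraction_omitted = Fraction(len(omitted), len(full_design))

print(sorted(omitted))    # [(55, 'radio'), (65, 'radio')]
print(fraction_omitted)   # 1/3
```

With more factors or more levels per factor, the same enumeration shows the omitted fraction growing, since the OFAAT path visits only one line of the design space per phase.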


[Figure 19.2 plot: Advertising (Radio, Newspaper) against Price (55, 60, 65), with the Phase I comparison and the Phase II comparison marked.]




Advantages of Crossed, Multi-Factor Designs 


Efficiency and Hidden Replication. Multi-factor studies are more efficient than the 
OFAAT experimental approach. Even though the OFAAT approach devotes all resources to
studying the effect of only one factor, it does not yield any more precise information about 
that factor than a multi-factor experiment of the same size. With reference to Example 1 
again, suppose that 12 communities were to be utilized in a traditional study, six assigned 
to radio advertising and the other six to newspaper advertising, and that the price would be 
kept constant at 60 cents. For this traditional study, the comparison between the two types 
of promotional campaigns would be based on two samples of six communities each. The 
same is true for the two-factor study in Example 1, since each promotional campaign occurs 
there in three treatments and each treatment has two communities assigned to it. Figure 19.1 
reveals what is sometimes called hidden replication in a two-factor experiment. While there 
are only two replicates for each treatment combination, each level of advertising is repeated 
six times, and each level of price is repeated four times. 

The increased efficiency due to hidden replication for main effect tests in multi-factor
studies is present only when either interactions are unimportant or interaction effects
are small relative to main effects. When important interactions are present, multiple
comparisons of the individual cell means rather than comparisons of the main effects are
usually conducted.
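The hidden replication in Example 1 can be verified by counting experimental units per factor level. The sketch below is a simple tally under the Example 1 layout (a = 3 prices, b = 2 campaigns, n = 2 units per treatment). The variable names are hypothetical.

```python
from collections import Counter
from itertools import product

prices = [55, 60, 65]               # a = 3
campaigns = ["radio", "newspaper"]  # b = 2
n = 2                               # replicates per treatment

# One entry per experimental unit, labeled by its treatment.
units = [(p, c) for p, c in product(prices, campaigns) for _ in range(n)]

per_treatment = Counter(units)
per_price = Counter(p for p, _ in units)        # n * b units per price level
per_campaign = Counter(c for _, c in units)     # n * a units per campaign

print(per_treatment)
print(per_price)
print(per_campaign)
```

Each treatment has only 2 replicates, yet each price level is observed 4 times and each campaign 6 times, which is the same sample size per campaign as in the 12-community study that holds price fixed.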


Assessment of Interactions. OFAAT studies provide no information about interactions.
Specifically, in our previous illustration, the OFAAT study does not provide any information
about any special joint effects of price and promotional campaign. For instance, it might be
that the price effects are not large when the promotional campaign is in newspapers but are
large with radio advertising. Such interaction effects can be readily investigated from
cross-classified multifactor studies.
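A small numerical illustration may make the idea of such a joint effect concrete. The cell means below are invented purely for illustration and do not come from the text. They are chosen so that price matters a great deal with radio advertising and very little with newspaper advertising. The function names are hypothetical.

```python
# Hypothetical mean sales by campaign and price (illustration only).
hypothetical_means = {
    "radio": {55: 120, 60: 100, 65: 80},
    "newspaper": {55: 95, 60: 93, 65: 91},
}


def price_effect_by_campaign(cell_means):
    """Range of mean sales across the price levels, within each campaign."""
    return {
        campaign: max(by_price.values()) - min(by_price.values())
        for campaign, by_price in cell_means.items()
    }


def profiles_parallel(cell_means, tol=1e-9):
    """True if the gap between campaigns is the same at every price level."""
    campaigns = list(cell_means)
    reference = cell_means[campaigns[0]]
    for campaign in campaigns[1:]:
        gaps = [cell_means[campaign][p] - reference[p] for p in reference]
        if max(gaps) - min(gaps) > tol:
            return False
    return True


print(price_effect_by_campaign(hypothetical_means))
print(profiles_parallel(hypothetical_means))
```

Non-parallel profiles of cell means indicate an interaction. An OFAAT study that varied price only under newspaper advertising would see the small price effect and miss the large one.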


Validity of Findings. In addition to being more efficient and readily providing information
about interaction effects, multi-factor studies also can strengthen the validity of the findings.
Suppose that in Example 1, management was principally interested in investigating the
effects of price on sales. If the promotional campaign used in the price study had been
newspaper advertising, doubts might exist as to whether or not the price effects differ for
other promotional vehicles. By including type of promotional campaign as another factor
in the study, management can get information about the persistence of the price effects with
different promotional vehicles, without increasing the number of experimental units in the
study. Thus, multifactor studies can include some factors of secondary importance to permit
inferences about the primary factors with a greater range of validity.


Comments 


1. Multi-factor studies permit a ready evaluation of interaction effects for observational data and
economize on the number of cases required for the analysis, just as for experimental studies.

2. The advantages of multi-factor experiments just described should not lead one to think that
inclusion of more factors necessarily results in a better study. Experiments involving many factors,
each at numerous levels, become complex, costly, and time-consuming. It is often a better research
strategy to begin with fewer factors and/or fewer levels for each factor, and then extend the investigation
in accordance with the results obtained to date. In this way, resources can be devoted principally to
the most promising avenues of investigation, and a better understanding of the effects of the factors
can be obtained.


+ Chapter 19  Two-Factor Studies with Equal Sample Sizes 817 


Meaning of ANOVA Model Elements 


Before presenting a formal statement of the analysis of variance model for two-factor 
studies, we shall develop the model elements and discuss their meaning. This will not only 
be helpful in understanding the ANOVA model but will also provide insights into how the 
analysis of two-factor studies should proceed. Throughout this section, we assume that all 
population means are known and are of equal importance when averages of these means 
are required.


Illustration


To illustrate the meaning of the ANOVA model elements, we consider a simple two-factor
study in which the effects of gender and age on learning of a task are of interest. For sim- 
plicity, the age factor has been defined in terms of only three factor levels (young, middle, 
old), as shown in Table 19.1a.


Treatment Means

The mean response for a given treatment in a two-factor study is denoted by $\mu_{ij}$, where
$i$ refers to the level of factor A ($i = 1, \ldots, a$) and $j$ refers to the level of factor B
($j = 1, \ldots, b$). Table 19.1a contains the true treatment means $\mu_{ij}$ for the learning example. Note,
for instance, that $\mu_{11} = 9$, which indicates that the mean learning time for young males is
9 minutes. Similarly, we see that $\mu_{22} = 11$, so that the mean learning time for middle-aged
females is 11 minutes.

The interpretation of a treatment mean $\mu_{ij}$ depends on whether the study is observational,
experimental, or a mixture of the two. In an observational study, the treatment mean $\mu_{ij}$
corresponds to the population mean for the elements having the characteristics of the ith
level of factor A and the jth level of factor B. For instance, in the learning example, the
treatment mean $\mu_{11}$ is the mean learning time for the population of young males.

In an experimental study, the treatment mean $\mu_{ij}$ stands for the mean response that
would be obtained if the treatment consisting of the ith level of factor A and the jth level
of factor B were applied to all units in the population of experimental units about which


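TABLE 19.1 Age Effect but No Gender Effect, with No Interactions—Learning Example.

(a) Mean Learning Times (in minutes)

                                        Factor B—Age
                     j = 1                 j = 2                 j = 3                 Row
Factor A—Gender      Young                 Middle                Old                   Average
i = 1  Male          9 ($\mu_{11}$)        11 ($\mu_{12}$)       16 ($\mu_{13}$)       12 ($\mu_{1\cdot}$)
i = 2  Female        9 ($\mu_{21}$)        11 ($\mu_{22}$)       16 ($\mu_{23}$)       12 ($\mu_{2\cdot}$)
Column average       9 ($\mu_{\cdot 1}$)   11 ($\mu_{\cdot 2}$)  16 ($\mu_{\cdot 3}$)  12 ($\mu_{\cdot\cdot}$)

(b) Main Gender Effects (in minutes)

$\alpha_1 = \mu_{1\cdot} - \mu_{\cdot\cdot} = 12 - 12 = 0$
$\alpha_2 = \mu_{2\cdot} - \mu_{\cdot\cdot} = 12 - 12 = 0$

(c) Main Age Effects (in minutes)

$\beta_1 = \mu_{\cdot 1} - \mu_{\cdot\cdot} = 9 - 12 = -3$
$\beta_2 = \mu_{\cdot 2} - \mu_{\cdot\cdot} = 11 - 12 = -1$
$\beta_3 = \mu_{\cdot 3} - \mu_{\cdot\cdot} = 16 - 12 = 4$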


818 PartFive Multi-Factor Studies 


inferences are to be drawn. For instance, in a study where factor A is type of training program
(highly structured, partially structured, unstructured) and factor B is time of training (during
work, after work), 6n employees are selected and n are assigned at random to each of the six
treatments. The mean $\mu_{ij}$ here represents the mean response, say, mean gain in productivity,
if the ith training program administered during the jth time were given to all employees in
the population of experimental units.


Factor Level Means 


The treatment means in Table 19.1a for the learning example indicate that the mean learning
times for men and women are the same for each age group. On the other hand, the mean
learning time increases with age for each gender. Thus, gender has no effect on mean
learning time, but age does. This can also be seen quickly from the row averages and
column averages shown in Table 19.1a, which in this case tell the complete story. The row
averages are the gender factor level means, and the column averages are the age factor level
means. We denote the column average for the first column by $\mu_{\cdot 1}$, which is the average of
$\mu_{11}$ and $\mu_{21}$. In general, the column average for the jth column is denoted by $\mu_{\cdot j}$:

$$\mu_{\cdot j} = \frac{\sum_i \mu_{ij}}{a} \tag{19.1}$$

and the row average for the ith row is denoted by $\mu_{i\cdot}$:

$$\mu_{i\cdot} = \frac{\sum_j \mu_{ij}}{b} \tag{19.2}$$


The overall mean learning time for all ages and both genders is denoted by $\mu_{\cdot\cdot}$, and is
defined in the following equivalent fashions:

$$\mu_{\cdot\cdot} = \frac{\sum_i \sum_j \mu_{ij}}{ab} \tag{19.3a}$$

$$\mu_{\cdot\cdot} = \frac{\sum_i \mu_{i\cdot}}{a} \tag{19.3b}$$

$$\mu_{\cdot\cdot} = \frac{\sum_j \mu_{\cdot j}}{b} \tag{19.3c}$$

In Table 19.1a, the gender factor level means are $\mu_{1\cdot} = \mu_{2\cdot} = 12$ for the two genders,
the age factor level means are $\mu_{\cdot 1} = 9$, $\mu_{\cdot 2} = 11$, and $\mu_{\cdot 3} = 16$ for the three age groups,
and the overall mean learning time is $\mu_{\cdot\cdot} = 12$ minutes.
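The averages in (19.1) through (19.3a) can be checked numerically. The following is a minimal sketch in Python, using the treatment means of Table 19.1a; the function name `factor_level_means` is an illustrative choice and does not come from the text.

```python
# Treatment means from Table 19.1a: rows are gender (male, female),
# columns are age (young, middle, old).
TABLE_19_1A = [
    [9, 11, 16],
    [9, 11, 16],
]


def factor_level_means(cell_means):
    """Return (row_means, col_means, overall_mean) for a table of cell means."""
    a = len(cell_means)       # number of factor A levels (rows)
    b = len(cell_means[0])    # number of factor B levels (columns)
    row_means = [sum(row) / b for row in cell_means]                       # (19.2)
    col_means = [sum(row[j] for row in cell_means) / a for j in range(b)]  # (19.1)
    overall = sum(sum(row) for row in cell_means) / (a * b)                # (19.3a)
    return row_means, col_means, overall


row_means, col_means, overall = factor_level_means(TABLE_19_1A)
print(row_means)   # [12.0, 12.0]
print(col_means)   # [9.0, 11.0, 16.0]
print(overall)     # 12.0
```

The printed values agree with the row averages, column averages, and overall mean shown in the margins of Table 19.1a.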


Main Effects 


Main Age Effects. To summarize the main age effects, we shall consider the differences
between each factor level mean and the overall mean. These differences are called main
age effects. For instance, the main effect for young persons in Table 19.1a is the difference
between $\mu_{\cdot 1}$, the mean learning time for young persons, and $\mu_{\cdot\cdot}$, the overall mean. This
difference is denoted by $\beta_1$:

$$\beta_1 = \mu_{\cdot 1} - \mu_{\cdot\cdot} = 9 - 12 = -3$$

$\beta_1$ is called the main effect for factor B at the first level. This and the other main effects for
factor B are shown in Table 19.1c.




Main Gender Effects. The main gender effects are defined in corresponding fashion, and
denoted by $\alpha_i$. For instance, we have:

$$\alpha_1 = \mu_{1\cdot} - \mu_{\cdot\cdot} = 12 - 12 = 0$$

$\alpha_1$ is called the main effect for factor A at the first level. The main effects for factor A
are shown in Table 19.1b. They are both zero, indicating that gender does not affect mean
learning time.


General Definitions. In general, we define the main effect of factor A at the ith level as
follows:

$$\alpha_i = \mu_{i\cdot} - \mu_{\cdot\cdot} \tag{19.4}$$

Similarly, the main effect of the jth level of factor B is defined:

$$\beta_j = \mu_{\cdot j} - \mu_{\cdot\cdot} \tag{19.5}$$

It follows from (19.3b) and (19.3c) that:

$$\sum_i \alpha_i = 0 \qquad \sum_j \beta_j = 0 \tag{19.6}$$

Thus, the sum of the main effects for each factor is zero.

Note again that a main effect indicates how much the factor level mean deviates from
the overall mean. The greater the main effect, the more the factor level mean differs from
the overall mean response averaged over the factor levels for both factors.
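Definitions (19.4) and (19.5), and the zero-sum property (19.6), can be illustrated with a short computation. This is a sketch only, again based on the means of Table 19.1a; `main_effects` is a hypothetical helper name.

```python
def main_effects(cell_means):
    """Return (alpha, beta): main effects of factor A (rows) and factor B (columns)."""
    a, b = len(cell_means), len(cell_means[0])
    overall = sum(sum(row) for row in cell_means) / (a * b)
    alpha = [sum(row) / b - overall for row in cell_means]                      # (19.4)
    beta = [sum(row[j] for row in cell_means) / a - overall for j in range(b)]  # (19.5)
    return alpha, beta


TABLE_19_1A = [[9, 11, 16], [9, 11, 16]]  # gender (rows) by age (columns)
alpha, beta = main_effects(TABLE_19_1A)
print(alpha)                  # [0.0, 0.0]
print(beta)                   # [-3.0, -1.0, 4.0]
print(sum(alpha), sum(beta))  # 0.0 0.0, as required by (19.6)
```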


Additive Factor Effects 
The factor effects in Table 19.1 have an interesting property. Each mean response $\mu_{ij}$ can
be obtained by adding the respective gender and age main effects to the overall mean $\mu_{\cdot\cdot}$.
For instance, we have:

$$\mu_{11} = \mu_{\cdot\cdot} + \alpha_1 + \beta_1 = 12 + 0 + (-3) = 9$$

$$\mu_{23} = \mu_{\cdot\cdot} + \alpha_2 + \beta_3 = 12 + 0 + 4 = 16$$

In general, we have for Table 19.1a:

$$\mu_{ij} = \mu_{\cdot\cdot} + \alpha_i + \beta_j \qquad \text{Additive factor effects} \tag{19.7}$$

which can be expressed equivalently, using the definitions of $\alpha_i$ in (19.4) and of $\beta_j$ in (19.5),
as:

$$\mu_{ij} = \mu_{i\cdot} + \mu_{\cdot j} - \mu_{\cdot\cdot} \qquad \text{Additive factor effects} \tag{19.7a}$$

It can also be shown that each treatment mean $\mu_{ij}$ in Table 19.1a can be expressed in
terms of three other treatment means:

$$\mu_{ij} = \mu_{ij'} + \mu_{i'j} - \mu_{i'j'} \qquad \text{Additive factor effects} \quad i \neq i',\ j \neq j' \tag{19.7b}$$




FIGURE 19.3 Age Effect but No Gender Effect, with No Interactions—Learning Example.
[Treatment means plot. X axis: gender ($A_1$, $A_2$); Y axis: mean learning time; one horizontal curve for each age level ($B_1$, $B_2$, $B_3$).]


For instance, we have:

$$\mu_{11} = \mu_{12} + \mu_{21} - \mu_{22} = 11 + 9 - 11 = 9$$

or:

$$\mu_{11} = \mu_{13} + \mu_{21} - \mu_{23} = 16 + 9 - 16 = 9$$


When all treatment means can be expressed in the form of (19.7), (19.7a), or (19.7b),
we say that the factors do not interact, or that no factor interactions are present, or that
the factor effects are additive. The significance of no factor interactions is that the effect
of either factor does not depend on the level of the other factor. Consequently, the effects
of the two factors can be described separately merely by analyzing the factor level means
or the factor main effects. For instance, in the learning example in Table 19.1a, the two
gender means signify that gender has no influence regardless of age, and the three age
means portray the influence of age regardless of gender. The analysis of factor effects is
therefore quite simple when there are no factor interactions.
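One way to test a table of treatment means for additivity is to check form (19.7a) cell by cell. The sketch below does this for Table 19.1a and, for contrast, for a small table whose values are hypothetical and chosen only to fail the check; the function name and the tolerance `tol` are also assumptions of this example.

```python
def is_additive(cell_means, tol=1e-9):
    """Check whether every cell mean equals row mean + column mean - overall mean (19.7a)."""
    a, b = len(cell_means), len(cell_means[0])
    row = [sum(r) / b for r in cell_means]
    col = [sum(r[j] for r in cell_means) / a for j in range(b)]
    overall = sum(row) / a
    return all(
        abs(cell_means[i][j] - (row[i] + col[j] - overall)) <= tol
        for i in range(a)
        for j in range(b)
    )


TABLE_19_1A = [[9, 11, 16], [9, 11, 16]]   # from the text
HYPOTHETICAL = [[1, 2], [3, 5]]            # made-up values, not from the text

print(is_additive(TABLE_19_1A))   # True
print(is_additive(HYPOTHETICAL))  # False
```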


Graphic Presentation. Figure 19.3 presents the mean learning times of Table 19.1a in
the form of a treatment means plot, also known as an interaction plot. The X axis contains
the gender factor levels (denoted by $A_1$ and $A_2$), and the Y axis contains learning time.
Separate curves are drawn for each of the age factor levels (denoted by $B_1$, $B_2$, and $B_3$). The
zero slope of each curve indicates that gender has no effect. The differences in the heights
of the three curves show the age effects on learning time.

The points on each curve are conventionally connected by straight lines even though 
the variable on the X axis (gender, in our example) is not a continuous variable. When the 
variable on the X axis is qualitative, the slopes of the curves have no meaning, except when 
the slope is zero, which implies there are no factor level effects. If one of the two factors is
a quantitative variable, it is ordinarily advisable to place that factor on the X scale. 

Note that the treatment means plot in Figure 19.3 corresponds to a conditional effects 
plot in regression, such as the ones shown in Figure 8.7 on page 307. In each case, the effect 
of one variable is shown at different levels of the other variable. 


A Second Example with Additive Factor Effects. Table 19.2a contains another illustration
of factor effects that do not interact, for the same gender-age learning example as
before. The situation here differs from that of Table 19.1a in that not only age but also 


FIGURE 19.4 Age and Gender Effects, with No Interactions—Learning Example.




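TABLE 19.2

(a) Mean Learning Times (in minutes)

                                        Factor B—Age
                     j = 1                 j = 2                 j = 3                 Row
Factor A—Gender      Young                 Middle                Old                   Average
i = 1  Male          11 ($\mu_{11}$)       13 ($\mu_{12}$)       18 ($\mu_{13}$)       14 ($\mu_{1\cdot}$)
i = 2  Female        7 ($\mu_{21}$)        9 ($\mu_{22}$)        14 ($\mu_{23}$)       10 ($\mu_{2\cdot}$)
Column average       9 ($\mu_{\cdot 1}$)   11 ($\mu_{\cdot 2}$)  16 ($\mu_{\cdot 3}$)  12 ($\mu_{\cdot\cdot}$)

(b) Main Gender Effects (in minutes)

$\alpha_1 = \mu_{1\cdot} - \mu_{\cdot\cdot} = 14 - 12 = 2$
$\alpha_2 = \mu_{2\cdot} - \mu_{\cdot\cdot} = 10 - 12 = -2$

(c) Main Age Effects (in minutes)

$\beta_1 = \mu_{\cdot 1} - \mu_{\cdot\cdot} = 9 - 12 = -3$
$\beta_2 = \mu_{\cdot 2} - \mu_{\cdot\cdot} = 11 - 12 = -1$
$\beta_3 = \mu_{\cdot 3} - \mu_{\cdot\cdot} = 16 - 12 = 4$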


[Figure 19.4: treatment means plot. X axis: age ($B_1$, $B_2$, $B_3$); Y axis: mean learning time; one curve for each gender.]


gender affects the learning time. This is evident from the fact that the mean learning times 
for men and women are not the same for any age group. 

In Table 19.2a, as in Table 19.1a, every mean response can be decomposed according
to (19.7):

$$\mu_{ij} = \mu_{\cdot\cdot} + \alpha_i + \beta_j$$

For instance:

$$\mu_{11} = \mu_{\cdot\cdot} + \alpha_1 + \beta_1 = 12 + 2 + (-3) = 11$$

Hence, the two factors do not interact, and the factor effects can be analyzed separately by
examining the factor level means $\mu_{i\cdot}$ and $\mu_{\cdot j}$, respectively.

Figure 19.4 presents the data from Table 19.2a in the form of a treatment means plot. 
This time we have placed age on the X axis and used different curves for each gender. Note 
that the difference in the heights of the two curves reflects the gender difference and the 
departure from horizontal for each of the curves reflects the age effect. Furthermore, the 
two curves are parallel, which indicates that no two-factor interactions are present. 




Equivalent Statements of Additive Factor Effects. We have said that two factors do not
interact if all treatment means $\mu_{ij}$ can be expressed according to (19.7), (19.7a), or (19.7b).
There are a number of other, equivalent, methods of recognizing when two factors do not
interact. These are:


1. The difference between the mean responses for any two levels of factor B is the same 
for all levels of factor A. (For instance, in Table 19.2a, going from young to middle age 
leads to an increase of two minutes for both males and females, and going from middle 
age to old leads to an increase of five minutes for both males and females.) Note that it is 
not required that the changes, say, between levels 1 and 2 and between levels 2 and 3 of
factor B are the same. These, of course, may differ depending upon the nature of the factor
B effect. 

2. The difference between the mean responses for any two levels of factor A is the same 
for all levels of factor B. (For instance, in Table 19.2a, going from male to female leads to
a decrease of four minutes for all three age groups.) 


3. The curves of the mean responses for the different levels of a factor are all parallel 
(such as the two gender curves in Figure 19.4). 


All of these conditions are equivalent, implying that the two factors do not interact.
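Conditions 1 and 2 above amount to checking that differences between rows (or between columns) of the table of treatment means are constant. A rough sketch of that check in Python, applied to Table 19.2a, might look as follows; the helper names and the tolerance are assumptions of this example.

```python
def row_differences(cell_means):
    """Differences between each later row and the first row, column by column."""
    first = cell_means[0]
    return [[x - y for x, y in zip(row, first)] for row in cell_means[1:]]


def differences_constant(cell_means, tol=1e-9):
    """True when every row differs from the first row by the same amount in all columns."""
    return all(max(d) - min(d) <= tol for d in row_differences(cell_means))


TABLE_19_2A = [[11, 13, 18], [7, 9, 14]]   # gender (rows) by age (columns)

print(row_differences(TABLE_19_2A))        # [[-4, -4, -4]]
print(differences_constant(TABLE_19_2A))   # True

# The same check on the transposed table compares age levels within each gender.
transposed = [list(col) for col in zip(*TABLE_19_2A)]
print(row_differences(transposed))         # [[2, 2], [7, 7]]
print(differences_constant(transposed))    # True
```

The constant difference of four minutes between genders, and the increases of two and seven minutes relative to the young group for both genders, match the statements made about Table 19.2a.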


Interacting Factor Effects 


Table 19.3a contains an illustration for the learning example where the factor effects do
interact. The mean learning times for the different gender-age combinations in Table 19.3a
indicate that gender has no effect on learning time for young persons but has a substantial
effect for old persons. This differential influence of gender, which depends on the age of
the person, implies that the age and gender factors interact in their effect on learning time.


TABLE 19.3 Age and Gender Effects, with Interactions—Learning Example.

(a) Mean Learning Times (in minutes)

                                        Factor B—Age                                                       Main
                     j = 1                 j = 2                 j = 3                 Row                   Gender
Factor A—Gender      Young                 Middle                Old                   Average               Effect
i = 1  Male          9 ($\mu_{11}$)        12 ($\mu_{12}$)       18 ($\mu_{13}$)       13 ($\mu_{1\cdot}$)   1 ($\alpha_1$)
i = 2  Female        9 ($\mu_{21}$)        10 ($\mu_{22}$)       14 ($\mu_{23}$)       11 ($\mu_{2\cdot}$)   -1 ($\alpha_2$)
Column average       9 ($\mu_{\cdot 1}$)   11 ($\mu_{\cdot 2}$)  16 ($\mu_{\cdot 3}$)  12 ($\mu_{\cdot\cdot}$)
Main age effect      -3 ($\beta_1$)        -1 ($\beta_2$)        4 ($\beta_3$)

(b) Interactions (in minutes)

                                           Row
                 j = 1   j = 2   j = 3     Average
i = 1             -1       0       1         0
i = 2              1       0      -1         0
Column average     0       0       0         0




Definition of Interaction. We can study the existence of interacting factor effects formally
by examining whether or not all treatment means $\mu_{ij}$ can be expressed according to (19.7):

$$\mu_{ij} = \mu_{\cdot\cdot} + \alpha_i + \beta_j$$

If they can, the factor effects are additive; otherwise, the factor effects are interacting.
For the learning example in Table 19.3a, the main factor effects $\alpha_i$ and $\beta_j$ are shown in
the margins of the table. It is clear that the factors interact. For instance, $\mu_{11} = 9$ while:

$$\mu_{\cdot\cdot} + \alpha_1 + \beta_1 = 12 + 1 + (-3) = 10$$

If the two factors were additive, these would be the same.

The difference between the treatment mean $\mu_{ij}$ and the value $\mu_{\cdot\cdot} + \alpha_i + \beta_j$ that would
be expected if the two factors were additive is called the interaction effect, or more simply
the interaction, of the ith level of factor A with the jth level of factor B, and is denoted by
$(\alpha\beta)_{ij}$. Thus, we define $(\alpha\beta)_{ij}$ as follows:

$$(\alpha\beta)_{ij} = \mu_{ij} - (\mu_{\cdot\cdot} + \alpha_i + \beta_j) \tag{19.8}$$

Replacing $\alpha_i$ and $\beta_j$ by their definitions in (19.4) and (19.5), respectively, we obtain an
alternative definition:

$$(\alpha\beta)_{ij} = \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot} \tag{19.8a}$$

To repeat, the interaction of the ith level of A with the jth level of B, denoted by $(\alpha\beta)_{ij}$, is
simply the difference between the treatment mean $\mu_{ij}$ and the value that would be expected
if the factors were additive. If in fact the two factors are additive, all interactions equal zero;
i.e., $(\alpha\beta)_{ij} = 0$.

The interactions for the learning example in Table 19.3a are shown in Table 19.3b. We
have, for instance:

$$(\alpha\beta)_{13} = \mu_{13} - (\mu_{\cdot\cdot} + \alpha_1 + \beta_3) = 18 - (12 + 1 + 4) = 1$$
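The interactions in Table 19.3b can be reproduced directly from definition (19.8a). The following is a minimal sketch in Python using the treatment means of Table 19.3a; the function name `interactions` is an illustrative choice.

```python
def interactions(cell_means):
    """Return the table of interactions (alpha beta)_ij computed from (19.8a)."""
    a, b = len(cell_means), len(cell_means[0])
    row = [sum(r) / b for r in cell_means]                        # mu_i.
    col = [sum(r[j] for r in cell_means) / a for j in range(b)]   # mu_.j
    overall = sum(row) / a                                        # mu_..
    return [
        [cell_means[i][j] - row[i] - col[j] + overall for j in range(b)]
        for i in range(a)
    ]


TABLE_19_3A = [[9, 12, 18], [9, 10, 14]]   # gender (rows) by age (columns)
ab = interactions(TABLE_19_3A)
print(ab)   # [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]]
```

Applying the same function to an additive table, such as Table 19.2a, would return zeros in every cell.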


Recognition of Interactions. We may recognize whether or not interactions are present 
in one of the following equivalent fashions: 


1. By examining whether all $\mu_{ij}$ can be expressed as the sums $\mu_{\cdot\cdot} + \alpha_i + \beta_j$.

2. By examining whether the difference between the mean responses for any two levels 
of factor B is the same for all levels of factor A. (For instance, note in Table 19.3a that 
the mean learning time increases when going from young to middle-aged persons by three 
minutes for men but only by one minute for women.) 

3. By examining whether the difference between the mean responses for any two levels 
of factor A is the same for all levels of factor B. (For instance, note in Table 19.3a that 
there is no difference between genders for young persons, but there is a difference of four 
minutes for old persons.) 

4. By examining whether the treatment means curves for the different factor levels in 
a treatment means plot are parallel. (Figure 19.5 presents a plot of the treatment means 
in Table 19.3a, with age on the X axis. Note that the treatment means curves for the two 
genders are not parallel.)




FIGURE 19.5 Age and Gender Effects, with Important Interactions—Learning Example.
[Treatment means plot. X axis: age ($B_1$, $B_2$, $B_3$); Y axis: mean learning time; one curve for each gender; the two curves are not parallel.]

Comments 


1. Note from Table 19.3b that some interactions are zero even though the two factors are interacting. 
All interactions must equal zero in order for the two factors to be additive. 


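2. Table 19.3b illustrates that interactions sum to zero when added over either rows or columns:

$$\sum_i (\alpha\beta)_{ij} = 0 \qquad j = 1, \ldots, b \tag{19.9a}$$

$$\sum_j (\alpha\beta)_{ij} = 0 \qquad i = 1, \ldots, a \tag{19.9b}$$

Consequently, the sum of all interactions is also zero:

$$\sum_i \sum_j (\alpha\beta)_{ij} = 0 \tag{19.9c}$$

We show this for (19.9a):

$$\sum_i (\alpha\beta)_{ij} = \sum_i (\mu_{ij} - \mu_{\cdot\cdot} - \alpha_i - \beta_j) = \sum_i \mu_{ij} - a\mu_{\cdot\cdot} - \sum_i \alpha_i - a\beta_j$$

Now $\sum_i \mu_{ij} = a\mu_{\cdot j}$ by (19.1) and $\sum_i \alpha_i = 0$ by (19.6). Finally, $\beta_j = \mu_{\cdot j} - \mu_{\cdot\cdot}$ by (19.5). Hence, we
obtain:

$$\sum_i (\alpha\beta)_{ij} = a\mu_{\cdot j} - a\mu_{\cdot\cdot} - a(\mu_{\cdot j} - \mu_{\cdot\cdot}) = 0$$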


Important and Unimportant Interactions 


When two factors interact, the question arises whether the factor level means are still 
meaningful measures. In Table 19.3a, for instance, it may well be argued that the gender 
factor level means 13 and 11 are misleading measures. They indicate that some difference 
exists in learning time for men and women, but that this difference is not too great. These 
factor level means hide the fact that there is no difference in mean learning time between 




TABLE 19.4

                              Factor B—Age
                     j = 1     j = 2     j = 3     Row
Factor A—Gender      Young     Middle    Old       Average
i = 1  Male           9.75     12.00     17.25     13.00
i = 2  Female         8.25     10.00     14.75     11.00
Column average        9.00     11.00     16.00     12.00


[Figure 19.6: treatment means plot. X axis: age ($B_1$, $B_2$, $B_3$); Y axis: mean learning time; one curve for each gender ($A_1$, $A_2$).]


genders for young persons, but there is a relatively large difference for old persons. The 
interactions in Table 19.3a would therefore be considered important interactions, implying 
that one should not ordinarily examine the effects of each factor separately in terms of the 
factor level means. A treatment means plot, such as in Figure 19.5, presents effectively a 
description of the nature of the interacting effects of the two factors. 

Sometimes when two factors interact, the interaction effects are so small that they are 
considered to be unimportant interactions. Table 19.4 and Figure 19.6 present such a case. 
Note from Figure 19.6 that the curves are almost parallel. For practical purposes, one may 
say that the mean learning time for women is two minutes less than that for men, and this 
statement is approximately true for all age groups. Similarly, statements based on average 
learning time for different age groups will hold approximately for both genders. 

Thus, in the case of unimportant interactions, the analysis of factor effects can proceed 
as for the case of no interactions. Each factor can be studied separately, based on the factor 
level means $\mu_{i\cdot}$ and $\mu_{\cdot j}$, respectively. This separate analysis of factor effects is, of course,
much simpler than a joint analysis for the two factors based on the treatment means $\mu_{ij}$,
which is required when the interactions are important. 
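To see how small the interactions in Table 19.4 are compared with those in Table 19.3a, one can compute the largest absolute interaction in each table. The sketch below does so in Python; whether a given magnitude counts as unimportant remains a subject-matter judgment, and the function name is an assumption of this example.

```python
def largest_interaction(cell_means):
    """Largest absolute interaction |(alpha beta)_ij| in a table of treatment means."""
    a, b = len(cell_means), len(cell_means[0])
    row = [sum(r) / b for r in cell_means]
    col = [sum(r[j] for r in cell_means) / a for j in range(b)]
    overall = sum(row) / a
    return max(
        abs(cell_means[i][j] - row[i] - col[j] + overall)   # definition (19.8a)
        for i in range(a)
        for j in range(b)
    )


TABLE_19_3A = [[9, 12, 18], [9, 10, 14]]
TABLE_19_4 = [[9.75, 12.00, 17.25], [8.25, 10.00, 14.75]]

print(largest_interaction(TABLE_19_3A))  # 1.0
print(largest_interaction(TABLE_19_4))   # 0.25
```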


Comments 


1. The determination of whether interactions are important or unimportant is admittedly sometimes
difficult because it depends on the context of the application, just as the determination of whether
an effect in a single-factor study is important. The subject area specialist (researcher) needs to play
a prominent role in deciding whether an interaction is important or unimportant. The advantage of




unimportant (or no) interactions, namely, that one is then able to analyze the factor effects separately,
is especially great when the study contains more than two factors.


2. Occasionally, it is meaningful to consider the effects of each factor in terms of the factor level
means even when important interactions are present. For example, two methods of teaching college
mathematics (abstract and standard) were used in teaching students of excellent, good, and moderate
quantitative ability. Important interactions between teaching method and student's quantitative ability
were found to be present. Students with excellent quantitative ability tended to perform equally well
with the two teaching methods, whereas students of moderate or good quantitative ability tended to
perform better when taught by the standard method. If equal numbers of students with moderate,
good, and excellent quantitative ability are to be taught by one of the two teaching methods, then the
method that produces the best average result for all students might be of interest even in the presence
of important interactions. A comparison of the teaching method factor level means would then be
relevant, even though important interactions are present.


Transformable and Nontransformable Interactions 


When important interactions exist, they are sometimes the result of the scale on which the
response variable is measured. Consider, for instance, factor effects that act multiplicatively,
rather than additively as in (19.7):

$$\mu_{ij} = \mu_{\cdot\cdot}\,\alpha_i\,\beta_j \qquad \text{Multiplicative factor effects} \tag{19.10}$$

If we were to assume here that the factor effects are additive, we would find that condition
(19.7) does not hold and therefore that interactions are present. These interactions can be
removed, however, by applying a logarithmic transformation to (19.10):

$$\log \mu_{ij} = \log \mu_{\cdot\cdot} + \log \alpha_i + \log \beta_j \tag{19.11}$$

This result can be restated equivalently as follows:

$$\mu'_{ij} = \mu'_{\cdot\cdot} + \alpha'_i + \beta'_j \tag{19.11a}$$

where:

$$\mu'_{ij} = \log \mu_{ij} \qquad \mu'_{\cdot\cdot} = \log \mu_{\cdot\cdot} \qquad \alpha'_i = \log \alpha_i \qquad \beta'_j = \log \beta_j$$

The result in (19.11a) suggests that the original measurement scale for the response
variable Y may not be the most appropriate one in the sense of leading to easily understood
results. Rather, use of $Y' = \log Y$ for the response variable may be better, making the additive
model (19.7) then more appropriate.

We say that the interactions present when the factor effects are actually multiplicative 
are transformable interactions because a simple transformation of Y will remove most of 
these interaction effects and thus make them unimportant. 

Another instance of transformable interactions occurs when each interaction effect equals
the product of functions of the main effects, for example:

$$\mu_{ij} = \alpha_i + \beta_j + 2\sqrt{\alpha_i}\sqrt{\beta_j} \qquad \text{Multiplicative interactions} \tag{19.12}$$




[TABLE 19.5: (a) treatment means on the original scale; (b) treatment means after square root transformation. The table entries are illegible in the source.]


An equivalent form of (19.12) is:

$$\mu_{ij} = \left(\sqrt{\alpha_i} + \sqrt{\beta_j}\right)^2 \tag{19.12a}$$

If we now apply the square root transformation, we obtain an additive effects model:

$$\mu'_{ij} = \alpha'_i + \beta'_j \tag{19.13}$$

where:

$$\mu'_{ij} = \sqrt{\mu_{ij}} \qquad \alpha'_i = \sqrt{\alpha_i} \qquad \beta'_j = \sqrt{\beta_j}$$

Some simple transformations that may be helpful in making important interactions unimportant
are the square, square root, logarithmic, and reciprocal transformations. When interactions
cannot be largely removed by a transformation, they are called nontransformable
interactions.

Table 19.5a contains an example of important interactions that are transformable. When 
a square root transformation is applied to these means, the resulting treatment means in 
Table 19.5b show no interacting effects. Ordinarily, of course, one cannot hope that a simple 
transformation of scale removes all interactions as in Table 19.5, but only that interactions 
become unimportant after the transformation. 
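The effect of a square root transformation can be illustrated with a small numerical sketch. The treatment means below are hypothetical (they are not the entries of Table 19.5); they were constructed from form (19.12a) with assumed values $\sqrt{\alpha_i} = 1, 2$ and $\sqrt{\beta_j} = 1, 3, 4$, so that the interactions vanish after the transformation.

```python
import math


def interactions(cell_means):
    """Interactions (alpha beta)_ij from definition (19.8a)."""
    a, b = len(cell_means), len(cell_means[0])
    row = [sum(r) / b for r in cell_means]
    col = [sum(r[j] for r in cell_means) / a for j in range(b)]
    overall = sum(row) / a
    return [
        [cell_means[i][j] - row[i] - col[j] + overall for j in range(b)]
        for i in range(a)
    ]


def max_abs(table):
    """Largest absolute entry of a two-way table."""
    return max(abs(x) for row in table for x in row)


# Hypothetical means of the form (sqrt(alpha_i) + sqrt(beta_j)) ** 2.
ORIGINAL = [[4, 16, 25], [9, 25, 36]]
TRANSFORMED = [[math.sqrt(m) for m in row] for row in ORIGINAL]

print(TRANSFORMED)                               # [[2.0, 4.0, 5.0], [3.0, 5.0, 6.0]]
print(max_abs(interactions(ORIGINAL)) > 1)       # True: interactions on the original scale
print(max_abs(interactions(TRANSFORMED)) < 1e-9) # True: additive after the square root
```

With real data a transformation would rarely remove the interactions exactly; the comparison of the two maxima is what indicates whether the transformed scale is closer to additive.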


Interpretation of Interactions 
The interpretation of interactions can be quite difficult when the interacting effects are 
complex. There are many occasions, however, when the interactions have a simple structure, 
such as in Table 19.3a, so that the joint factor effects can be described in a straightforward 
manner. Table 19.6 provides several additional illustrations. The corresponding treatment 
means plots are shown in Figure 19.7. 

In Table 19.6a and Figure 19.7a, we have a situation where either raising the pay or 
increasing the authority of low-paid executives with small authority leads to increased 
productivity. However, combining both higher pay and greater authority does not lead 
to any substantial further improvement in productivity than increasing either one alone. 
Table 19.6b and Figure 19.7b represent a case where both higher pay and greater authority
are required before any substantial increase in productivity takes place. 




TABLE 19.6 Examples of Different Types of Interactions.

(a) Productivity of Executives

                     Factor B—Authority
Factor A—Pay         Small     Great
Low                   50        72
High                  74        75

(b) Productivity of Executives

                     Factor B—Authority
Factor A—Pay         Small     Great
Low                   50        52
High                  53        75

(c) Productivity of Executives

                     Factor B—Authority
Factor A—Pay         Small     Great
Low                   50        72
High                  72        50

(d) Productivity per Person in Crew

                       Factor B—Personality of Crew Chief
Factor A—Crew Size     Extrovert     Introvert
4 persons                 28            20
6 persons                 22            20
8 persons                 20            19
10 persons                17            18


It is possible that two factors interact, yet the main effects for one (or both) factors are 
zero. This would be the result of interactions in opposite directions that balance out over 
one (or both) factors. Thus, there would be definite factor effects, but these would not be 
disclosed by the factor level means. Table 19.6c and Figure 19.7c represent this situation 
where neither factor effect is present and the two factors interact. The case of interacting 
factors with no main effects for one (or both) factors fortunately is unusual. Typically,
interaction effects are smaller than main effects. 
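The situation in Table 19.6c, where the main effects vanish although the factors interact, can be verified with a short computation. This is a sketch only; the function name `decompose` is an assumption of this example.

```python
def decompose(cell_means):
    """Return (overall, alpha, beta, ab) for a table of treatment means."""
    a, b = len(cell_means), len(cell_means[0])
    row = [sum(r) / b for r in cell_means]
    col = [sum(r[j] for r in cell_means) / a for j in range(b)]
    overall = sum(row) / a
    alpha = [m - overall for m in row]                                # (19.4)
    beta = [m - overall for m in col]                                 # (19.5)
    ab = [
        [cell_means[i][j] - row[i] - col[j] + overall for j in range(b)]
        for i in range(a)
    ]                                                                 # (19.8a)
    return overall, alpha, beta, ab


TABLE_19_6C = [[50, 72], [72, 50]]   # pay (rows) by authority (columns)
overall, alpha, beta, ab = decompose(TABLE_19_6C)
print(overall)  # 61.0
print(alpha)    # [0.0, 0.0]
print(beta)     # [0.0, 0.0]
print(ab)       # [[-11.0, 11.0], [11.0, -11.0]]
```

The factor level means alone would suggest that neither pay nor authority matters, while the interactions of 11 units in opposite directions show that the combination of the two does.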

Table 19.6d and Figure 19.7d portray a situation where size of crew and personality of 
crew chief interact in a complex fashion. Productivity with an extrovert crew chief and a
crew of four is substantially larger than with an introvert crew chief. The advantage becomes
small with crews of six and eight, and with a crew of 10 an introvert crew chief leads to a
slightly larger productivity. 




[Figure 19.7: treatment means plots for the four cases in Table 19.6.
(a) Productivity of Executives. X axis: pay ($A_1$ low, $A_2$ high); curves: authority ($B_1$ small, $B_2$ great).
(b) Productivity of Executives. X axis: pay ($A_1$ low, $A_2$ high); curves: authority ($B_1$ small, $B_2$ great).
(c) Productivity of Executives. X axis: pay ($A_1$ low, $A_2$ high); curves: authority ($B_1$ small, $B_2$ great).
(d) Productivity per Person in Crew. X axis: crew size (factor A); curves: personality of crew chief ($B_1$ extrovert, $B_2$ introvert).]

Comment 


The terminology of reinforcement and interference interactions described in Chapter 8 for regression 
models where both predictor variables are quantitative is applicable to analysis of variance models
if the two factors are quantitative or can be ordered on a measurement scale. In Figures 19.7a and
19.7b, pay level and authority both can be ordered on a scale. Hence, the interaction in Figure 19.7a 
can be described as an interference or antagonistic interaction (the slope decreases for higher levels 
of factor B), while that in Figure 19.7b can be described as a reinforcement or synergistic interaction 
(the slope increases for higher levels of factor B). 

Similarly, the terminology of ordinal and disordinal interactions described in Chapter 8 for re- 
gression models where one predictor variable is quantitative and the other qualitative is applicable to 
analysis of variance models if one factor is quantitative or can be ordered on a measurement scale 
and the other factor is qualitative. In Figure 19.7d, crew size is a quantitative factor and personality is 
a qualitative factor. Therefore, the interaction in Figure 19.7d can be described as disordinal because 
the treatment means curves intersect.


19.3 Model I (Fixed Factor Levels) for Two-Factor Studies 


Having explained the model elements, we are now ready to develop ANOVA model I with 
fixed factor levels for two-factor studies when all treatment sample sizes are equal and all 
treatment means are of equal importance. This ANOVA model is applicable to observational 




studies and to experimental studies based on a completely randomized design. In Part VI
we shall consider ANOVA models for some other experimental designs. 

The basic situation is as follows: Factor A is studied at a levels, and these are of intrinsic 
interest in themselves; in other words, the a levels are not considered to be a sample from 
a larger population of factor A levels. Similarly, factor B is studied at b levels that are of 
intrinsic interest in themselves. All ab factor level combinations are included in the study.
The number of cases for each of the ab treatments is the same, denoted by n, and it is
required that n > 1. Thus, the total number of cases for the study is:

$$n_T = abn \tag{19.14}$$

The kth observation ($k = 1, \ldots, n$) for the treatment where A is at the ith level and B
is at the jth level is denoted by $Y_{ijk}$ ($i = 1, \ldots, a$; $j = 1, \ldots, b$). Table 19.7 on page 833
illustrates this notation for an example where A is at three levels, B is at two levels, and
two replications have been made for each treatment.

We shall state the fixed ANOVA model for two-factor studies in two equivalent versions— 
the cell means version and the factor effects version—and later will use one or the other as 
convenience dictates. 


Cell Means Model 


Model Formulation. When we regard the ab treatments without explicitly considering
the factorial structure of the study, we express the analysis of variance model in terms of
the cell (treatment) means $\mu_{ij}$:

$$Y_{ijk} = \mu_{ij} + \varepsilon_{ijk} \tag{19.15}$$

where:

$\mu_{ij}$ are parameters
$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n$


Important Features of Model. Some important features of the cell means model are:


1. The parameter $\mu_{ij}$ is the mean response for the treatment in which factor A is at the
ith level and factor B is at the jth level. This follows because $E\{\varepsilon_{ijk}\} = 0$:

$$E\{Y_{ijk}\} = \mu_{ij} \tag{19.16}$$

2. Since $\mu_{ij}$ is a constant, the variance of $Y_{ijk}$ is:

$$\sigma^2\{Y_{ijk}\} = \sigma^2\{\varepsilon_{ijk}\} = \sigma^2 \tag{19.17}$$

3. Since the error terms $\varepsilon_{ijk}$ are independent and normally distributed, so are the
observations $Y_{ijk}$. Hence, we can state ANOVA model (19.15) also as follows:

$$Y_{ijk} \text{ are independent } N(\mu_{ij}, \sigma^2) \tag{19.18}$$

4. ANOVA model (19.15) is a linear model because it can be expressed in the form
$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. Consider a two-factor study with each factor having two levels (i.e., $a = b = 2$)




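and two trials for each treatment (i.e., $n = 2$). Then $\mathbf{Y}$, $\mathbf{X}$, $\boldsymbol{\beta}$, and $\boldsymbol{\varepsilon}$ are defined as follows:

$$
\mathbf{Y} = \begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{121} \\ Y_{122} \\ Y_{211} \\ Y_{212} \\ Y_{221} \\ Y_{222} \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{21} \\ \mu_{22} \end{bmatrix}
\qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{111} \\ \varepsilon_{112} \\ \varepsilon_{121} \\ \varepsilon_{122} \\ \varepsilon_{211} \\ \varepsilon_{212} \\ \varepsilon_{221} \\ \varepsilon_{222} \end{bmatrix}
\tag{19.19}
$$

Recall that the $E\{\mathbf{Y}\}$ vector, which consists of the elements $E\{Y_{ijk}\}$, equals $\mathbf{X}\boldsymbol{\beta}$ according
to (6.20). This vector here is:

$$
E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{21} \\ \mu_{22} \end{bmatrix}
=
\begin{bmatrix} \mu_{11} \\ \mu_{11} \\ \mu_{12} \\ \mu_{12} \\ \mu_{21} \\ \mu_{21} \\ \mu_{22} \\ \mu_{22} \end{bmatrix}
\tag{19.20}
$$

Thus, $E\{Y_{ijk}\} = \mu_{ij}$, as it must according to (19.16), and we have the proper matrix
representation for the two-factor ANOVA model (19.15):

$$
\mathbf{Y} = \begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{121} \\ Y_{122} \\ Y_{211} \\ Y_{212} \\ Y_{221} \\ Y_{222} \end{bmatrix}
= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} =
\begin{bmatrix} \mu_{11} \\ \mu_{11} \\ \mu_{12} \\ \mu_{12} \\ \mu_{21} \\ \mu_{21} \\ \mu_{22} \\ \mu_{22} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{111} \\ \varepsilon_{112} \\ \varepsilon_{121} \\ \varepsilon_{122} \\ \varepsilon_{211} \\ \varepsilon_{212} \\ \varepsilon_{221} \\ \varepsilon_{222} \end{bmatrix}
\tag{19.21}
$$

In view of the error terms being independent with constant variance $\sigma^2$, the variance-covariance
matrix of the error terms is $\sigma^2\{\boldsymbol{\varepsilon}\} = \sigma^2\mathbf{I}$, as in (16.9) for the single-factor ANOVA
model. Also as before, we have $\sigma^2\{\mathbf{Y}\} = \sigma^2\{\boldsymbol{\varepsilon}\}$ for two-factor ANOVA model (19.15).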

5. ANOVA model (19.15) is therefore similar to the single-factor ANOVA model (16.2), 
except for the two subscripts now needed to identify the treatment. Normality, independent 
error terms, and constant variances for the error terms are properties of the ANOVA models 
for both single-factor and two-factor studies. 
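The matrix form of the cell means model can be made concrete by constructing the indicator matrix X for the case a = b = 2, n = 2 and multiplying it by a vector of cell means. The sketch below uses plain Python lists; the helper names are hypothetical, and the numerical cell means (9, 12, 9, 10, the young and middle columns of Table 19.3a) are used purely for illustration.

```python
def cell_means_design(a, b, n):
    """Build X for the cell means model; rows are ordered by i, then j, then k."""
    X = []
    for i in range(a):
        for j in range(b):
            for _ in range(n):
                row = [0] * (a * b)
                row[i * b + j] = 1   # indicator for treatment (i, j)
                X.append(row)
    return X


def mat_vec(X, beta):
    """Multiply matrix X (list of rows) by vector beta."""
    return [sum(x * m for x, m in zip(row, beta)) for row in X]


X = cell_means_design(2, 2, 2)
beta = [9, 12, 9, 10]          # illustrative values for mu_11, mu_12, mu_21, mu_22
expected_y = mat_vec(X, beta)  # E{Y} = X beta, as in (19.20)

for row in X:
    print(row)
print(expected_y)              # [9, 9, 12, 12, 9, 9, 10, 10]
```

Each cell mean appears n = 2 times in the result, once for every observation in that treatment.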


Factor Effects Model 


Model Formulation. An equivalent version of cell means model (19.15) can be obtained
by replacing each treatment mean $\mu_{ij}$ with an identical expression in terms of factor effects
based on the definition of an interaction in (19.8):

$$(\alpha\beta)_{ij} = \mu_{ij} - (\mu_{\cdot\cdot} + \alpha_i + \beta_j)$$




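Rearranging terms, we obtain the identity:

$$\mu_{ij} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} \tag{19.22}$$

where:

$$
\begin{aligned}
\mu_{\cdot\cdot} &= \frac{\sum_i \sum_j \mu_{ij}}{ab} \\
\alpha_i &= \mu_{i\cdot} - \mu_{\cdot\cdot} \\
\beta_j &= \mu_{\cdot j} - \mu_{\cdot\cdot} \\
(\alpha\beta)_{ij} &= \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot}
\end{aligned}
$$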
This formulation indicates that each cell mean $\mu_{ij}$ can be viewed as the sum of four component
factor effects. Specifically, (19.22) states that the mean response for the treatment
where factor A is at the ith level and factor B is at the jth level is the sum of:

1. An overall mean $\mu_{\cdot\cdot}$.
2. The main effect $\alpha_i$ for factor A at the ith level.
3. The main effect $\beta_j$ for factor B at the jth level.
4. The interaction effect $(\alpha\beta)_{ij}$ when factor A is at the ith level and factor B is at the
jth level.


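Replacing $\mu_{ij}$ in ANOVA model (19.15) by the expression in (19.22), we obtain an
equivalent factor effects ANOVA model for two-factor studies:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \tag{19.23}$$

where:

$\mu_{\cdot\cdot}$ is a constant

$\alpha_i$ are constants subject to the restriction $\sum_i \alpha_i = 0$

$\beta_j$ are constants subject to the restriction $\sum_j \beta_j = 0$

$(\alpha\beta)_{ij}$ are constants subject to the restrictions:

$$\sum_i (\alpha\beta)_{ij} = 0 \quad j = 1, \ldots, b$$

$$\sum_j (\alpha\beta)_{ij} = 0 \quad i = 1, \ldots, a$$

$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$

$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n$
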
Important Features of Model. Some important features of the factor effects model are: 


1. ANOVA model (19.23) corresponds to the fixed factor effects ANOVA model (16.62) 
for a single-factor study except that the single-factor treatment effect is here replaced by 
the sum of a factor A effect, a factor B effect, and an interaction effect. 


2. The properties of the observations $Y_{ijk}$ for factor effects model (19.23) are the same
as those for the equivalent cell means model (19.15). Since $E\{\varepsilon_{ijk}\} = 0$, we have:

$$E\{Y_{ijk}\} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} = \mu_{ij} \tag{19.24}$$

The second equality follows from identity (19.22). Further, we have:

$$\sigma^2\{Y_{ijk}\} = \sigma^2 \tag{19.25}$$

because the error term $\varepsilon_{ijk}$ is the only random term on the right-hand side in (19.23) and
$\sigma^2\{\varepsilon_{ijk}\} = \sigma^2$. Finally, the $Y_{ijk}$ are independent normal random variables because the error
terms are independent normal random variables. Hence, we can also state ANOVA model
(19.23) as follows:

$$Y_{ijk} \text{ are independent } N[\mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij}, \sigma^2] \tag{19.26}$$


3. ANOVA model (19.23) is a linear model because it can be stated in the form
$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. We shall show this explicitly in Section 23.2.


19.4 Analysis of Variance


Illustration


TABLE 19.7 Sample Data and Notation for Two-Factor Study: Castle Bakery Example (sales in cases).


Table 19.7 contains an illustration that we shall employ in this chapter and the next. The 
Castle Bakery Company supplies wrapped Italian bread to a large number of supermarkets 
in a metropolitan area. An experimental study was made of the effects of height of the 
shelf display (factor A: bottom, middle, top) and the width of the shelf display (factor B: 
regular, wide) on sales of this bakery’s bread during the experimental period (Y, measured 
in cases). Twelve supermarkets, similar in terms of sales volume and clientele, were utilized 
in the study. The six treatments were assigned at random to two stores each according to 
a completely randomized design, and the display of the bread in each store followed the 
treatment specifications for that store. Sales of the bread were recorded, and these results 
are presented in Table 19.7. 


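                              Factor B (display width), j
Factor A
(display height), i      B1 (regular)                 B2 (wide)                    Row total                         Display height average
A1 (bottom)              47 ($Y_{111}$)               46 ($Y_{121}$)
                         43 ($Y_{112}$)               40 ($Y_{122}$)
   Total                 90 ($Y_{11\cdot}$)           86 ($Y_{12\cdot}$)           176 ($Y_{1\cdot\cdot}$)
   Average               45 ($\bar{Y}_{11\cdot}$)     43 ($\bar{Y}_{12\cdot}$)                                       44 ($\bar{Y}_{1\cdot\cdot}$)
A2 (middle)              62 ($Y_{211}$)               67 ($Y_{221}$)
                         68 ($Y_{212}$)               71 ($Y_{222}$)
   Total                 130 ($Y_{21\cdot}$)          138 ($Y_{22\cdot}$)          268 ($Y_{2\cdot\cdot}$)
   Average               65 ($\bar{Y}_{21\cdot}$)     69 ($\bar{Y}_{22\cdot}$)                                       67 ($\bar{Y}_{2\cdot\cdot}$)
A3 (top)                 41 ($Y_{311}$)               42 ($Y_{321}$)
                         39 ($Y_{312}$)               46 ($Y_{322}$)
   Total                 80 ($Y_{31\cdot}$)           88 ($Y_{32\cdot}$)           168 ($Y_{3\cdot\cdot}$)
   Average               40 ($\bar{Y}_{31\cdot}$)     44 ($\bar{Y}_{32\cdot}$)                                       42 ($\bar{Y}_{3\cdot\cdot}$)
Column total             300 ($Y_{\cdot 1\cdot}$)     312 ($Y_{\cdot 2\cdot}$)     612 ($Y_{\cdot\cdot\cdot}$)
Display width average    50 ($\bar{Y}_{\cdot 1\cdot}$)  52 ($\bar{Y}_{\cdot 2\cdot}$)                                51 ($\bar{Y}_{\cdot\cdot\cdot}$)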




Notation 


Table 19.7 illustrates the notation we shall use for two-factor studies. It is a straightforward
extension of the notation for single-factor studies. An observation is denoted by $Y_{ijk}$. The
subscripts i and j specify the levels of factors A and B, respectively, and the subscript k
refers to the given case or trial for a particular treatment (i.e., factor level combination).

A dot in the subscript indicates aggregation or averaging over the variable represented 
by the index. For instance, the sum of the observations for the treatment corresponding to 
the ith level of factor A and the jth level of factor B is: 


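$$Y_{ij\cdot} = \sum_{k=1}^{n} Y_{ijk} \tag{19.27a}$$

The corresponding mean is:

$$\bar{Y}_{ij\cdot} = \frac{Y_{ij\cdot}}{n} \tag{19.27b}$$

The total of all observations for the ith factor level of A is:

$$Y_{i\cdot\cdot} = \sum_{j}^{b} \sum_{k}^{n} Y_{ijk} \tag{19.27c}$$

and the corresponding mean is:

$$\bar{Y}_{i\cdot\cdot} = \frac{Y_{i\cdot\cdot}}{bn} \tag{19.27d}$$

Similarly, for the jth factor level of B the sum of all observations and their mean are
denoted by:

$$Y_{\cdot j\cdot} = \sum_{i}^{a} \sum_{k}^{n} Y_{ijk} \tag{19.27e}$$

$$\bar{Y}_{\cdot j\cdot} = \frac{Y_{\cdot j\cdot}}{an} \tag{19.27f}$$

Finally, the sum of all observations in the study is:

$$Y_{\cdot\cdot\cdot} = \sum_{i}^{a} \sum_{j}^{b} \sum_{k}^{n} Y_{ijk} \tag{19.27g}$$

and the overall mean is:

$$\bar{Y}_{\cdot\cdot\cdot} = \frac{Y_{\cdot\cdot\cdot}}{nab} \tag{19.27h}$$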

Fitting of ANOVA Model 


Cell Means Model (19.15). Fitting the two-factor cell means model (19.15) to the sample
data by either the method of least squares or the method of maximum likelihood leads to
minimizing the criterion:

$$Q = \sum_i \sum_j \sum_k (Y_{ijk} - \mu_{ij})^2 \tag{19.28}$$

When we perform the minimization of Q, we obtain the least squares and maximum likelihood
estimators:

$$\hat{\mu}_{ij} = \bar{Y}_{ij\cdot} \tag{19.29}$$

Thus, the fitted values are the estimated treatment means:

$$\hat{Y}_{ijk} = \bar{Y}_{ij\cdot} \tag{19.30}$$

The residuals, as usual, are defined as the difference between the observed and fitted values:

$$e_{ijk} = Y_{ijk} - \hat{Y}_{ijk} = Y_{ijk} - \bar{Y}_{ij\cdot} \tag{19.31}$$

Residuals are highly useful for assessing the appropriateness of two-factor ANOVA model
(19.15), as they also are for the statistical models considered earlier.

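Factor Effects Model (19.23). For the equivalent factor effects model (19.23), the least
squares and maximum likelihood methods both lead to minimizing the criterion:

$$Q = \sum_i \sum_j \sum_k [Y_{ijk} - \mu_{\cdot\cdot} - \alpha_i - \beta_j - (\alpha\beta)_{ij}]^2 \tag{19.32}$$

subject to the restrictions:

$$\sum_i \alpha_i = 0 \qquad \sum_j \beta_j = 0 \qquad \sum_i (\alpha\beta)_{ij} = 0 \qquad \sum_j (\alpha\beta)_{ij} = 0$$

When we perform this minimization, we obtain the following least squares and maximum
likelihood estimators of the parameters:

$$
\begin{array}{lll}
\text{Parameter} & \text{Estimator} & \\
\mu_{\cdot\cdot} & \hat{\mu}_{\cdot\cdot} = \bar{Y}_{\cdot\cdot\cdot} & (19.33a) \\
\alpha_i = \mu_{i\cdot} - \mu_{\cdot\cdot} & \hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot} & (19.33b) \\
\beta_j = \mu_{\cdot j} - \mu_{\cdot\cdot} & \hat{\beta}_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot} & (19.33c) \\
(\alpha\beta)_{ij} = \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot} & \widehat{(\alpha\beta)}_{ij} = \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot} & (19.33d)
\end{array}
$$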


The correspondences of these estimators to the definitions of the parameters are readily 
apparent. 

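The fitted values and residuals for factor effects model (19.23) are exactly the same as
those for cell means model (19.15). Specifically, the fitted values for ANOVA model (19.23)
are:

$$\hat{Y}_{ijk} = \bar{Y}_{\cdot\cdot\cdot} + (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}) + (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot}) + (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}) = \bar{Y}_{ij\cdot} \tag{19.34}$$

so that the residuals are again:

$$e_{ijk} = Y_{ijk} - \bar{Y}_{ij\cdot} \tag{19.35}$$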
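
As an illustration of the estimators in (19.33) and of the identity in (19.34), the sketch below computes the estimated factor effects for the Castle Bakery data of Table 19.7 and confirms that the factor effects fitted values reduce to the estimated treatment means. The data layout and variable names are assumptions made for this example.

```python
# Castle Bakery sales (in cases), y[i][j][k] with 0-based indices.
y = [
    [[47, 43], [46, 40]],  # A1 (bottom): B1 (regular), B2 (wide)
    [[62, 68], [67, 71]],  # A2 (middle)
    [[41, 39], [42, 46]],  # A3 (top)
]
a, b, n = 3, 2, 2

# Estimated treatment, factor level, and overall means (equal sample sizes).
cell_mean = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]
a_mean = [sum(cell_mean[i]) / b for i in range(a)]
b_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
grand_mean = sum(a_mean) / a

# Estimators (19.33a) to (19.33d).
mu_hat = grand_mean
alpha_hat = [a_mean[i] - grand_mean for i in range(a)]
beta_hat = [b_mean[j] - grand_mean for j in range(b)]
ab_hat = [
    [cell_mean[i][j] - a_mean[i] - b_mean[j] + grand_mean for j in range(b)]
    for i in range(a)
]

# Fitted values (19.34) and residuals (19.35).
fitted = [
    [mu_hat + alpha_hat[i] + beta_hat[j] + ab_hat[i][j] for j in range(b)]
    for i in range(a)
]
residuals = [
    [[y[i][j][k] - fitted[i][j] for k in range(n)] for j in range(b)]
    for i in range(a)
]

print("alpha_hat:", alpha_hat)
print("beta_hat:", beta_hat)
print("ab_hat:", ab_hat)
print("fitted equals cell means:", fitted == cell_mean)
```

The estimated effects satisfy the same sum-to-zero restrictions as the parameters, which is why the four terms in (19.34) collapse to $\bar{Y}_{ij\cdot}$.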




FIGURE 19.8 Estimated Treatment Means Plot: Castle Bakery Example.
[Plot of sales (in cases) against display height (A1, A2, A3), with one curve for the regular display B1 and one for the wide display B2.]


Example

For the Castle Bakery example, the fitted values, i.e., the estimated treatment means $\bar{Y}_{ij\cdot}$, are
shown in Table 19.7. A plot of these estimated treatment means is presented in Figure 19.8.
We see from this estimated treatment means plot that, for both display widths, mean sales 
for the middle display height are substantially larger than those for the other two display 
heights. The effect of display width does not appear to be large. Indeed, there may be no 
effect of display width; the variations between the estimated treatment means for any given 
display height may be solely of a random nature. In that event, there would be no interactions 
between display height and display width in their effects on sales. 

Figure 19.8 differs from the earlier treatment means plots because the earlier figures
presented the true treatment means $\mu_{ij}$, while Figure 19.8 presents sample estimates. We
therefore need to test whether the effects shown in Figure 19.8 are real effects or
represent only random variations. To conduct these tests, we require a partitioning of the
total sum of squares, to be discussed next.


Partitioning of Total Sum of Squares 


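Partitioning of Total Deviation. We shall partition the total deviation of an observation
$Y_{ijk}$ from the overall mean $\bar{Y}_{\cdot\cdot\cdot}$ in two stages. First, we shall obtain a decomposition of the
total deviation $Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot}$ by viewing the study as consisting of ab treatments:

$$
\underbrace{Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot}}_{\text{Total deviation}}
= \underbrace{\bar{Y}_{ij\cdot} - \bar{Y}_{\cdot\cdot\cdot}}_{\substack{\text{Deviation of estimated} \\ \text{treatment mean around} \\ \text{overall mean}}}
+ \underbrace{Y_{ijk} - \bar{Y}_{ij\cdot}}_{\substack{\text{Deviation around} \\ \text{estimated treatment mean}}}
\tag{19.36}
$$

Note that the deviation around the estimated treatment mean is simply the residual $e_{ijk}$ in
(19.35):

$$e_{ijk} = Y_{ijk} - \bar{Y}_{ij\cdot}$$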


Treatment and Error Sums of Squares. When we square (19.36) and sum over all cases,
the cross-product term drops out and we obtain:

$$SSTO = SSTR + SSE \tag{19.37}$$

where:

$$SSTO = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{19.37a}$$

$$SSTR = n \sum_i \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{19.37b}$$

$$SSE = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2 = \sum_i \sum_j \sum_k e_{ijk}^2 \tag{19.37c}$$

SSTR reflects the variability between the ab estimated treatment means and is the ordinary
treatment sum of squares, and SSE reflects the variability within treatments and is the usual
error sum of squares. The only difference between these formulas and those for the single-factor
case is the use of the two subscripts i and j to designate a treatment.


Example


For the Castle Bakery example, the decomposition of the total sum of squares in (19.37) is
obtained as follows, using the data in Table 19.7:

$$
\begin{aligned}
SSTO &= (47 - 51)^2 + (43 - 51)^2 + (46 - 51)^2 + \cdots + (46 - 51)^2 = 1{,}642 \\
SSTR &= 2[(45 - 51)^2 + (43 - 51)^2 + (65 - 51)^2 + \cdots + (44 - 51)^2] = 1{,}580 \\
SSE &= (47 - 45)^2 + (43 - 45)^2 + (46 - 43)^2 + \cdots + (46 - 44)^2 = 62
\end{aligned}
$$

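Partitioning of Treatment Sum of Squares. Next, we shall decompose the estimated
treatment mean deviation $\bar{Y}_{ij\cdot} - \bar{Y}_{\cdot\cdot\cdot}$ in terms of components reflecting the factor A main
effect, the factor B main effect, and the AB interaction effect:

$$
\underbrace{\bar{Y}_{ij\cdot} - \bar{Y}_{\cdot\cdot\cdot}}_{\substack{\text{Deviation of} \\ \text{estimated treatment} \\ \text{mean around} \\ \text{overall mean}}}
= \underbrace{\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}}_{\substack{A \text{ main} \\ \text{effect}}}
+ \underbrace{\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot}}_{\substack{B \text{ main} \\ \text{effect}}}
+ \underbrace{\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}}_{\substack{AB \text{ interaction} \\ \text{effect}}}
\tag{19.38}
$$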


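When we square (19.38) and sum over all treatments and over the n cases associated with
each estimated treatment mean $\bar{Y}_{ij\cdot}$, all cross-product terms drop out and we obtain:

$$SSTR = SSA + SSB + SSAB \tag{19.39}$$

where:

$$SSA = nb \sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{19.39a}$$

$$SSB = na \sum_j (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{19.39b}$$

$$SSAB = n \sum_i \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 \tag{19.39c}$$

The interaction sum of squares can also be obtained as a remainder:

$$SSAB = SSTO - SSE - SSA - SSB \tag{19.39d}$$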



or from:

$$SSAB = SSTR - SSA - SSB \tag{19.39e}$$

where SSTO and SSTR are given in (19.37a) and (19.37b), respectively.

SSA, called the factor A sum of squares, measures the variability of the estimated factor
A level means $\bar{Y}_{i\cdot\cdot}$. The more variable they are, the bigger will be SSA. Similarly, SSB, called
the factor B sum of squares, measures the variability of the estimated factor B level means
$\bar{Y}_{\cdot j\cdot}$. Finally, SSAB, called the AB interaction sum of squares, measures the variability of the
estimated interactions $\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}$ for the ab treatments. Since the mean of all
estimated interactions is zero, the deviations of the estimated interactions around their mean
are not explicitly shown, as they were in SSA and SSB. The larger the estimated interactions
are in absolute value, the larger will be SSAB.

The partitioning of SSTR into the components SSA, SSB, and SSAB is called an orthogonal 
decomposition. An orthogonal decomposition is one where the component sums of squares 
add to the total sum of squares (SSTR here), and likewise for the degrees of freedom. Thus, 
the decompositions of SSTO into SSTR and SSE for single-factor and two-factor studies are 
also orthogonal decompositions. While many different orthogonal decompositions of SSTR 
are possible here, the one into the SSA, SSB, and SSAB components is of interest because 
these three components provide information about the factor A main effects, the factor B 
main effects, and the AB interactions, respectively, as will be seen shortly. 


For the Castle Bakery example, we obtain the following decomposition of SSTR, using the
data in Table 19.7 and the formulas in (19.39):

$$
\begin{aligned}
SSA &= 2(2)[(44 - 51)^2 + (67 - 51)^2 + (42 - 51)^2] = 1{,}544 \\
SSB &= 2(3)[(50 - 51)^2 + (52 - 51)^2] = 12 \\
SSAB &= 1{,}580 - 1{,}544 - 12 = 24
\end{aligned}
$$

Hence, we have:

$$
\begin{aligned}
SSTR &= SSA + SSB + SSAB \\
1{,}580 &= 1{,}544 + 12 + 24
\end{aligned}
$$


Combined Partitioning. Combining the decompositions in (19.37) and (19.39), we have
established that:

$$SSTO = SSA + SSB + SSAB + SSE \tag{19.40}$$

where the component sums of squares are defined in (19.37) and (19.39).

For the Castle Bakery example, we have found:

$$
\begin{aligned}
SSTO &= SSA + SSB + SSAB + SSE \\
1{,}642 &= 1{,}544 + 12 + 24 + 62
\end{aligned}
$$


Thus, much of the total variability in this instance is associated with the factor A (display 
height) effects. 
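
The full partitioning in (19.37), (19.39), and (19.40) can be reproduced directly from the definitions. The following is a minimal sketch for the Castle Bakery data of Table 19.7; the data layout and variable names are assumptions made for this example.

```python
# Castle Bakery sales (in cases), y[i][j][k] with 0-based indices.
y = [
    [[47, 43], [46, 40]],  # A1 (bottom): B1 (regular), B2 (wide)
    [[62, 68], [67, 71]],  # A2 (middle)
    [[41, 39], [42, 46]],  # A3 (top)
]
a, b, n = 3, 2, 2
cells = [(i, j) for i in range(a) for j in range(b)]

cell_mean = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]
a_mean = [sum(cell_mean[i]) / b for i in range(a)]
b_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
grand_mean = sum(a_mean) / a

# First stage: SSTO = SSTR + SSE, formulas (19.37a) to (19.37c).
ssto = sum((y[i][j][k] - grand_mean) ** 2 for i, j in cells for k in range(n))
sstr = n * sum((cell_mean[i][j] - grand_mean) ** 2 for i, j in cells)
sse = sum((y[i][j][k] - cell_mean[i][j]) ** 2 for i, j in cells for k in range(n))

# Second stage: SSTR = SSA + SSB + SSAB, formulas (19.39a) to (19.39c).
ssa = n * b * sum((m - grand_mean) ** 2 for m in a_mean)
ssb = n * a * sum((m - grand_mean) ** 2 for m in b_mean)
ssab = n * sum(
    (cell_mean[i][j] - a_mean[i] - b_mean[j] + grand_mean) ** 2 for i, j in cells
)

print("SSTO, SSTR, SSE:", ssto, sstr, sse)
print("SSA, SSB, SSAB:", ssa, ssb, ssab)
print("SSTR == SSA + SSB + SSAB:", sstr == ssa + ssb + ssab)
print("SSTO == SSTR + SSE:", ssto == sstr + sse)
```

Here SSAB is computed from its definition (19.39c) rather than as a remainder, so agreement with (19.39e) serves as a check on the arithmetic.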


Partitioning of Degrees of Freedom


We are familiar from single-factor analysis of variance with how the degrees of freedom
are divided between the treatment and error components. For two-factor studies with n
cases for each treatment, there are a total of $n_T = nab$ cases and $r = ab$ treatments; hence,
the degrees of freedom associated with SSTO, SSTR, and SSE are $nab - 1$, $ab - 1$, and
$nab - ab = (n - 1)ab$, respectively. These degrees of freedom for the Castle Bakery
example are $2(3)(2) - 1 = 11$, $3(2) - 1 = 5$, and $(2 - 1)(3)(2) = 6$, respectively.

Corresponding to the further partitioning of the treatment sum of squares in (19.39),
we can also obtain a breakdown of the associated $ab - 1$ degrees of freedom. SSA has
$a - 1$ degrees of freedom associated with it. There are a factor level deviations $\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}$,
but one degree of freedom is lost because the deviations are subject to one restriction, i.e.,
$\sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}) = 0$. Similarly, SSB has $b - 1$ degrees of freedom associated with it. The
degrees of freedom associated with SSAB, the interaction sum of squares, are the remainder:

$$(ab - 1) - (a - 1) - (b - 1) = (a - 1)(b - 1)$$

The degrees of freedom associated with SSAB may be understood as follows: There are
ab interaction terms. These are subject to b restrictions since:

$$\sum_i (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}) = 0 \qquad j = 1, \ldots, b$$

There are a additional restrictions since:

$$\sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}) = 0 \qquad i = 1, \ldots, a$$

However, only $a - 1$ of these latter restrictions are independent since the last one is implied
by the previous b restrictions. Altogether, therefore, there are $b + (a - 1)$ independent
restrictions. Hence, the degrees of freedom are:

$$ab - (b + a - 1) = (a - 1)(b - 1)$$

Example

For the Castle Bakery example, SSA has $3 - 1 = 2$ degrees of freedom associated with it,
SSB has $2 - 1 = 1$ degree of freedom, and SSAB has $(3 - 1)(2 - 1) = 2$ degrees of freedom.


Mean Squares 


Mean squares are obtained in the usual way by dividing the sums of squares by their
associated degrees of freedom. We thus obtain:

$$MSA = \frac{SSA}{a - 1} \tag{19.41a}$$

$$MSB = \frac{SSB}{b - 1} \tag{19.41b}$$

$$MSAB = \frac{SSAB}{(a - 1)(b - 1)} \tag{19.41c}$$


Example

For the Castle Bakery example, these mean squares are:

$$
\begin{aligned}
MSA &= \frac{1{,}544}{2} = 772 \\
MSB &= \frac{12}{1} = 12 \\
MSAB &= \frac{24}{2} = 12
\end{aligned}
$$


Expected Mean Squares 


It can be shown, along the same lines used for single-factor ANOVA, that the mean squares
for two-factor ANOVA model (19.23) have the following expectations:

$$E\{MSE\} = \sigma^2 \tag{19.42a}$$

$$E\{MSA\} = \sigma^2 + nb \frac{\sum_i \alpha_i^2}{a - 1} = \sigma^2 + nb \frac{\sum_i (\mu_{i\cdot} - \mu_{\cdot\cdot})^2}{a - 1} \tag{19.42b}$$

$$E\{MSB\} = \sigma^2 + na \frac{\sum_j \beta_j^2}{b - 1} = \sigma^2 + na \frac{\sum_j (\mu_{\cdot j} - \mu_{\cdot\cdot})^2}{b - 1} \tag{19.42c}$$

$$E\{MSAB\} = \sigma^2 + n \frac{\sum_i \sum_j (\alpha\beta)_{ij}^2}{(a - 1)(b - 1)} = \sigma^2 + n \frac{\sum_i \sum_j (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot})^2}{(a - 1)(b - 1)} \tag{19.42d}$$

These expectations show that if there are no factor A main effects (i.e., if all $\mu_{i\cdot}$ are
equal, or all $\alpha_i = 0$), MSA and MSE have the same expectation; otherwise MSA tends to
be larger than MSE. Similarly, if there are no factor B main effects, MSB and MSE have
the same expectation; otherwise MSB tends to be larger than MSE. Finally, if there are no
interactions [i.e., if all $(\alpha\beta)_{ij} = 0$] so that the factor effects are additive, MSAB has the
same expectation as MSE; otherwise, MSAB tends to be larger than MSE. This suggests that
F* test statistics based on the ratios MSA/MSE, MSB/MSE, and MSAB/MSE will provide
information about the main effects and interactions of the two factors, with large values
of the test statistics indicating the presence of factor effects. We shall see shortly that tests
based on these statistics are regular F tests.


Analysis of Variance Table 


The decomposition of the total sum of squares in (19.40) into the several factor and error
components is shown in Table 19.8. Also shown there are the associated degrees of freedom,
the mean squares, and the expected mean squares. Table 19.9 contains the two-factor analysis
of variance for the Castle Bakery example.

Figure 19.9 presents MINITAB output for the Castle Bakery example. The first output
block shows ANOVA results similar to those presented in Table 19.9. The second block
presents various estimated means.


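TABLE 19.8 ANOVA Table for Two-Factor Study with Fixed Factor Levels.

$$
\begin{array}{lllll}
\text{Source of Variation} & SS & df & MS & E\{MS\} \\
\hline
\text{Factor } A & SSA = nb \sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 & a - 1 & MSA = \dfrac{SSA}{a - 1} & \sigma^2 + nb \dfrac{\sum_i (\mu_{i\cdot} - \mu_{\cdot\cdot})^2}{a - 1} \\
\text{Factor } B & SSB = na \sum_j (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 & b - 1 & MSB = \dfrac{SSB}{b - 1} & \sigma^2 + na \dfrac{\sum_j (\mu_{\cdot j} - \mu_{\cdot\cdot})^2}{b - 1} \\
AB \text{ interactions} & SSAB = n \sum_i \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 & (a - 1)(b - 1) & MSAB = \dfrac{SSAB}{(a - 1)(b - 1)} & \sigma^2 + n \dfrac{\sum_i \sum_j (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot})^2}{(a - 1)(b - 1)} \\
\text{Error} & SSE = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2 & ab(n - 1) & MSE = \dfrac{SSE}{ab(n - 1)} & \sigma^2 \\
\hline
\text{Total} & SSTO = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 & nab - 1 & &
\end{array}
$$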

TABLE 19.9 ANOVA Table for Two-Factor Study: Castle Bakery Example.

Source of Variation            SS      df    MS
Factor A (display height)      1,544    2    772
Factor B (display width)          12    1     12
AB interactions                   24    2     12
Error                             62    6     10.3
Total                          1,642   11


FIGURE 19.9 MINITAB Computer Output for Two-Factor Analysis of Variance: Castle Bakery Example.

Analysis of Variance for Cases Sold

Source          DF        SS        MS        F        P
Height           2   1544.00    772.00    74.71    0.000
Width            1     12.00     12.00     1.16    0.323
Height*Width     2     24.00     12.00     1.16    0.375
Error            6     62.00     10.33
Total           11   1642.00

Means

Height    N    Cases So
1         4      44.000
2         4      67.000
3         4      42.000

Width     N    Cases So
1         6      50.000
2         6      52.000

Height    Width    N    Cases So
1         1        2      45.000
1         2        2      43.000
2         1        2      65.000
2         2        2      69.000
3         1        2      40.000
3         2        2      44.000

19.5 Evaluation of Appropriateness of ANOVA Model 


Before undertaking formal inference procedures, we need to evaluate the appropriateness of
two-factor ANOVA model (19.23). No new problems arise here. The residuals in (19.35):

$$e_{ijk} = Y_{ijk} - \bar{Y}_{ij\cdot}$$

are examined for normality, constancy of error variance, and independence of error terms
in the same fashion as for a single-factor study.

Weighted least squares is a standard remedial measure when the error terms are normally 
distributed but do not have constant variance. When both the assumptions of normality and 
constancy of the error variance are violated, a transformation of the response variable may be 
sought to stabilize the error variance and to bring the distribution of the error terms closer to 
a normal distribution. Our discussion of these topics in Chapter 18 for single-factor ANOVA
applies completely to two-factor ANOVA. 


FIGURE 19.10 MINITAB Diagnostic Residual Plots: Castle Bakery Example.
[Panel (a): Residual Plot. Panel (b): Normal Probability Plot, residual against expected value.]


Our earlier discussion on the effects of departures from the single-factor ANOVA model 
applies fully to two-factor ANOVA. In particular, the employment of equal sample sizes for 
each treatment minimizes the effect of unequal error variances. 


In the Castle Bakery example, there are only two replications for each treatment. Also, the 
data are rounded to keep the illustrative computations simple. As a result, the analysis of 
residuals will only be of limited value here. The residuals are obtained according to (19.35). 
Using the data in Table 19.7, we have, for instance:

$$
\begin{aligned}
e_{111} &= 47 - 45 = 2 \\
e_{121} &= 46 - 43 = 3
\end{aligned}
$$

A plot of the residuals against the fitted values $\hat{Y}_{ijk} = \bar{Y}_{ij\cdot}$ is presented in Figure 19.10a.
There is no strong evidence of unequal error variances for the different treatments here. A 
normal probability plot of the residuals is presented in Figure 19.10b. The plot is moderately 
linear; the fact that only six plot points are visible is due to the rounded nature of the data. 
The coefficient of correlation between the ordered residuals and their expected values under 
normality is .966, which tends to support the reasonableness of approximate normality. 

On the basis of these diagnostics and since the inference procedures for ANOVA model 
(19.23) are robust, it appears to be reasonable to proceed with tests for factor effects and 
other inference procedures. 


19.6 F Tests 


In view of the additivity of sums of squares and degrees of freedom, Cochran’s theorem
(2.61) applies when no factor effects are present. Hence, the F* test statistics based on the
appropriate mean squares then follow the F distribution, leading to the usual type of F tests
for factor effects.




Test for Interactions 
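Ordinarily, the analysis of a two-factor study begins with a test to determine whether or not
the two factors interact:

$$
\begin{aligned}
H_0&\colon \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot} = 0 \quad \text{for all } i, j \\
H_a&\colon \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot} \neq 0 \quad \text{for some } i, j
\end{aligned}
\tag{19.43}
$$

or equivalently:

$$
\begin{aligned}
H_0&\colon \text{all } (\alpha\beta)_{ij} = 0 \\
H_a&\colon \text{not all } (\alpha\beta)_{ij} \text{ equal zero}
\end{aligned}
\tag{19.43a}
$$

As we noted from an examination of the expected mean squares in Table 19.8, the appropriate
test statistic is:

$$F^* = \frac{MSAB}{MSE} \tag{19.44}$$

Large values of F* indicate the existence of interactions. When $H_0$ holds, F* is distributed
as $F[(a - 1)(b - 1), (n - 1)ab]$. Hence, the appropriate decision rule to control the Type I
error at $\alpha$ is:

$$
\begin{aligned}
&\text{If } F^* \le F[1 - \alpha; (a - 1)(b - 1), (n - 1)ab], \text{ conclude } H_0 \\
&\text{If } F^* > F[1 - \alpha; (a - 1)(b - 1), (n - 1)ab], \text{ conclude } H_a
\end{aligned}
\tag{19.45}
$$

where $F[1 - \alpha; (a - 1)(b - 1), (n - 1)ab]$ is the $(1 - \alpha)100$ percentile of the appropriate
F distribution.
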
Test for Factor A Main Effects 


Tests for factor A main effects and for factor B main effects ordinarily follow the test for
interactions when no important interactions exist. To test whether or not A main effects are
present:

$$
\begin{aligned}
H_0&\colon \mu_{1\cdot} = \mu_{2\cdot} = \cdots = \mu_{a\cdot} \\
H_a&\colon \text{not all } \mu_{i\cdot} \text{ are equal}
\end{aligned}
\tag{19.46}
$$

or equivalently:

$$
\begin{aligned}
H_0&\colon \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \\
H_a&\colon \text{not all } \alpha_i \text{ equal zero}
\end{aligned}
\tag{19.46a}
$$

we use the test statistic:

$$F^* = \frac{MSA}{MSE} \tag{19.47}$$

Again, large values of F* indicate the existence of factor A main effects. Since F* is
distributed as $F[a - 1, (n - 1)ab]$ when $H_0$ holds, the appropriate decision rule for controlling
the risk of making a Type I error at $\alpha$ is:

$$
\begin{aligned}
&\text{If } F^* \le F[1 - \alpha; a - 1, (n - 1)ab], \text{ conclude } H_0 \\
&\text{If } F^* > F[1 - \alpha; a - 1, (n - 1)ab], \text{ conclude } H_a
\end{aligned}
\tag{19.48}
$$


Test for Factor B Main Effects

This test is similar to the one for factor A main effects. The alternatives are:

$$
\begin{aligned}
H_0&\colon \mu_{\cdot 1} = \mu_{\cdot 2} = \cdots = \mu_{\cdot b} \\
H_a&\colon \text{not all } \mu_{\cdot j} \text{ are equal}
\end{aligned}
\tag{19.49}
$$

or equivalently:

$$
\begin{aligned}
H_0&\colon \beta_1 = \beta_2 = \cdots = \beta_b = 0 \\
H_a&\colon \text{not all } \beta_j \text{ equal zero}
\end{aligned}
\tag{19.49a}
$$

The test statistic is:

$$F^* = \frac{MSB}{MSE} \tag{19.50}$$

and the appropriate decision rule for controlling the risk of a Type I error at $\alpha$ is:

$$
\begin{aligned}
&\text{If } F^* \le F[1 - \alpha; b - 1, (n - 1)ab], \text{ conclude } H_0 \\
&\text{If } F^* > F[1 - \alpha; b - 1, (n - 1)ab], \text{ conclude } H_a
\end{aligned}
\tag{19.51}
$$


Example


We shall investigate in the Castle Bakery example the presence of display height and display
width effects, using a level of significance of $\alpha = .05$ for each test. First, we begin by testing
whether or not interaction effects are present:

$$
\begin{aligned}
H_0&\colon \text{all } (\alpha\beta)_{ij} = 0 \\
H_a&\colon \text{not all } (\alpha\beta)_{ij} \text{ equal zero}
\end{aligned}
$$

Using the ANOVA results from Table 19.9 in test statistic (19.44), we obtain:

$$F^* = \frac{12}{10.3} = 1.17$$

For $\alpha = .05$, we require $F(.95; 2, 6) = 5.14$, so that the decision rule is:

$$
\begin{aligned}
&\text{If } F^* \le 5.14, \text{ conclude } H_0 \\
&\text{If } F^* > 5.14, \text{ conclude } H_a
\end{aligned}
$$

Since $F^* = 1.17 \le 5.14$, we conclude $H_0$, that display height and display width do not
interact in their effects on sales. The P-value of this test is $P\{F(2, 6) > 1.17\} = .37$.

Since the two factors do not interact, we turn to test for display height (factor A) main
effects; the alternative conclusions are given in (19.46). Test statistic (19.47) for our example
becomes:

$$F^* = \frac{772}{10.3} = 75.0$$

For $\alpha = .05$, we require $F(.95; 2, 6) = 5.14$. Since $F^* = 75.0 > 5.14$, we conclude $H_a$,
that the factor A level means $\mu_{i\cdot}$ are not equal, or that some definite effects associated with
height of display level exist. The P-value of this test is $P\{F(2, 6) > 75.0\} = .0001$.


Next, we test for display width (factor B) main effects; the alternative conclusions are
given in (19.49). Test statistic (19.50) becomes for our example:

$$F^* = \frac{12}{10.3} = 1.17$$

For $\alpha = .05$, we require $F(.95; 1, 6) = 5.99$. Since $F^* = 1.17 \le 5.99$, we conclude $H_0$,
that all $\mu_{\cdot j}$ are equal, or that display width has no effect on sales. The P-value of this test
is $P\{F(1, 6) > 1.17\} = .32$.

Thus, the analysis of variance tests confirm the impressions from the estimated treatment 
means plot in Figure 19.8 that only display height has an effect on sales for the treatments 
studied. At this point, it is clearly desirable to conduct further analyses of the nature of 
the display height effects. We shall discuss analyses of the nature of the factor effects in
Sections 19.8 and 19.9. 
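
The three F tests above can be retraced with a few lines of arithmetic. The sketch below is illustrative: it takes the sums of squares from Table 19.9 and the critical values F(.95; 2, 6) = 5.14 and F(.95; 1, 6) = 5.99 quoted in the text as given, and the dictionary keys and variable names are assumptions made for this example. Because it uses the unrounded MSE = 62/6 rather than 10.3, it yields F* values of about 1.16 and 74.71, as in the MINITAB output of Figure 19.9, instead of the hand-computed 1.17 and 75.0; the conclusions are unchanged.

```python
a, b, n = 3, 2, 2

# Sums of squares from Table 19.9 (Castle Bakery example).
ss = {"A": 1544.0, "B": 12.0, "AB": 24.0, "E": 62.0}

# Degrees of freedom: a - 1, b - 1, (a - 1)(b - 1), and ab(n - 1).
df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}

# Mean squares (19.41) and MSE.
ms = {source: ss[source] / df[source] for source in ss}

# Critical values F(.95; df_numerator, 6) as quoted in the text for alpha = .05.
f_crit = {"AB": 5.14, "A": 5.14, "B": 5.99}

# Test statistics (19.44), (19.47), (19.50) and decision rules (19.45), (19.48), (19.51).
f_star = {source: ms[source] / ms["E"] for source in ("AB", "A", "B")}
decision = {
    source: "Ha" if f_star[source] > f_crit[source] else "H0" for source in f_star
}

for source in ("AB", "A", "B"):
    print(
        f"{source:>2}: F* = {f_star[source]:.2f}, "
        f"critical value = {f_crit[source]}, conclude {decision[source]}"
    )
```

Testing the interaction first, as the loop order does, follows the sequence used in the example.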




Kimball Inequality 
If the test for interactions is conducted with level of significance $\alpha_1$, that for factor A
main effects with level of significance $\alpha_2$, and that for factor B main effects with level of
significance $\alpha_3$, the level of significance $\alpha$ for the family of three tests is greater than the
individual levels of significance. From the Bonferroni inequality in (4.4), we can derive the
inequality:

$$\alpha \le \alpha_1 + \alpha_2 + \alpha_3 \tag{19.52}$$

For the case considered here, a somewhat tighter inequality can be used, the Kimball inequality,
which utilizes the fact that the numerators of the three test statistics are independent
and the denominator is the same in each case. This inequality states:

$$\alpha \le 1 - (1 - \alpha_1)(1 - \alpha_2)(1 - \alpha_3) \tag{19.53}$$

For the Castle Bakery example, where $\alpha_1 = \alpha_2 = \alpha_3 = .05$, the Bonferroni inequality
yields as the bound for the family level of significance:

$$\alpha \le .05 + .05 + .05 = .15$$

while the Kimball inequality yields the bound:

$$\alpha \le 1 - (.95)(.95)(.95) = .143$$


This illustration makes it clear that the level of significance for the family of three tests may 
be substantially higher than the levels of significance for the individual tests. 
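
The two bounds in (19.52) and (19.53) are simple to evaluate for any set of individual significance levels. A minimal sketch follows, with illustrative variable names; the three levels of .05 are those used in the Castle Bakery example.

```python
# Individual levels of significance for the interaction, A, and B tests.
alphas = [0.05, 0.05, 0.05]

# Bonferroni bound (19.52): sum of the individual levels.
bonferroni_bound = sum(alphas)

# Kimball bound (19.53): one minus the product of the (1 - alpha_i).
product = 1.0
for alpha_i in alphas:
    product *= 1.0 - alpha_i
kimball_bound = 1.0 - product

print(f"Bonferroni bound: {bonferroni_bound:.3f}")
print(f"Kimball bound:    {kimball_bound:.3f}")
```

For levels this small the two bounds differ only slightly; the gap widens as the individual levels increase.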


Comment 


The F* test statistics in (19.44), (19.47), and (19.50) can be obtained by the general linear test approach
explained in Chapter 2. For example, in testing for the presence of interaction effects, the alternatives
are those given in (19.43) and the full model is ANOVA model (19.23):

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \qquad \text{Full model} \tag{19.54}$$

Fitting this full model leads to the fitted values $\hat{Y}_{ijk} = \bar{Y}_{ij\cdot}$ and the error sum of squares:

$$SSE(F) = \sum_i \sum_j \sum_k (Y_{ijk} - \hat{Y}_{ijk})^2 = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2 = SSE \tag{19.55}$$

which is the usual ANOVA error sum of squares in (19.37c). This error sum of squares has $ab(n - 1)$
degrees of freedom associated with it.

The reduced model under $H_0$: $(\alpha\beta)_{ij} = 0$ is:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + \varepsilon_{ijk} \qquad \text{Reduced model} \tag{19.56}$$

It can be shown that the fitted values for the reduced model are $\hat{Y}_{ijk} = \bar{Y}_{i\cdot\cdot} + \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot}$, so that the
error sum of squares for the reduced model is:

$$SSE(R) = \sum_i \sum_j \sum_k (Y_{ijk} - \hat{Y}_{ijk})^2 = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 \tag{19.57}$$

This error sum of squares can be shown to have $nab - a - b + 1$ degrees of freedom associated with
it. Test statistic (2.70) then simplifies to $F^* = MSAB/MSE$ in (19.44).


19.7 Strategy for Analysis


Scientific inquiry is often guided by the principle that the simplest explanations of observed 
phenomena tend to be the most effective. Data analysis is guided by this principle, seeking 
to obtain a simple, clear explanation of the data. In the context of ANOVA studies, additive 
effects provide a much simpler explanation of factor effects than do interacting effects. The 
presence of interacting effects complicates the explanation of the factor effects because they 
must then be described in terms of the combined effects of the two factors. Of course, some 
phenomena are complex so that the factor effects cannot be described simply by additive 
effects. The desire for a simple, parsimonious explanation, when possible, suggests the 
following basic strategy for analyzing factor effects in two-factor studies: 


1. Examine whether the two factors interact.

2. If they do not interact, examine whether the main effects for factors A and B are important.
For important A or B main effects, describe the nature of these effects in terms of the
factor level means $\mu_{i\cdot}$ or $\mu_{\cdot j}$, respectively. In some special cases, there may also be
interest in the treatment means $\mu_{ij}$.

3. If the factors do interact, examine if the interactions are important or unimportant.

4. If the interactions are unimportant, proceed as in step 2.

5. If the interactions are important, consider whether they can be made unimportant by a
meaningful simple transformation of scale. If so, make the transformation and proceed
as in step 2.

6. For important interactions that cannot be made unimportant by a simple transformation,
analyze the two factor effects jointly in terms of the treatment means $\mu_{ij}$. In some special
cases, there may also be interest in the factor level means $\mu_{i\cdot}$ and $\mu_{\cdot j}$.


A flowchart of this strategy is presented in Figure 19.11. 

We have already discussed the testing for interaction effects, the possible diminution of 
important interactions by a meaningful simple transformation, as well as how to test for the 
presence of factor main effects. Now we turn to steps 2 and 6 of the strategy for analysis, 
namely, how to compare factor level means $\mu_{i\cdot}$ or $\mu_{\cdot j}$ when there are no interactions or
only unimportant ones, and how to compare treatment means $\mu_{ij}$ when there are important
interactions. We begin with a discussion of the analysis of factor effects when the factors
do not interact or interact only in an unimportant fashion.


FIGURE 19.11 Strategy for Analysis of Two-Factor Studies.
[Flowchart of the six-step strategy: test whether interaction effects are present and important, try a simple transformation if they are, then use factor level means to examine factor effects separately or treatment means to examine factor effects jointly.]


19.8 Analysis of Factor Effects when Factors Do Not Interact 


As just noted, the analysis of factor effects usually involves only the factor level means $\mu_{i\cdot}$
and $\mu_{\cdot j}$ when the two factors do not interact, or when they interact only in an unimportant
fashion.

Estimation of Factor Level Mean
Unbiased point estimators of $\mu_{i.}$ and $\mu_{.j}$ are:

$$\hat{\mu}_{i.} = \bar{Y}_{i..}$$

$$\hat{\mu}_{.j} = \bar{Y}_{.j.}$$

where $\bar{Y}_{i..}$ and $\bar{Y}_{.j.}$ are defined in (19.27). The variance of $\bar{Y}_{i..}$ is:

$$\sigma^2\{\bar{Y}_{i..}\} = \frac{\sigma^2}{bn} \qquad (19.58a)$$

since $\bar{Y}_{i..}$ is based on $bn$ independent observations, each with variance $\sigma^2$. Similarly, we have:

$$\sigma^2\{\bar{Y}_{.j.}\} = \frac{\sigma^2}{an} \qquad (19.58b)$$

Unbiased estimators of these variances are obtained by replacing $\sigma^2$ with MSE:

$$s^2\{\bar{Y}_{i..}\} = \frac{MSE}{bn} \qquad (19.59a)$$

$$s^2\{\bar{Y}_{.j.}\} = \frac{MSE}{an} \qquad (19.59b)$$


Confidence limits for $\mu_{i.}$ and $\mu_{.j}$ utilize, as usual, the $t$ distribution:

$$\bar{Y}_{i..} \pm t[1 - \alpha/2; (n-1)ab]\, s\{\bar{Y}_{i..}\} \qquad (19.60a)$$

$$\bar{Y}_{.j.} \pm t[1 - \alpha/2; (n-1)ab]\, s\{\bar{Y}_{.j.}\} \qquad (19.60b)$$

The degrees of freedom $(n-1)ab$ are those associated with MSE.
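The limits in (19.60a) can be sketched in a few lines of Python. This is only an illustration: the function name `factor_level_mean_limits` is hypothetical, and the $t$ percentile is assumed to be read from a $t$ table and passed in. The inputs are the Castle Bakery values quoted in this chapter (estimated middle shelf height mean 67, MSE = 10.3, b = 2, n = 2, t(.975; 6) = 2.447).

```python
import math

def factor_level_mean_limits(ybar, mse, n_obs, t_mult):
    """Return (lower, upper) = ybar -/+ t_mult * sqrt(mse / n_obs).

    n_obs is b*n for a factor A level mean and a*n for a factor B level mean.
    t_mult is t[1 - alpha/2; (n - 1)ab], supplied by the caller from a table.
    """
    std_err = math.sqrt(mse / n_obs)
    return ybar - t_mult * std_err, ybar + t_mult * std_err

# 95 percent limits for the middle shelf height mean (factor A, level 2).
lower, upper = factor_level_mean_limits(ybar=67, mse=10.3, n_obs=2 * 2, t_mult=2.447)
print(f"{lower:.1f} <= mu_2. <= {upper:.1f}")
```

Passing the multiple in, rather than computing it, keeps the sketch free of third-party dependencies; a statistics library could supply the percentile instead.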


Estimation of Contrast of Factor Level Means
A contrast among the factor level means $\mu_{i.}$:

$$L = \sum c_i \mu_{i.} \quad \text{where } \sum c_i = 0 \qquad (19.61)$$

is estimated unbiasedly by:

$$\hat{L} = \sum c_i \bar{Y}_{i..} \qquad (19.62)$$

Because of the independence of the $\bar{Y}_{i..}$, the variance of this estimator is:

$$\sigma^2\{\hat{L}\} = \sum c_i^2 \sigma^2\{\bar{Y}_{i..}\} = \frac{\sigma^2}{bn} \sum c_i^2 \qquad (19.63)$$

An unbiased estimator of this variance is:

$$s^2\{\hat{L}\} = \frac{MSE}{bn} \sum c_i^2 \qquad (19.64)$$

Finally, the appropriate $1 - \alpha$ confidence limits for $L$ are:

$$\hat{L} \pm t[1 - \alpha/2; (n-1)ab]\, s\{\hat{L}\} \qquad (19.65)$$

To estimate a contrast among the factor level means $\mu_{.j}$:

$$L = \sum c_j \mu_{.j} \quad \text{where } \sum c_j = 0 \qquad (19.66)$$

we use the estimator:

$$\hat{L} = \sum c_j \bar{Y}_{.j.} \qquad (19.67)$$

whose estimated variance is:

$$s^2\{\hat{L}\} = \frac{MSE}{an} \sum c_j^2 \qquad (19.68)$$

The $1 - \alpha$ confidence limits for $L$ in (19.65) are still appropriate, with $\hat{L}$ and $s\{\hat{L}\}$ now
defined in (19.67) and (19.68), respectively.
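As a small illustration of (19.62) and (19.64), the sketch below estimates a contrast and its standard error. The function name and the particular contrast (middle shelf height versus the average of the other two heights) are hypothetical choices made for this example; the means 44, 67, 42 and MSE = 10.3 with bn = 4 are the Castle Bakery values used elsewhere in this chapter.

```python
import math

def contrast_estimate(coeffs, means, mse, n_obs):
    """Return (L_hat, s{L_hat}) for a contrast of factor level means.

    n_obs is the number of observations behind each mean
    (b*n for factor A means, a*n for factor B means).
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    l_hat = sum(c * m for c, m in zip(coeffs, means))
    variance = mse / n_obs * sum(c * c for c in coeffs)
    return l_hat, math.sqrt(variance)

# Hypothetical contrast: middle height minus the average of bottom and top.
l_hat, std_err = contrast_estimate([-0.5, 1.0, -0.5], [44, 67, 42], mse=10.3, n_obs=4)
print(l_hat, round(std_err, 3))
```

The confidence limits (19.65) then follow by multiplying `std_err` by the appropriate $t$ percentile.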


Estimation of Linear Combination of Factor Level Means
A linear combination of the factor level means $\mu_{i.}$:

$$L = \sum c_i \mu_{i.} \qquad (19.69)$$

is estimated unbiasedly by $\hat{L}$ in (19.62). The variance of this estimator is given in (19.63), and
an unbiased estimator of this variance is given in (19.64). The appropriate $1 - \alpha$ confidence
limits for $L$ are given in (19.65).

Analogous results follow for a linear combination of the factor level means $\mu_{.j}$:

$$L = \sum c_j \mu_{.j} \qquad (19.70)$$


Multiple Pairwise Comparisons of Factor Level Means 


Usually, more than one pairwise comparison is of interest, and the multiple comparison
procedures discussed in Chapter 17 for single-factor ANOVA studies can be employed
with only minor modifications for two-factor studies. If all or a large number of pairwise
comparisons among the factor level means $\mu_{i.}$ or $\mu_{.j}$ are to be made, the Tukey procedure of
Section 17.5 is appropriate. When only a few pairwise comparisons are to be made that are
specified in advance of the analysis, the Bonferroni procedure of Section 17.7 may be best.
Often, tests for differences between pairs of factor level means precede the construction
of interval estimates so that the analysis of the interval estimates can be confined to active
comparisons.


Tukey Procedure. The Tukey multiple comparison confidence limits for all pairwise
comparisons:

$$D = \mu_{i.} - \mu_{i'.} \qquad (19.71)$$

with family confidence coefficient of at least $1 - \alpha$ are:

$$\hat{D} \pm T s\{\hat{D}\} \qquad (19.72)$$

where:

$$\hat{D} = \bar{Y}_{i..} - \bar{Y}_{i'..} \qquad (19.72a)$$

$$s^2\{\hat{D}\} = \frac{2MSE}{bn} \qquad (19.72b)$$

$$T = \frac{1}{\sqrt{2}}\, q[1 - \alpha; a, (n-1)ab] \qquad (19.72c)$$

To use the Tukey procedure to conduct all simultaneous tests of the form:

$$H_0: D = \mu_{i.} - \mu_{i'.} = 0 \qquad H_a: D = \mu_{i.} - \mu_{i'.} \neq 0 \qquad (19.73)$$

the test statistic and decision rule are:

$$q^* = \frac{\sqrt{2}\,\hat{D}}{s\{\hat{D}\}}; \quad \text{If } |q^*| > q[1 - \alpha; a, (n-1)ab], \text{ conclude } H_a \qquad (19.73a)$$

For conciseness in this chapter, we state only the portion of the decision rule leading to
conclusion $H_a$. As for single-factor ANOVA, the family level of significance for all pairwise
tests here is $\alpha$; in other words, the probability of concluding that there exist any pairwise
differences when there are none is $\alpha$.

For pairwise comparisons of the factor level means $\mu_{.j}$, the only changes are:

$$D = \mu_{.j} - \mu_{.j'} \qquad (19.74)$$

$$\hat{D} = \bar{Y}_{.j.} - \bar{Y}_{.j'.} \qquad (19.75)$$

$$s^2\{\hat{D}\} = \frac{2MSE}{an} \qquad (19.76)$$

$$T = \frac{1}{\sqrt{2}}\, q[1 - \alpha; b, (n-1)ab] \qquad (19.77)$$

$$q^* = \frac{\sqrt{2}\,\hat{D}}{s\{\hat{D}\}}; \quad \text{If } |q^*| > q[1 - \alpha; b, (n-1)ab], \text{ conclude } H_a \qquad (19.78)$$

Bonferroni Procedure. When only a few pairwise comparisons specified in advance are
to be made, the Bonferroni method may be best. The simultaneous estimation formulas
above still apply, with the Tukey multiple $T$ replaced by the Bonferroni multiple $B$:

$$B = t[1 - \alpha/2g; (n-1)ab] \qquad (19.79)$$

where $g$ is the number of statements in the family.
To test simultaneously each of $g$ pairwise differences with the Bonferroni procedure, the
test statistic and decision rule are:

$$t^* = \frac{\hat{D}}{s\{\hat{D}\}}; \quad \text{If } |t^*| > t[1 - \alpha/2g; (n-1)ab], \text{ conclude } H_a \qquad (19.80)$$

Combined Factor A and Factor B Family. When important factor A and factor B
effects both are present, it is often desired to have a family confidence coefficient $1 - \alpha$, or
family significance level $\alpha$, for the joint set of pairwise comparisons involving both factor
A and factor B means. The Bonferroni method can be used directly for this purpose, with
$g$ representing the total number of statements in the joint set.

Alternatively, the Bonferroni method can be used in conjunction with the Tukey method.
To illustrate this use, if the pairwise comparisons for factor A are made with the Tukey
procedure with a family confidence coefficient of .95, and likewise for the pairwise comparisons
for factor B, the Bonferroni inequality then assures us that the family confidence
coefficient for the joint set of comparisons for both factors is at least .90.
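The arithmetic behind the ".90" bound is the Bonferroni inequality: the joint confidence coefficient is at least one minus the sum of the individual family error rates. A minimal sketch follows; the function name is illustrative, not from the text.

```python
def bonferroni_joint_confidence(*family_confidences):
    """Lower bound on the joint confidence coefficient of several families.

    By the Bonferroni inequality, joint >= 1 - sum(alpha_k),
    where alpha_k = 1 - (confidence coefficient of family k).
    """
    total_alpha = sum(1.0 - c for c in family_confidences)
    return max(0.0, 1.0 - total_alpha)

# Two Tukey families (factor A and factor B), each at .95.
joint = bonferroni_joint_confidence(0.95, 0.95)
print(round(joint, 2))
```

The bound is conservative; the actual joint confidence coefficient may be higher.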




Multiple Contrasts of Factor Level Means

Scheffé Procedure. When a large number of contrasts among the factor level means $\mu_{i.}$
or $\mu_{.j}$ are of interest, the Scheffé method should be used. If the contrasts involve the $\mu_{i.}$ as
in (19.61), the Scheffé confidence limits are:

$$\hat{L} \pm S s\{\hat{L}\} \qquad (19.81)$$

where:

$$S^2 = (a - 1)F[1 - \alpha; a - 1, (n-1)ab] \qquad (19.81a)$$

and $\hat{L}$ is given by (19.62) and $s^2\{\hat{L}\}$ is given by (19.64). The probability is then $1 - \alpha$ that
every confidence interval (19.81) in the family of all possible contrasts is correct. If the
contrasts involve the $\mu_{.j}$ as in (19.66), $\hat{L}$ is given by (19.67), $s^2\{\hat{L}\}$ is given by (19.68), and
the Scheffé multiple in (19.81) is defined by:

$$S^2 = (b - 1)F[1 - \alpha; b - 1, (n-1)ab] \qquad (19.81b)$$

When the Scheffé procedure is employed to conduct simultaneous tests of the form:

$$H_0: L = 0 \qquad H_a: L \neq 0 \qquad (19.82)$$

for contrasts involving the factor level means $\mu_{i.}$, the test statistic and decision rule are:

$$F^* = \frac{\hat{L}^2}{(a - 1)s^2\{\hat{L}\}}; \quad \text{If } F^* > F[1 - \alpha; a - 1, (n-1)ab], \text{ conclude } H_a \qquad (19.82a)$$

When the contrasts involve the factor level means $\mu_{.j}$, the test statistic and decision rule
are:

$$F^* = \frac{\hat{L}^2}{(b - 1)s^2\{\hat{L}\}}; \quad \text{If } F^* > F[1 - \alpha; b - 1, (n-1)ab], \text{ conclude } H_a \qquad (19.82b)$$

Bonferroni Procedure. When the number of contrasts of interest is small and has been
specified in advance, the Bonferroni procedure may be best. Confidence limits (19.81) are
modified by replacing the Scheffé multiple $S$ with the Bonferroni multiple $B$:

$$B = t[1 - \alpha/2g; (n-1)ab] \qquad (19.83)$$

where $g$ is the number of statements in the family.
Simultaneous testing of $g$ tests with the Bonferroni procedure is based on the following
test statistic and decision rule:

$$t^* = \frac{\hat{L}}{s\{\hat{L}\}}; \quad \text{If } |t^*| > t[1 - \alpha/2g; (n-1)ab], \text{ conclude } H_a \qquad (19.84)$$

Combined Factor A and Factor B Family. When important factor A and factor B effects
are present and contrasts for each of the two factors are of interest, it is often desired that
the inference procedure provide assurance for the combined family of factor A and factor B
contrasts. Several possibilities exist to accomplish this:

1. The Bonferroni method may be used directly, with $g$ representing the total number of
statements in the joint set.
2. The Bonferroni method can be used to join the two sets of Scheffé multiple comparison
families in the same way explained earlier for joining two Tukey sets.
3. The Scheffé confidence limits (19.81) can be modified to use the $S$ multiple defined by:

$$S^2 = (a + b - 2)F[1 - \alpha; a + b - 2, (n-1)ab] \qquad (19.85)$$

For simultaneous testing, the test statistics and decision rules in (19.82a) and (19.82b)
can be replaced by:

$$F^* = \frac{\hat{L}^2}{(a + b - 2)s^2\{\hat{L}\}}; \quad \text{If } F^* > F[1 - \alpha; a + b - 2, (n-1)ab], \text{ conclude } H_a \qquad (19.86)$$
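The Scheffé multiples in (19.81a) and (19.85) differ only in the numerator degrees of freedom, so one helper covers both. The sketch below is illustrative: the function name is hypothetical, and the $F$ percentiles F(.95; 2, 6) = 5.14 and F(.95; 3, 6) = 4.76 are standard table values assumed here (they are not quoted in the text), chosen to match the Castle Bakery dimensions a = 3, b = 2, (n - 1)ab = 6.

```python
import math

def scheffe_multiple(num_df, f_percentile):
    """Return S = sqrt(num_df * F[1 - alpha; num_df, (n - 1)ab]).

    num_df is a - 1 for factor A contrasts, b - 1 for factor B contrasts,
    or a + b - 2 for the combined family. f_percentile is read from an
    F table by the caller.
    """
    return math.sqrt(num_df * f_percentile)

s_factor_a = scheffe_multiple(3 - 1, 5.14)       # assumed F(.95; 2, 6)
s_combined = scheffe_multiple(3 + 2 - 2, 4.76)   # assumed F(.95; 3, 6)
print(round(s_factor_a, 3), round(s_combined, 3))
```

The combined-family multiple is larger, which is the price paid for covering contrasts of both factors at once.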


Estimates Based on Treatment Means 

Occasionally in analyzing the factor effects in a two-factor study when no interactions
are present, there is interest in particular treatment means $\mu_{ij}$. For example, in a two-factor
study of the effects of price and type of advertisement on sales, interest may exist in
estimating the mean sales for two different price levels when a particular advertisement is
used. In such cases, the methods of analysis for single-factor studies discussed in Chapter
17 are appropriate. The number of treatments now is simply $r = ab$, the degrees of freedom
associated with MSE are $n_T - r = nab - ab = (n-1)ab$, and the estimated treatment means
are $\bar{Y}_{ij.}$, based on $n$ observations each.


Example 1—Pairwise Comparisons of Factor Level Means 
In the Castle Bakery example, the estimated treatment means plot in Figure 19.8 suggested that no
interaction effects are present and that display width may not have any effect. The formal
analysis of variance based on Table 19.9 supported both of these conclusions. Our interest
now is in examining the nature of the display height effects in more detail.

First, we shall obtain a preliminary view of the display height and width effects by plotting
bar graphs of the estimated factor level means in Table 19.7. Figure 19.12a contains a bar
graph of the estimated factor A level means $\bar{Y}_{i..}$. For comparison, we show in Figure 19.12b
a similar plot for the estimated factor B level means $\bar{Y}_{.j.}$. Figure 19.12a suggests that level 2
of factor A (middle shelf display height) leads to noticeably larger sales than the other
two factor levels. In addition, Figure 19.12a also suggests that the mean sales for display
height levels 1 and 3 may not be different from each other.

FIGURE 19.12 Bar Graphs of Estimated Factor Level Means—Castle Bakery Example.
[Panels: (a) Factor A (Display Height), levels 1, 2, 3; (b) Factor B (Display Width), levels 1, 2.]

TABLE 19.10 Pairwise Testing of Factor A Level Means—Castle Bakery Example.

(1) Alternatives                               (2) Test Statistic (19.73a)              (3) Decision Rule: Conclude $H_a$ if $|q^*| >$    (4) Conclusion
$H_0$: $D_1 = \mu_{2.} - \mu_{1.} = 0$         $q^* = \sqrt{2}(23)/2.27 = 14.33$        $q(.95; 3, 6) = 4.34$                             $H_a$
$H_a$: $D_1 = \mu_{2.} - \mu_{1.} \neq 0$
$H_0$: $D_2 = \mu_{1.} - \mu_{3.} = 0$         $q^* = \sqrt{2}(2)/2.27 = 1.25$          $q(.95; 3, 6) = 4.34$                             $H_0$
$H_a$: $D_2 = \mu_{1.} - \mu_{3.} \neq 0$
$H_0$: $D_3 = \mu_{2.} - \mu_{3.} = 0$         $q^* = \sqrt{2}(25)/2.27 = 15.58$        $q(.95; 3, 6) = 4.34$                             $H_a$
$H_a$: $D_3 = \mu_{2.} - \mu_{3.} \neq 0$

Turning now to formal inference procedures, we shall first test simultaneously all pairwise
differences among the shelf height means, using the Tukey multiple comparison procedure
with family significance level $\alpha = .05$. The alternatives to be tested for the comparisons
of display height means (i = 1: bottom, 2: middle, 3: top) are shown in Table 19.10,
column 1. From Tables 19.7 and 19.9 we obtain the following information:

$$\hat{D}_1 = \bar{Y}_{2..} - \bar{Y}_{1..} = 67 - 44 = 23$$
$$\hat{D}_2 = \bar{Y}_{1..} - \bar{Y}_{3..} = 44 - 42 = 2$$
$$\hat{D}_3 = \bar{Y}_{2..} - \bar{Y}_{3..} = 67 - 42 = 25$$
$$MSE = 10.3 \qquad a = 3 \qquad b = 2 \qquad n = 2 \qquad (n-1)ab = 6$$

Hence, by (19.72b) we obtain:

$$s^2\{\hat{D}_1\} = s^2\{\hat{D}_2\} = s^2\{\hat{D}_3\} = \frac{2(10.3)}{2(2)} = 5.15$$

so that $s\{\hat{D}_1\} = s\{\hat{D}_2\} = s\{\hat{D}_3\} = 2.27$. The test statistics and decision rules based on
(19.73a) are given in Table 19.10, columns 2 and 3, and the conclusions from the tests are
shown in column 4.

It can be concluded from the tests in Table 19.10 with family significance level $\alpha = .05$
that for the product studied and the types of stores in the experiment, the middle shelf
height is far better than either the bottom or the top heights, and that the latter two do not
differ significantly in sales effectiveness. All of these conclusions are covered by the family
significance level of .05.

Next, we wish to estimate how much greater are mean sales at the middle shelf height
than at either of the other two shelf heights. We shall continue to use the Tukey multiple
comparison procedure because the two pairwise comparisons now of interest are the result
of the earlier testing of all pairwise comparisons. From our previous work, we have:

$$\hat{D}_1 = \bar{Y}_{2..} - \bar{Y}_{1..} = 23 \qquad \hat{D}_3 = \bar{Y}_{2..} - \bar{Y}_{3..} = 25 \qquad s\{\hat{D}_1\} = s\{\hat{D}_3\} = 2.27$$

We also require, from (19.72):

$$q(.95; 3, 6) = 4.34$$
$$T = \frac{4.34}{\sqrt{2}} = 3.07$$
$$T s\{\hat{D}_1\} = T s\{\hat{D}_3\} = 3.07(2.27) = 7.0$$

We therefore find the following confidence intervals for the two pairwise comparisons
of the shelf height factor level means:

$$16 = 23 - 7.0 \le \mu_{2.} - \mu_{1.} \le 23 + 7.0 = 30$$
$$18 = 25 - 7.0 \le \mu_{2.} - \mu_{3.} \le 25 + 7.0 = 32$$

With family confidence coefficient of .95, we conclude that mean sales for the middle shelf
height exceed those for the bottom shelf height by between 16 and 30 cases and those for
the top shelf height by between 18 and 32 cases.
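The Tukey calculations of this example can be retraced with a short Python sketch. The function name `tukey_pairwise` and its return layout are hypothetical; the studentized range percentile q(.95; 3, 6) = 4.34 is the table value quoted above and is passed in rather than computed.

```python
import math

def tukey_pairwise(mean_1, mean_2, mse, n_obs, q_crit):
    """Tukey test statistic and confidence limits for one pairwise comparison.

    n_obs is b*n when comparing factor A level means (a*n for factor B).
    q_crit is q[1 - alpha; number of levels, (n - 1)ab] from a table.
    """
    d_hat = mean_1 - mean_2
    std_err = math.sqrt(2 * mse / n_obs)          # s{D-hat}, as in (19.72b)
    q_star = math.sqrt(2) * d_hat / std_err       # test statistic (19.73a)
    half_width = q_crit / math.sqrt(2) * std_err  # T * s{D-hat}
    return {
        "d_hat": d_hat,
        "q_star": q_star,
        "conclude_ha": abs(q_star) > q_crit,
        "limits": (d_hat - half_width, d_hat + half_width),
    }

height_means = {"bottom": 44, "middle": 67, "top": 42}
pairs = [("middle", "bottom"), ("bottom", "top"), ("middle", "top")]
results = {
    pair: tukey_pairwise(height_means[pair[0]], height_means[pair[1]],
                         mse=10.3, n_obs=2 * 2, q_crit=4.34)
    for pair in pairs
}
for pair, res in results.items():
    print(pair, round(res["q_star"], 2), res["conclude_ha"])
```

Because the sketch carries unrounded intermediate values, its limits differ slightly in the last digit from the hand calculation, which rounds $T$ and $s\{\hat{D}\}$ before multiplying.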

We can summarize the effects of shelf height on mean sales by the following line plot:

[Line plot on a "Cases Sold" axis running from 40 to 60 and beyond: top shelf (42) and bottom
shelf (44) lie close together, and middle shelf (67) lies well above both.]


Example 2—Estimation of Treatment Means 


The manager of a supermarket that has sales volume and clientele similar to the supermarkets
included in the Castle Bakery study has room only for the regular shelf display width, and
wishes to obtain estimates of mean sales for the middle and top shelf heights. We shall
now obtain interval estimates with a 90 percent family confidence coefficient using the
Bonferroni procedure.

From Tables 19.7 and 19.9, we have:

$$\bar{Y}_{21.} = 65 \qquad \bar{Y}_{31.} = 40 \qquad MSE = 10.3$$

Hence, we obtain:

$$s^2\{\bar{Y}_{21.}\} = s^2\{\bar{Y}_{31.}\} = \frac{MSE}{n} = \frac{10.3}{2} = 5.15$$

$$s\{\bar{Y}_{21.}\} = s\{\bar{Y}_{31.}\} = 2.27$$

For $g = 2$, we require $B = t[1 - \alpha/2g; (n-1)ab] = t(.975; 6) = 2.447$. Thus, we obtain
the confidence limits:

$$65 \pm 2.447(2.27) \qquad 40 \pm 2.447(2.27)$$

and the desired confidence intervals are:

$$59.4 \le \mu_{21} \le 70.6 \qquad 34.4 \le \mu_{31} \le 45.6$$
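The two Bonferroni intervals can be checked with a brief sketch. The helper name is hypothetical, and the multiple B = t(.975; 6) = 2.447 is taken from the text rather than computed.

```python
import math

def bonferroni_mean_limits(means, mse, n, b_mult):
    """Bonferroni limits ybar -/+ B * sqrt(MSE / n) for each treatment mean."""
    std_err = math.sqrt(mse / n)
    return [(m - b_mult * std_err, m + b_mult * std_err) for m in means]

# Estimated treatment means for middle and top shelf heights, regular width.
limits = bonferroni_mean_limits([65, 40], mse=10.3, n=2, b_mult=2.447)
for label, (lower, upper) in zip(["mu_21", "mu_31"], limits):
    print(f"{lower:.1f} <= {label} <= {upper:.1f}")
```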




19.9 Analysis of Factor Effects when Interactions Are Important 


When important interactions exist that cannot be made unimportant by a simple transformation,
the analysis of factor effects generally must be based on the treatment means $\mu_{ij}$.
Typically, this analysis will involve estimation of multiple comparisons of treatment means
or single degree of freedom tests. Furthermore, one often compares the levels of one factor
across levels of the other factor, referred to as the comparison of simple effects. For example,
in a 2 x 3 factorial structure study, we compare individual cell means within levels of each
factor, e.g., $\mu_{11} = \mu_{12} = \mu_{13}$ and $\mu_{21} = \mu_{22} = \mu_{23}$ and/or $\mu_{11} = \mu_{21}$, $\mu_{12} = \mu_{22}$, and
$\mu_{13} = \mu_{23}$.


Multiple Pairwise Comparisons of Treatment Means 
If pairs of treatment means $\mu_{ij}$ are to be compared, either the Tukey or the Bonferroni
multiple comparison procedure may be used, depending on which is more advantageous.
In effect, the analysis is equivalent to that for single-factor ANOVA, with the total number
of treatments here equal to $r = ab$, the degrees of freedom associated with MSE here equal
to $n_T - r = (n-1)ab$, and each estimated treatment mean, now denoted by $\bar{Y}_{ij.}$, based on
$n$ cases.

Tukey Procedure. The Tukey $1 - \alpha$ multiple comparison confidence limits for all pairwise
comparisons:

$$D = \mu_{ij} - \mu_{i'j'} \qquad i, j \neq i', j' \qquad (19.87)$$

are:

$$\hat{D} \pm T s\{\hat{D}\} \qquad (19.88)$$

where:

$$\hat{D} = \bar{Y}_{ij.} - \bar{Y}_{i'j'.} \qquad (19.88a)$$

$$s^2\{\hat{D}\} = \frac{2MSE}{n} \qquad (19.88b)$$

$$T = \frac{1}{\sqrt{2}}\, q[1 - \alpha; ab, (n-1)ab] \qquad (19.88c)$$

The test statistic and decision rule for all simultaneous Tukey tests of the form:

$$H_0: D = 0 \qquad H_a: D \neq 0 \qquad (19.89)$$

are as follows when the family significance level is controlled at $\alpha$:

$$q^* = \frac{\sqrt{2}\,\hat{D}}{s\{\hat{D}\}}; \quad \text{If } |q^*| > q[1 - \alpha; ab, (n-1)ab], \text{ conclude } H_a \qquad (19.89a)$$

Bonferroni Procedure. If the Bonferroni method is employed for a family of $g$ comparisons,
the multiple $T$ in confidence interval (19.88) is replaced by:

$$B = t[1 - \alpha/2g; (n-1)ab] \qquad (19.90)$$

and the test statistic and decision rule in (19.89a) become:

$$t^* = \frac{\hat{D}}{s\{\hat{D}\}}; \quad \text{If } |t^*| > t[1 - \alpha/2g; (n-1)ab], \text{ conclude } H_a \qquad (19.91)$$

Multiple Contrasts of Treatment Means 

Scheffé Procedure. The Scheffé multiple comparison procedure for single-factor studies
is directly applicable to the estimation of contrasts involving the treatment means $\mu_{ij}$. The
joint confidence limits for contrasts of the form:

$$L = \sum\sum c_{ij}\mu_{ij} \quad \text{where } \sum\sum c_{ij} = 0 \qquad (19.92)$$

are:

$$\hat{L} \pm S s\{\hat{L}\} \qquad (19.93)$$

where:

$$\hat{L} = \sum\sum c_{ij}\bar{Y}_{ij.} \qquad (19.93a)$$

$$s^2\{\hat{L}\} = \frac{MSE}{n}\sum\sum c_{ij}^2 \qquad (19.93b)$$

$$S^2 = (ab - 1)F[1 - \alpha; ab - 1, (n-1)ab] \qquad (19.93c)$$

The test statistic and associated decision rule for all simultaneous Scheffé tests of the
form:

$$H_0: L = 0 \qquad H_a: L \neq 0 \qquad (19.94)$$

are as follows when the family significance level is controlled at $\alpha$:

$$F^* = \frac{\hat{L}^2}{(ab - 1)s^2\{\hat{L}\}}; \quad \text{If } F^* > F[1 - \alpha; ab - 1, (n-1)ab], \text{ conclude } H_a \qquad (19.94a)$$

Bonferroni Procedure. When the number of contrasts is small, the Bonferroni procedure
may be preferable. The confidence intervals (19.93) are simply modified by replacing $S$
with $B$ as defined in (19.90). The test statistic and decision rule in (19.94a) are replaced
by:

$$t^* = \frac{\hat{L}}{s\{\hat{L}\}}; \quad \text{If } |t^*| > t[1 - \alpha/2g; (n-1)ab], \text{ conclude } H_a \qquad (19.95)$$

Example 1—Pairwise Comparisons of Treatment Means 
A junior college system studied the effects of teaching method (factor A) and student's
quantitative ability (factor B) on learning of college mathematics. Two teaching methods
were studied: the standard method of teaching (to be called the standard method) and a
method that emphasizes teaching of concepts in the abstract before going into drill routines
(to be called the abstract method). The quantitative ability of a student was determined by a
standard aptitude test, on the basis of which the student was classified as having excellent,
good, or moderate quantitative ability. Thus, factor A (teaching method) has a = 2 levels,
and factor B (student's quantitative ability) has b = 3 levels.

TABLE 19.11 Results—Mathematics Learning Example.

(a) Mean Learning Scores (n = 21)

Teaching Method (i)    Quantitative Ability (j)
                       Excellent               Good                    Moderate
Abstract               92 ($\bar{Y}_{11.}$)    81 ($\bar{Y}_{12.}$)    73 ($\bar{Y}_{13.}$)
Standard               90 ($\bar{Y}_{21.}$)    86 ($\bar{Y}_{22.}$)    82 ($\bar{Y}_{23.}$)

(b) ANOVA Table

Source of Variation                  SS       df     MS
Factor A (teaching methods)          504      1      504
Factor B (quantitative ability)      3,843    2      1,921.5
AB interactions                      651      2      325.5
Error                                3,360    120    28
Total                                8,358    125

For each quantitative ability group, 42 students were selected and randomly placed into 
classes according to the designated teaching method, with each class containing equal 
numbers of students of each quantitative ability level. For simplicity, it is assumed that any 
effects associated with the classes are negligible. 

This study has one experimental factor (teaching method) and one observational
factor (quantitative ability). Equal numbers of students with excellent, good, and moderate
quantitative ability are randomly selected and then within these categories, students
are randomly assigned to a teaching method. Therefore, quantitative ability is a blocking factor
here with replication within blocks. This experimental study is called a generalized
randomized block design and is discussed further in Section 21.6.

The response variable of interest is the amount of learning of college mathematics, 
as measured by a standard mathematics achievement test. The results of the study are 
summarized in Table 19.11 (the original data are not shown). The estimated treatment 
means are shown in Table 19.11a, and the analysis of variance table is presented in 
Table 19.11b. 

Figure 19.13 contains two plots of the estimated treatment means $\bar{Y}_{ij.}$. In Figure 19.13a,
the two curves represent the different factor A levels, and in Figure 19.13b, the three curves
represent the different factor B levels. The clear lack of parallelism of the curves suggests the
presence of interaction effects between teaching method and student's quantitative ability
on amount of mathematics learning. A formal test for interactions confirms this. From
Table 19.11b, we have $F^* = MSAB/MSE = 325.5/28 = 11.625$. For $\alpha = .01$ we require
$F(.99; 2, 120) = 4.79$. Since $F^* = 11.625 > 4.79$, we conclude that interaction effects are
present. The P-value of this test is 0+.
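The interaction test can be reproduced directly from the sums of squares in Table 19.11b. In this sketch the function name is illustrative, and the critical value F(.99; 2, 120) = 4.79 is the one quoted in the text.

```python
def f_statistic(ss_effect, df_effect, ss_error, df_error):
    """Return F* = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error

# AB interactions: SS = 651 on 2 df; Error: SS = 3,360 on 120 df.
f_star = f_statistic(651, 2, 3360, 120)
f_crit = 4.79  # F(.99; 2, 120), as quoted in the text
interactions_present = f_star > f_crit
print(f_star, interactions_present)
```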


[Figure 19.13: (a) Teaching Method Curves, plotted against Factor B (Student's Ability: Excellent,
Good, Moderate); (b) Student Ability Curves, plotted against Factor A (Teaching Method: Abstract,
Standard).]


Figure 19.13 suggests that the interactions are important: students with excellent quan- 
titative ability are but little affected by teaching method (perhaps doing slightly better with 
the abstract method); students with good or moderate abilities learn much better with the 
standard teaching method. Hence, we shall first investigate whether some simple transfor- 
mation can make the interactions unimportant. We do this in an approximate fashion by 
considering the logarithmic and square root transformations of the response. In neither case 
did the interactions become unimportant, so it appears that the interactions here may be 
nontransformable. 

We now wish to investigate the nature of the interaction effects in Figure 19.13. We shall
do this by estimating separately for students with excellent, good, and moderate quantitative
abilities how large is the difference in mean learning for the two teaching methods. Thus,
we wish to estimate:

$$D_1 = \mu_{11} - \mu_{21}$$
$$D_2 = \mu_{12} - \mu_{22}$$
$$D_3 = \mu_{13} - \mu_{23}$$

We shall employ the Bonferroni multiple comparison procedure with family confidence
coefficient .95. (Since only three pairwise comparisons are of interest, the Bonferroni method
yields more precise estimates here than the Tukey method.)

For the data in Table 19.11a, the point estimates of the pairwise comparisons are:

$$\hat{D}_1 = 92 - 90 = 2$$
$$\hat{D}_2 = 81 - 86 = -5$$
$$\hat{D}_3 = 73 - 82 = -9$$

We find the estimated variances of these estimates by (19.88b), for n = 21:

$$s^2\{\hat{D}_1\} = s^2\{\hat{D}_2\} = s^2\{\hat{D}_3\} = \frac{2(28)}{21} = 2.667$$

so that:

$$s\{\hat{D}_1\} = s\{\hat{D}_2\} = s\{\hat{D}_3\} = 1.633$$

Finally, for family confidence coefficient $1 - \alpha = .95$ and $g = 3$, we require
$B = t[1 - .05/2(3); 120] = t(.99167; 120) = 2.428$. Hence, the confidence limits are by (19.88)
and (19.90):

$$2 \pm 2.428(1.633) \qquad -5 \pm 2.428(1.633) \qquad -9 \pm 2.428(1.633)$$

and the 95 percent confidence intervals for the family of comparisons are:

$$-1.96 \le \mu_{11} - \mu_{21} \le 5.96$$
$$-8.96 \le \mu_{12} - \mu_{22} \le -1.04$$
$$-12.96 \le \mu_{13} - \mu_{23} \le -5.04$$


For this family of confidence intervals, the following conclusions may be drawn with
family confidence coefficient of 95 percent: (1) For students with excellent quantitative 
ability, the mean learning scores with the two teaching methods do not differ. (2) For 
students with either good or moderate quantitative abilities, the mean learning score with 
the abstract teaching method is lower than that with the standard method. The superiority 
of the standard teaching method may be particularly strong for students with moderate 
quantitative ability. 
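A short sketch retraces the three Bonferroni intervals and flags which ones exclude zero. The data layout and variable names are illustrative; the cell means, MSE = 28, n = 21, and B = 2.428 are the values given in this example.

```python
import math

# Estimated treatment means from Table 19.11a, keyed by (method, ability).
cell_means = {
    ("abstract", "excellent"): 92, ("abstract", "good"): 81, ("abstract", "moderate"): 73,
    ("standard", "excellent"): 90, ("standard", "good"): 86, ("standard", "moderate"): 82,
}
mse, n, b_mult = 28, 21, 2.428        # B = t(.99167; 120), from the text
std_err = math.sqrt(2 * mse / n)      # s{D-hat}, as in (19.88b)

intervals = {}
for ability in ("excellent", "good", "moderate"):
    d_hat = cell_means[("abstract", ability)] - cell_means[("standard", ability)]
    intervals[ability] = (d_hat - b_mult * std_err, d_hat + b_mult * std_err)

for ability, (lower, upper) in intervals.items():
    excludes_zero = lower > 0 or upper < 0
    print(f"{ability}: {lower:.2f} to {upper:.2f}, excludes zero: {excludes_zero}")
```

Only the interval for students with excellent quantitative ability covers zero, which matches conclusion (1) above.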


Example 2—Contrasts of Treatment Means 
In the mathematics learning example, a school administrator also wished to know whether
the amount of learning gain with the standard teaching method over the abstract method is
greater for students with moderate quantitative ability than for students with good quantitative
ability. This question had been raised before the study began. We shall estimate the
single contrast:

$$L = (\mu_{23} - \mu_{13}) - (\mu_{22} - \mu_{12})$$

by means of a one-sided lower confidence interval. For the results in Table 19.11a, the point
estimate of $L$ is $\hat{L} = (82 - 73) - (86 - 81) = 4$. The estimated variance by (19.93b) is:

$$s^2\{\hat{L}\} = \frac{28}{21}\left[(1)^2 + (-1)^2 + (-1)^2 + (1)^2\right] = 5.333$$

so that the estimated standard deviation is $s\{\hat{L}\} = 2.309$. For a 95 percent confidence
coefficient, we require $t(.05; 120) = -1.658$. Hence, the lower confidence limit is
$4 - 1.658(2.309)$ and the desired confidence interval is:

$$L \ge .17$$

We conclude, therefore, with 95 percent confidence coefficient that the gain in learning
with the standard teaching method over the abstract method is greater for students with
moderate quantitative ability than for students with good quantitative ability, the difference
in the mean gain being at least .17 point.
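The one-sided limit can be verified with a few lines of Python. The dictionaries and names below are illustrative; cells are indexed as (i, j) with i = 1 for the abstract method and i = 2 for the standard method, and the multiple 1.658 is the $t$ percentile quoted above.

```python
import math

# Contrast coefficients c_ij for L = (mu_23 - mu_13) - (mu_22 - mu_12).
coeffs = {(2, 3): 1, (1, 3): -1, (2, 2): -1, (1, 2): 1}
means = {(1, 2): 81, (1, 3): 73, (2, 2): 86, (2, 3): 82}
mse, n = 28, 21
t_mult = 1.658  # magnitude of t(.05; 120), from the text

l_hat = sum(c * means[cell] for cell, c in coeffs.items())           # (19.93a)
std_err = math.sqrt(mse / n * sum(c * c for c in coeffs.values()))   # from (19.93b)
lower_limit = l_hat - t_mult * std_err
print(l_hat, round(std_err, 3), round(lower_limit, 2))
```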


19.10 Pooling Sums of Squares in Two-Factor Analysis
of Variance


The testing approach presented in this chapter assumes that ANOVA model (19.23) is the 
full model for all tests of factor effects, regardless of the conclusions reached in any of these 
tests. The rationale for this approach is that ANOVA model (19.23) is based on the identity 
(19.22) for the treatment means $\mu_{ij}$. Once the analysis of residuals and other diagnostics
demonstrate that this model is appropriate, it is used for all tests. 

Some statisticians take the view that ANOVA model (19.23) should be revised when the 
test for interaction effects leads to the conclusion that no interactions are present. With this 
approach, the full model considered in testing for factor A and factor B main effects when 
the test for interaction effects leads to the conclusion that no interactions are present is the 
revised model: 


$$Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + \varepsilon_{ijk} \qquad \text{Revised full model} \qquad (19.96)$$

As we just noted with the regression approach for the Castle Bakery example, the extra
sums of squares for factor A and factor B main effects do not depend on the order in which
the factor effects are entered when all treatment sample sizes are equal. Hence,
the numerator sums of squares SSA and SSB of the test statistic $F^*$ are not affected by this
revision in the full model when the treatment sample sizes are equal. The denominator sum
of squares of the $F^*$ test statistic is affected, however, leading to the following error sum of
squares for the full model:

$$SSE(F) = SSE + SSAB \qquad (19.97)$$

Thus, the error sum of squares for the full model with this approach involves the pooling
of the interaction and error sums of squares. Likewise, the degrees of freedom are pooled;
the degrees of freedom associated with SSE(F) are:

$$df_F = (a - 1)(b - 1) + (n - 1)ab = nab - a - b + 1$$

For the Castle Bakery example, the pooled error sum of squares for testing factor A and
factor B main effects would be (Table 19.9):

$$SSE(F) = 62 + 24 = 86$$

and the pooled degrees of freedom would be:

$$df_F = 6 + 2 = 8$$

Hence, the error mean square for testing factor A or factor B main effects with the model
revision approach here would be 86/8 = 10.75.
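The pooling in (19.97) amounts to adding sums of squares and degrees of freedom before dividing. A minimal sketch, with a hypothetical function name and the Castle Bakery values SSE = 62 (6 df) and SSAB = 24 (2 df):

```python
def pooled_error_mean_square(sse, df_error, ssab, df_ab):
    """Return (SSE + SSAB) / (df_error + df_ab), the pooled error mean square."""
    return (sse + ssab) / (df_error + df_ab)

mse_original = 62 / 6
mse_pooled = pooled_error_mean_square(sse=62, df_error=6, ssab=24, df_ab=2)
print(round(mse_original, 2), mse_pooled)
```

Here pooling changes the error mean square only slightly (from about 10.3 to 10.75) while raising the error degrees of freedom from 6 to 8.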

This pooling procedure affects both the level of significance and the power of the tests for
factor A and factor B main effects, in ways not yet fully understood. It has been suggested
therefore by some statisticians that pooling should not be considered unless: (1) the degrees
of freedom associated with MSE are small, perhaps 5 or less, and (2) the test statistic
MSAB/MSE falls substantially below the action limit of the decision rule, perhaps when
MSAB/MSE < 2 for $\alpha = .05$. Part (1) of this rule is designed to limit pooling to cases
where the gains may be substantial, while part (2) is designed to give reasonable assurance
that there are indeed no interactions.


19.11 Planning of Sample Sizes for Two-Factor Studies


We introduced the power approach to sample size planning for single-factor studies in 
Section 16.10, and the estimation approach to sample size planning for single-factor studies 
was discussed in Section 17.8. We now consider these two approaches in the context of 
two-factor studies. 


Power Approach 


Power of F Test. Table B.11 can be used for determining the power of tests for multi-factor
studies in the same fashion as for single-factor studies. The only differences arise
in the definition of the noncentrality parameter and the degrees of freedom. For two-factor
fixed effects ANOVA model (19.23) with equal treatment sample sizes, the noncentrality
parameter $\phi$ and the degrees of freedom $\nu_1$ and $\nu_2$ for testing for interaction effects, factor A
main effects, and factor B main effects are as follows:

Test for interactions:

$$\phi = \frac{1}{\sigma}\sqrt{\frac{n\sum\sum(\alpha\beta)_{ij}^2}{(a-1)(b-1)+1}} = \frac{1}{\sigma}\sqrt{\frac{n\sum\sum(\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu_{..})^2}{(a-1)(b-1)+1}} \qquad (19.98a)$$

$$\nu_1 = (a-1)(b-1) \qquad \nu_2 = ab(n-1)$$

Test for A main effects:

$$\phi = \frac{1}{\sigma}\sqrt{\frac{nb\sum\alpha_i^2}{a}} = \frac{1}{\sigma}\sqrt{\frac{nb\sum(\mu_{i.} - \mu_{..})^2}{a}} \qquad (19.98b)$$

$$\nu_1 = a - 1 \qquad \nu_2 = ab(n-1)$$

Test for B main effects:

$$\phi = \frac{1}{\sigma}\sqrt{\frac{na\sum\beta_j^2}{b}} = \frac{1}{\sigma}\sqrt{\frac{na\sum(\mu_{.j} - \mu_{..})^2}{b}} \qquad (19.98c)$$

$$\nu_1 = b - 1 \qquad \nu_2 = ab(n-1)$$
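To make the factor A case (19.98b) concrete, the sketch below computes the noncentrality parameter and the two degrees of freedom needed to enter the power table. The function name and all planning values (factor level means 50, 55, 60; standard deviation 10; b = 2; n = 5) are hypothetical and chosen only for illustration.

```python
import math

def phi_factor_a(level_means, sigma, b, n):
    """Noncentrality parameter for the factor A main effects test, as in (19.98b).

    level_means are planning values for the factor A level means; with equal
    sample sizes the overall mean is their simple average.
    """
    a = len(level_means)
    overall = sum(level_means) / a
    sum_sq = sum((m - overall) ** 2 for m in level_means)
    return math.sqrt(n * b * sum_sq / a) / sigma

a, b, n = 3, 2, 5
phi = phi_factor_a([50, 55, 60], sigma=10, b=b, n=n)
nu1 = a - 1
nu2 = a * b * (n - 1)
print(round(phi, 3), nu1, nu2)
```

With these three quantities in hand, the power would be read from the power table in the same way as for single-factor studies.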


Use of Table B.12 for Two-Factor Studies. When planning sample sizes for two-factor
studies with the power approach, one is concerned typically with both the power of detecting
factor A main effects and the power of detecting factor B main effects. One can first specify
the minimum range of factor A level means for which it is important to detect factor A
main effects, and obtain the needed sample sizes from Table B.12, with $r = a$. The resulting
sample size is $bn$, from which $n$ can be obtained readily. The use of Table B.12 for this
purpose is appropriate provided the resulting sample size is not small, specifically provided
$a(bn - 1) \ge 20$. If this condition is not met, the ANOVA power tables in Table B.11 should
be used. These tables, as noted earlier, require an iterative approach for determining needed
sample sizes.

In the same way, the minimum range of factor B level means can then be specified for
which it is important to detect factor B main effects, and the needed sample sizes found.
If the sample sizes obtained from the factor A and factor B power specifications differ
substantially, a judgment will need to be made as to the final sample sizes.


Estimation Approach

The estimation approach to planning sample sizes described in Section 17.8 for single-factor
studies is readily adapted for use in two-factor studies. We specify the set of comparisons of
interest and determine the expected widths of the confidence intervals for various advance
planning values for the standard deviation, $\sigma$. Through an iterative, trial-and-error process,
we determine a sample size plan that represents an acceptable compromise between the cost
of running the study and the precision obtained for comparisons of interest. We illustrate
this procedure with a two-factor study example.

Example

In a two-factor study, factor A has a = 3 levels and factor B has b = 2 levels. No interaction
effects are anticipated, and all pairwise comparisons of factor level means are to be made for
each of the two factors. A family confidence coefficient of .90 is specified for the 3 + 1 = 4
pairwise comparisons. Equal treatment sample sizes of $n$ experimental units are to be used.
The width of each confidence interval is to be $\pm 30$. A reasonable planning value for the
standard deviation of the error terms is $\sigma = 50$.

We know from (19.63) that the variance of a comparison of factor A level means,
$\hat{L} = \bar{Y}_{i..} - \bar{Y}_{i'..}$, is:

$$\sigma^2\{\hat{L}\} = \frac{\sigma^2}{bn}\sum c_i^2 = \frac{2\sigma^2}{bn} \qquad \text{Factor A comparisons}$$

Similarly, the variance of the comparison of the two factor B level means, $\hat{L} = \bar{Y}_{.j.} - \bar{Y}_{.j'.}$,
is:

$$\sigma^2\{\hat{L}\} = \frac{2\sigma^2}{an} \qquad \text{Factor B comparison}$$

Since equal precision is specified for all pairwise comparisons and since a = 3 and b = 2,
the variance for the factor A comparisons will be larger for any given treatment sample size
$n$ and hence will be the critical consideration.

Suppose that we begin the iterative process with n = 30. We then find for the factor A
comparisons that $\sigma^2\{\hat{L}\} = 2(50)^2/[2(30)] = 83.33$, or $\sigma\{\hat{L}\} = 9.13$. For $n_T = 6(30) = 180$,
α = .10, and g = 4 comparisons, the Bonferroni multiple is B = t(.9875; 174) =
2.26. Hence, the anticipated width of the confidence intervals is 2.26(9.13) = ±20.6. This
anticipated width is somewhat tighter than the specified width ±30, and a smaller treatment
sample size should be tried in the next iteration.
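The arithmetic of one iteration can be checked with a short script. This is only a sketch: the function names are hypothetical, and the Bonferroni multiple (2.26 for n = 30) is passed in as an assumed input taken from the text rather than computed, because it must be looked up again from a t table whenever n changes.

```python
import math


def comparison_variance(sigma, levels_of_other_factor, n, sum_c_squared=2.0):
    """Variance of an estimated contrast of factor level means.

    For a pairwise comparison the squared coefficients sum to 2, so the
    variance is 2 * sigma**2 / (levels_of_other_factor * n).
    """
    return sigma ** 2 * sum_c_squared / (levels_of_other_factor * n)


def anticipated_half_width(sigma, levels_of_other_factor, n, multiple):
    """Anticipated half-width: multiple times the standard deviation of the contrast."""
    return multiple * math.sqrt(comparison_variance(sigma, levels_of_other_factor, n))


# Planning values from the example: sigma = 50, a = 3, b = 2, trial n = 30.
var_a = comparison_variance(50, 2, 30)  # factor A comparisons average over b = 2 levels
var_b = comparison_variance(50, 3, 30)  # factor B comparison averages over a = 3 levels
half_width_a = anticipated_half_width(50, 2, 30, multiple=2.26)

print(round(var_a, 2), round(var_b, 2), round(half_width_a, 1))  # 83.33 55.56 20.6
```

For the next iteration, one would choose a smaller n, look up the new multiple t(.9875; 6n − 6), and call `anticipated_half_width` again until the half-width is acceptably close to 30.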


Finding the “Best” Treatment 


As we discussed earlier in Section 16.11 in the context of single-factor studies, there are 
occasions when the chief purpose of the study is to ascertain the treatment with the highest 
or lowest mean. This is also true for two-factor studies, where the objective is to identify 
the best of the r = ab factor level combinations. We illustrate the use of this approach with 
an example. 


Two-Factor Study Example. Suppose that in the Castle Bakery example, the chief objective
is to identify the combination of shelf height and shelf width that maximizes sales
(in cases). There are 3 × 2 = 6 treatment combinations. We anticipate that σ = 10. Further,
we want to be able to detect an average difference of λ = 8 cases between the highest and
second highest treatment means with probability 1 − α = .90 or greater.

The entry in Table B.13 is $\lambda\sqrt{n}/\sigma$. For r = 6 and probability 1 − α = .90, we find from
Table B.13 that $\lambda\sqrt{n}/\sigma = 2.7100$. Hence, since λ = 8, we obtain:

$$\frac{(8)\sqrt{n}}{10} = 2.7100$$

$$\sqrt{n} = 3.3875 \quad \text{or} \quad n = 12$$


Thus, when the average number of cases for the best shelf height and shelf width treatment 
mean exceeds that of the second best by at least 8 cases and σ = 10, sample sizes of
12 supermarkets for each shelf height and shelf width combination are needed to provide
an assurance of at least .90 that the highest estimated mean $\bar{Y}_{ij.}$ corresponds to the highest
population mean. 
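The final step of this calculation can be sketched in a few lines. The function name is hypothetical, and the table entry 2.7100 is supplied as an input read from Table B.13, not computed.

```python
import math


def best_treatment_sample_size(table_entry, sigma, lam):
    """Solve lam * sqrt(n) / sigma = table_entry for n and round up.

    table_entry: value read from Table B.13 for the given r and 1 - alpha
    sigma:       planning value for the error standard deviation
    lam:         difference between the best and second best means to detect
    """
    root_n = table_entry * sigma / lam
    return math.ceil(root_n ** 2)


# Castle Bakery planning values: r = 6, 1 - alpha = .90, sigma = 10, lambda = 8.
n = best_treatment_sample_size(2.7100, sigma=10, lam=8)
print(n)  # 12, since sqrt(n) = 3.3875 gives n = 11.48, rounded up
```

Rounding up rather than to the nearest integer keeps the assurance at or above the specified probability.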


Problems


19.1. Refer to the SENIC data set in Appendix C.1. An analyst wishes to investigate the effects of
medical school affiliation (factor A) and geographic region (factor B) on infection risk. All
factor level combinations will be included in the study.


a. How many treatments are being studied? 


b. What is the response variable here?


19.2. A student in a class discussion stated: “A treatment is a treatment, whether the study involves 
a single factor or multiple factors. The number of factors has little effect on the interpretation
of the results.” Discuss. 

19.3. Verify the interactions in Table 19.3b. 


*19.4. In a two-factor study, the treatment means $\mu_{ij}$ are as follows:

                  Factor B
Factor A     B1     B2     B3
   A1        34     23     36
   A2        40     29     42

a. Obtain the factor A level means.

b. Obtain the main effects of factor A.

c. Does the fact that $\mu_{12} - \mu_{11} = -11$ while $\mu_{13} - \mu_{12} = 13$ imply that factors A and B
interact? Explain.

d. Prepare a treatment means plot and determine whether the two factors interact. What do
you find?

19.5. In a two-factor study, the treatment means $\mu_{ij}$ are as follows:

                      Factor B
Factor A     B1     B2     B3     B4
   A1       250    265    268    269
   A2       288    273    270    269

a. Obtain the factor B main effects. What do your results imply about factor B?

b. Prepare a treatment means plot and determine whether the two factors interact.
How can you tell that interactions are present? Are the interactions important or
unimportant?

c. Make a logarithmic transformation of the $\mu_{ij}$ and plot the transformed values to explore
whether this transformation is helpful in reducing the interactions. What are your
findings?

19.6. Three sets of treatment means $\mu_{ij}$ for students' grades in a course follow, where factor A
is student's major (A1: computer science; A2: mathematics) and factor B is student's class
affiliation (B1: junior; B2: senior; B3: graduate).

         Set 1                  Set 2                  Set 3
      B1   B2   B3           B1   B2   B3           B1   B2   B3
A1    80   80   80     A1    75   80   90     A1    75   80   85
A2    90   90   90     A2    80   86   97     A2    75   85  100

Prepare a treatment means plot for each set of $\mu_{ij}$ to study interaction effects. Interpret each
plot and state your findings. If interactions are present, describe their nature and indicate
whether they are important or unimportant.

*19.7. Refer to Problem 19.4. Assume that σ = 1.4 and n = 10.

a. Obtain E{MSE} and E{MSA}.

b. Is E{MSA} substantially larger than E{MSE}? What is the implication of this?

19.8. Refer to Problem 19.5. Assume that σ = 4 and n = 6.

a. Obtain E{MSE} and E{MSAB}.

b. Is E{MSAB} substantially larger than E{MSE}? What is the implication of this?

19.9. A psychologist stated: “I feel uncomfortable about deciding in a research study whether
the interactions are important or unimportant. I would rather have the statistician make that
decision.” Comment.




*19.10. Refer to Cash offers Problem 16.10. Six male and six female volunteers were used in each
age group. The observations (in hundred dollars), classified by age (factor A) and gender of
owner (factor B), follow.

                        Factor B
                    (gender of owner)
   Factor A         j = 1      j = 2
    (age)           Male       Female
i = 1  Young         21         21
                     23         22
                     23         25
i = 2  Middle        30         26
                     29         29
                     27         29
i = 3  Elderly       25         23
                     22         19
                     21         20

a. Obtain the fitted values for ANOVA model (19.23).

b. Obtain the residuals. Do they sum to zero for each treatment?

c. Prepare aligned residual dot plots for the treatments. What departures from ANOVA model
(19.23) can be studied from these plots? What are your findings?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

e. The observations for each treatment were obtained in the order shown. Prepare residual
sequence plots and interpret them. What are your findings?

*19.11. Refer to Cash offers Problems 16.10 and 19.10. Assume that ANOVA model (19.23) is
applicable.

a. Prepare an estimated treatment means plot. Does it appear that any factor effects are 
present? Explain. 

b. Set up the analysis of variance table. Does any one source account for most of the total 
variability in cash offers in the study? Explain. 

c. Test whether or not interaction effects are present; use α = .05. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

d. Test whether or not age and gender main effects are present. In each case, use α = .05 and
state the alternatives, decision rule, and conclusion. What is the P-value of each test? Is it
meaningful here to test for main factor effects? Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)? 


g. What are the relations between the sums of squares in the two-factor analysis of variance
in part (b) and the sums of squares in the single-factor analysis of variance in Problem
16.10d? Do the same relations hold for the degrees of freedom?


19.12. Eye contact effect. In a study of the effect of applicant's eye contact (factor A) and personnel 
officer's gender (factor B) on the personnel officer's assessment of likely job success of
applicant, 10 male and 10 female personnel officers were shown a front view photograph of
an applicant's face and were asked to give the person in the photograph a success rating on a
scale of 0 (total failure) to 20 (outstanding success). Half of the officers in each gender group 
were chosen at random to receive a version of the photograph in which the applicant made 
eye contact with the camera lens. The other half received a version in which there was no eye 
contact. The success ratings follow. 


                           Factor B
                      (gender of officer)
    Factor A          j = 1       j = 2
  (eye contact)       Male        Female
i = 1  Present         11          15
                        7          12
                       10          16
i = 2  Absent          12          14
                       16          17
                       14          18

a. Obtain the fitted values for ANOVA model (19.23).

b. Obtain the residuals. Do they sum to zero for each treatment?

c. Prepare aligned residual dot plots for the treatments. What departures from ANOVA model
(19.23) can be studied from these plots? What are your findings?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

e. The observations for each treatment were obtained in the order shown. Prepare residual
sequence plots and interpret them. What are your findings?


19.13. Refer to Eye contact effect Problem 19.12. Assume that ANOVA model (19.23) is applicable.

a. Prepare an estimated treatment means plot. Does it appear that any factor effects are
present? Explain.

b. Set up the analysis of variance table. Does any one source account for most of the total
variability in the success ratings in the study? Explain.

c. Test whether or not interaction effects are present; use α = .01. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

d. Test whether or not eye contact and gender main effects are present. In each case, use
α = .01 and state the alternatives, decision rule, and conclusion. What is the P-value of
each test? Is it meaningful here to test for main factor effects? Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)?


*19.14. Hay fever relief. A research laboratory was developing a new compound for the relief of
severe cases of hay fever. In an experiment with 36 volunteers, the amounts of the two active
ingredients (factors A and B) in the compound were varied at three levels each. Randomization
was used in assigning four volunteers to each of the nine treatments. The data on hours of
relief follow.

                          Factor B (ingredient 2)
    Factor A          j = 1      j = 2      j = 3
 (ingredient 1)       Low        Medium     High
i = 1  Low            2.4         4.6        4.8
                      2.5         4.7        4.6
i = 2  Medium         5.8         8.9        9.1
                      5.3         9.0        9.4
i = 3  High           6.1         9.9       13.5
                      6.2        10.1       13.2

a. Obtain the fitted values for ANOVA model (19.23).

b. Obtain the residuals.

c. Plot the residuals against the fitted values. What departures from ANOVA model (19.23)
can be studied from this plot? What are your findings?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

*19.15. Refer to Hay fever relief Problem 19.14. Assume that ANOVA model (19.23) is applicable.

a. Prepare an estimated treatment means plot. Does your graph suggest that any factor effects
are present? Explain.

b. Obtain the analysis of variance table. Does any one source account for most of the total
variability in hours of relief in the study? Explain.

c. Test whether or not the two factors interact; use α = .05. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

d. Test whether or not main effects for the two ingredients are present. Use α = .05 in each
case and state the alternatives, decision rule, and conclusion. What is the P-value of each
test? Is it meaningful here to test for main factor effects? Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)?

19.16. Disk drive service. The staff of a service center for electronic equipment includes three
technicians who specialize in repairing three widely used makes of disk drives for desktop
computers. It was desired to study the effects of technician (factor A) and make of disk drive
(factor B) on the service time. The data that follow show the number of minutes required to


complete the repair job in a study where each technician was randomly assigned to five jobs
on each make of disk drive.

                            Factor B (make of drive)
     Factor A           j = 1       j = 2       j = 3
   (technician)         Make 1      Make 2      Make 3
i = 1  Technician 1       62          57          59
                          48          45          53
                          69          44          47
i = 2  Technician 2       51          61          55
                          57          58          58
                          39          51          49
i = 3  Technician 3       59          58          47
                          65          63          56
                          70          60          50

a. Obtain the fitted values for ANOVA model (19.23).

b. Obtain the residuals.

c. Plot the residuals against the fitted values. What departures from ANOVA model (19.23)
can be studied from this plot? What are your findings?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correla- 
tion between the ordered residuals and their expected values under normality. Does the 
normality assumption appear to be reasonable here? 

e. The observations for each treatment were obtained in the order shown. Prepare residual 
sequence plots and analyze them. What are your findings? 


19.17. Refer to Disk drive service Problem 19.16. Assume that ANOVA model (19.23) is applicable.


a. Prepare an estimated treatment means plot. Does your graph suggest that any factor effects 
are present? Explain. 

b. Obtain the analysis of variance table. Does any one source account for most of the total 
variability? Explain. 

c. Test whether or not the two factors interact; use α = .01. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

d. Test whether or not main effects for technician and make of drive are present. Use α = .01
in each case and state the alternatives, decision rule, and conclusion. What is the P-value
of each test? Is it meaningful here to test for main factor effects? Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)?

19.18. Kidney failure hospitalization. Kidney failure patients are commonly treated on dialysis 
machines that filter toxic substances from the blood. The appropriate “dose” for effective 
treatment depends, among other things, on duration of treatment and weight gain between 
treatments as a result of fluid buildup. To study the effects of these two factors on the number 
of days hospitalized (attributable to the disease) during a year, a random sample of 10 patients
per group who had undergone treatment at a large dialysis facility was obtained. Treatment 


duration (factor A) was categorized into two groups: short duration (average dialysis time for
the year under four hours) and long duration (average dialysis time for the year equal to or
greater than four hours). Average weight gain between treatments (factor B) during the year
was categorized into three groups: slight, moderate, and substantial. The data on number of
days hospitalized follow.

                         Factor B (weight gain)
   Factor A          j = 1       j = 2        j = 3
  (duration)         Mild       Moderate    Substantial
i = 1  Short         0   2       2   4       15  16
                     2   0       4   3       10   7
                     0   8      15  20       25  27
i = 2  Long          0   2       5   1       10  15
                     7 3 3 8 4
                     4   3       1   9        7   1

The transformed data Y' = log10(Y + 1) are to be used for the analysis.

a. Obtain the fitted values and residuals for ANOVA model (19.23) for the transformed
data.

b. Prepare aligned residual dot plots for the treatments. What departures from ANOVA model
(19.23) can be studied from these plots? What are your findings?

c. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

19.19. Refer to Kidney failure hospitalization Problem 19.18. Assume that ANOVA model (19.23)
is appropriate for the transformed response variable.

a. Prepare an estimated treatment means plot. Does your graph suggest that any factor effects
are present? Explain.

b. Obtain the analysis of variance table. Does any one source account for most of the total
variability? Explain.

c. Test whether or not the two factors interact; use α = .05. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

d. Test whether or not main effects for duration and weight gain are present. Use α = .05 in
each case and state the alternatives, decision rule, and conclusion. What is the P-value of
each test? Is it meaningful here to test for main factor effects? Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)?

*19.20. Programmer requirements. A computer software firm was encountering difficulties in
forecasting the programmer requirements for large-scale programming projects. As part of a study
to remedy the difficulties, 24 programmers, classified into equal groups by type of experience
(factor A) and amount of experience (factor B), were asked to predict the number of
programmer-days required to complete a large project about to be initiated. After this project


was completed, the prediction errors (actual minus predicted programmer-days) were determined.
The data on prediction errors follow.

                              Factor B (years of experience)
      Factor A               j = 1       j = 2          j = 3
 (type of experience)       Under 5    5-under 10    10 or more
i = 1  Small                  240         110            56
       systems only           206         118            60
                              217         103            68
                              225          95            58
i = 2  Small and               71          47            37
       large systems           53          52            33
                               68          31            40
                               57          49            45

a. Obtain the fitted values for ANOVA model (19.23).

b. Obtain the residuals.

c. Prepare aligned residual dot plots for the treatments. What departures from ANOVA model
(19.23) can be studied from these plots? What are your findings?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

*19.21. Refer to Programmer requirements Problem 19.20. Assume that ANOVA model (19.23) is
applicable.

a. Prepare an estimated treatment means plot. Does your graph suggest that any factor effects
are present? Explain.

b. Obtain the analysis of variance table. Does any one source account for most of the total
variability? Explain.

c. Test whether or not the two factors interact; use α = .01. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

d. Test whether or not main effects for type of experience and years of experience are present.
Use α = .01 in each case and state the alternatives, decision rule, and conclusion. What is
the P-value of each test? Is it meaningful here to test for main factor effects? Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)?

19.22. How does the randomization of treatment assignments in a two-factor study differ when both
factors are experimental factors and when only one factor is an experimental factor?


19.23. Refer to Eye contact effect Problem 19.12.

a. Explain how you would make the assignments of personnel officers to treatments in this
two-factor study. Make all appropriate randomizations.

b. Did you randomize the officers to the factor levels of each factor?


*19.24. Refer to Hay fever relief Problem 19.14.

a. Explain how you would make the assignments of volunteers to treatments in this study.
Make all appropriate randomizations.

b. Did you randomize the volunteers to the factor levels of each factor?


19.25. Refer to Disk drive service Problem 19.16.

a. Is any randomization of treatment assignments called for in this study? Is any randomization
utilized? Explain.

b. Would you consider this study to be experimental in nature? Discuss.

19.26. Why is it suggested in the flowchart in Figure 19.11 that a test for interactions should be
conducted before tests for main factor effects? Explain.

*19.27. A two-factor study was conducted with a = 5, b = 5, and n = 4. No interactions between
factors A and B were noted, and the analyst now wishes to estimate all pairwise comparisons
among the factor A level means and all pairwise comparisons among the factor B level means.
The family confidence coefficient for the joint set of interval estimates is to be 90 percent.

a. Is it more efficient to use the Bonferroni procedure for the entire family or to use the
Tukey procedure for each family of factor level mean comparisons and then to join the two
families by means of the Bonferroni procedure?

b. Would your answer differ if each factor had three levels, everything else remaining the
same?

19.28. A two-factor study was conducted with a = 6, b = 6, and n = 10. No interactions between
factors A and B were found, and it is now desired to estimate five contrasts of factor A level
means and four contrasts of factor B level means. The family confidence coefficient for the
joint set of estimates is to be 95 percent. Which of the three procedures at the bottom of
page 852 and the top of page 853 will be most efficient here?

19.29. Refer to the Castle Bakery example at the top of page 855, where two pairwise comparison
estimates were made by means of the Tukey procedure. Why would it not be appropriate to
use the Bonferroni procedure here? Discuss.

*19.30. Refer to Cash offers Problems 19.10 and 19.11.

a. Estimate ру with a 95 percent confidence interval. Interpret your interval estimate. 

b. Prepare a bar graph of the estimated factor В level means. What does this plot suggest 
about the equality of the factor B level means? 

c. Estimate $D = \mu_{.1} - \mu_{.2}$ by means of a 95 percent confidence interval. Is your confidence
interval consistent with the test result in Problem 19.11d? Is your confidence interval
consistent with your finding in part (b)? Explain.

d. Prepare a bar graph of the estimated factor A level means. What does this plot suggest 
about the factor A main effects? 

e. Obtain all pairwise comparisons among the factor A level means; use the Tukey procedure 
with a 90 percent family confidence coefficient. Present your findings graphically and 
summarize your results. Are your conclusions consistent with those in part (d)?

f. Is the Tukey procedure used in part (e) the most efficient one that could be used here?
Explain. 


g. Estimate the contrast: 


p HR ES os 
2 
with a 95 percent confidence interval. Interpret your interval estimate. 
h. Suppose that in the population of female owners, 30 percent are young, 60 percent are
middle-aged, and 10 percent are elderly. Obtain a 95 percent confidence interval for the 
mean cash offer in the population of female owners. 


19.31. Refer to Eye contact effect Problems 19.12 and 19.13.

a. Estimate иә with a 99 percent confidence interval. Interpret your interval estimate.

b. Estimate $\mu_{1.}$ with a 99 percent confidence interval. Interpret your interval estimate.

c. Prepare a bar graph of the estimated factor B level means. What does this plot suggest
about the factor B main effects?

d. Obtain confidence intervals for $\mu_{.1}$ and $\mu_{.2}$, each with a 99 percent confidence coefficient.
Interpret your interval estimates. What is the family confidence coefficient for the set of
two estimates?

e. Prepare a bar graph of the estimated factor A level means. What does this plot suggest
about the factor A main effects?

f. Obtain confidence intervals for $D_1 = \mu_{2.} - \mu_{1.}$ and $D_2 = \mu_{.2} - \mu_{.1}$; use the Bonferroni
procedure and a 95 percent family confidence coefficient. Summarize your findings. Are
your findings consistent with those in parts (c) and (e)?

g. Is the Bonferroni procedure used in part (f) the most efficient one that could be used here?
Explain.

*19.32. Refer to Hay fever relief Problems 19.14 and 19.15.

a. Estimate uz with a 95 percent confidence interval. Interpret your interval estimate.

b. Estimate D = ji — ши with a 95 percent confidence interval. Interpret your interval
estimate.

c. The analyst decided to study the nature of the interacting factor effects by means of the
following contrasts:

Ly = 8508 Га = l2- L,
Іа = PATEE р Ls = Їз — Lı
Ly = ETHS — py Le = Із — 15

Obtain confidence intervals for these contrasts; use the Scheffé multiple comparison procedure
with a 90 percent family confidence coefficient. Interpret your findings.

d. The analyst also wished to identify the treatment(s) yielding the longest mean relief. Using
the Tukey testing procedure with family significance level α = .10, identify the treatment(s)
providing the longest mean relief.

e. To examine whether a transformation of the data would make the interactions unimportant,
plot separately the transformed estimated treatment means for the reciprocal and square
root transformations. Would either of these transformations have made the interaction
effects unimportant? Explain.

19.33. Refer to Disk drive service Problems 19.16 and 19.17.


a. Estimate шуу with a 99 percent confidence interval. Interpret your interval estimate.

b. Estimate D = u2 — Hz with a 99 percent confidence interval. Interpret your interval
estimate.

c. The nature of the interaction effects is to be studied by making, for each technician, all
three pairwise comparisons among the disk drive makes in order to identify, if possible,
the make of disk drive for which the technician's mean service time is lowest. The family
confidence coefficient for each set of three pairwise comparisons is to be 9¥ percent. Use
the Bonferroni procedure to make all required pairwise comparisons. Summarize your
findings.


d. The service center currently services 30 disk drives of each of the three makes per week,
with each technician servicing 10 machines of each make. Estimate the expected total
amount of service time required per week to service the 90 disk drives; use a 99 percent
confidence interval.

e. How much time could be saved per week, on the average, if technician 1 services only
make 2, technician 2 services only make 1, and technician 3 services only make 3? Use a
99 percent confidence interval.

f. To examine whether a transformation of the data would make the interactions unimportant,
plot separately the transformed estimated treatment means for the reciprocal and logarithmic
transformations. Would either of these transformations have made the interaction
effects unimportant? Explain.


19.34. Refer to Kidney failure hospitalization Problems 19.18 and 19.19. Continue to work with
the transformed observations Y' = log10(Y + 1).

a. Estimate рээ with a 95 percent confidence interval. Interpret your interval estimate.

b. Estimate D = рэз — gui with a 95 percent confidence interval. Interpret your interval
estimate.

c. Prepare separate bar graphs of the estimated factor A and factor B level means. What do
these plots suggest about the factor main effects?

d. The researcher wishes to study the main effects of each of the two factors by making all
pairwise comparisons of factor level means with a 90 percent family confidence coefficient
for the entire set of comparisons. Which multiple comparison procedure is most efficient
here?

e. Using the Bonferroni procedure, make all pairwise comparisons called for in part (d). State
your findings and prepare a graphic summary. Are your findings consistent with those in
part (c)?

f. It is known from past experience that 30 percent of patients have mild weight gains,
40 percent have moderate weight gains, and 30 percent have severe weight gains, and that
these proportions are the same for the two duration groups. Estimate the mean number
of days hospitalized (in transformed units) in the entire population with a 95 percent
confidence interval. Convert your confidence limits to the original units. Does it appear
that the mean number of days is less than 7?


*19.35. Refer to Programmer requirements Problems 19.20 and 19.21.

a. Estimate 42; with a 99 percent confidence interval. Interpret your interval estimate.

b. Estimate D = рэ — рз with a 99 percent confidence interval. Interpret your interval
estimate.

c. The nature of the interaction effects is to be studied by comparing the effect of type of
experience for each years-of-experience group. Specifically, the following comparisons
are to be estimated:

$D_1 = \mu_{11} - \mu_{21}$        $L_1 = D_1 - D_2$
$D_2 = \mu_{12} - \mu_{22}$        $L_2 = D_1 - D_3$
$D_3 = \mu_{13} - \mu_{23}$        $L_3 = D_2 - D_3$

The family confidence coefficient is to be 95 percent. Which multiple comparison procedure
is most efficient here?

d. Use the most efficient procedure to estimate the comparisons specified in part (c). State
your findings.


2229€ oa 


19.36. 


19.37. 


x 19.38. 


19.39. 


x 19.40. 


Chapter 19  Two-Factor Studies with Equal Sample Sizes 875 


e. Use the Tukey testing procedure with family significance level a = .05 to identify the type 
of experience-years of experience group(s) with the smallest mean prediction errors. 

f. For each group identified in part (e), obtain a confidence interval for the mean prediction 
error. Use the Bonferroni procedure with a 95 percent family confidence coefficient. Does 
any group have a mean prediction error that could be zero? Explain. 

g- To examine whether atransformation of the data would make the interactions unimportant, 
plot separately the transformed estimated treatment means for the reciprocal and logarith- 
mic transformations. Would either of these transformations have made the interaction 
effects unimportant? Explain. 


Refer to Brand preference Problem 6.5. Suppose the market researcher first wished to employ 

analysis of variance model (19.23) to determine whether or not moisture content (factor A) 

and sweetness (factor B) affect the degree of brand liking. 

a. State the analysis of variance model for this case. 

b. Obtain the analysis of variance table. 

c. Test whether or not the two factors interact; use œ = .01. State the alternatives, decision 
rule, and conclusion. 


d. Study possible curvilinearity of the moisture content effect by estimating the following 
contrast: 


L = (ua. — из.) — (ио. — Ил.) 


Use a 95 percent confidence interval. What do you conclude? 


e. Test whether or not sweetness affects brand liking; use о = .01. State the alternatives, 
decision rule, and conclusion. 


A market research manager is planning to study the effects of duration of advertising (factor A) 
and price level (factor B) on sales. Each factor has three levels. No important interactions are 
expected, and the primary analysis is to consist of pairwise comparisons of factor level means 
for each factor. Equal sample sizes are to be used for each treatment. The precision of each 
comparison is to be +3 thousand dollars. The family confidence coefficient for the joint 
set of comparisons is to be 90 percent, the Tukey procedure is to be used in making the 
comparisons for each factor, and the Bonferroni procedure is then to be used to join the two 
sets of comparisons. Assume that с = 7 thousand dollars is a reasonable planning value for 
the error standard deviation. What sample sizes do you recommend? 

Refer to Cash offers Problem 19.10. Suppose that the sample sizes have not yet been deter- 
mined but it has been decided to use the same number of “owners” in each age-gender group. 
What are the required sample sizes if: (1) differences in the age factor level means are to be 
detected with probability .90 or more when the range of the factor level means is 3 (hundred 
dollars), and (2) the α risk is to be controlled at .05? Assume that a reasonable planning value
for the error standard deviation is σ = 1.5 (hundred dollars).

Refer to Eye contact effect Problem 19.12. Suppose that the sample sizes have not yet been 
determined but it has been decided to use equal sample sizes for each treatment. Primary 
interest is in the two comparisons $L_1 = \mu_{1.} - \mu_{2.}$ and $L_2 = \mu_{.1} - \mu_{.2}$. What are the required
sample sizes if each of these comparisons is to be estimated with precision not to exceed ±1.2
with a 95 percent family confidence coefficient, using the most efficient multiple comparison
procedure? Assume that a reasonable planning value for the error standard deviation is
σ = 2.4.

Refer to Hay fever relief Problem 19.14. Suppose that the sample sizes have not yet been
determined but it has been decided to use equal sample sizes for each treatment. The chief
objective is to identify the dosage combination that yields the longest mean relief. The
probability should be at least .99 that the correct dosage combination is identified when the mean
relief duration for the second best combination differs by .5 hour or more. What are the
required sample sizes? Assume that a reasonable planning value for the error standard deviation
is σ = .29 hour.

19.41. Refer to Kidney failure hospitalization Problem 19.18. Suppose that the sample sizes have
not yet been determined but it has been decided to use equal sample sizes for each treatment.
The chief objective is to estimate the pairwise comparisons:

$$L_1 = \mu_{1.} - \mu_{2.} \qquad L_3 = \mu_{.1} - \mu_{.3}$$

$$L_2 = \mu_{.1} - \mu_{.2} \qquad L_4 = \mu_{.2} - \mu_{.3}$$

What are the required sample sizes if the precision of each of the estimates should not exceed
±.20 (in transformed units), using the Bonferroni procedure with a family confidence
coefficient of 90 percent for the joint set of comparisons? A reasonable planning value for the error
standard deviation is σ = .32 (in transformed units).

*19.42. Refer to Programmer requirements Problem 19.20. Suppose that the sample sizes have not
yet been determined but it has been decided to use equal sample sizes for each treatment.
Primary interest is in identifying the type of experience-years of experience combination
for which the mean prediction error is smallest. The probability should be at least .95 that
the correct combination is identified when the mean prediction error for the second best
combination differs by 8.0 programmer-days or more. Assume that a reasonable planning
value for the error standard deviation is σ = 9.1 days. What are the required sample sizes?


Exercises


19.43. Derive (19.7a) from (19.7).

19.44. Prove the result in (19.9b).

19.45. (Calculus needed.) State the likelihood function for ANOVA model (19.15) when a = 2,
b = 2, and n = 2. Find the maximum likelihood estimators.

19.46. (Calculus needed.) Derive (19.29).

19.47. Derive (19.39) from (19.38).

19.48. Show that the point estimator (19.67) is unbiased. Find the variance of this estimator.

19.49. Find the variance of the estimator (19.93a).

19.50. Consider a two-factor study with a = 2 and b = 2. Show that the interactions $(\alpha\beta)_{12}$ and
$(\alpha\beta)_{21}$ are equal.


Projects 


19.51. Refer to the SENIC data set in Appendix C.1. The following hospitals are to be considered
in a study of the effects of region (factor A: variable 9) and average age of patients (factor B:
variable 3) on the mean length of hospital stay of patients (variable 2):


1-44 46 48 51 53 57 58 60 63 66 74
76 79 80 83 84 88 94 101 103 111


For purposes of this ANOVA study, average age is to be classified into two categories: less
than or equal to 53.9 years, 54.0 years or more.
a. Assemble the required data and obtain the fitted values for ANOVA model (19.23). 


b. Obtain the residuals. 




c. Plot the residuals against the fitted values. What departures from ANOVA model (19.23) 
can be studied from this plot? What are your findings? 

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correla- 
tion between the ordered residuals and their expected values under normality. Does the 
normality assumption appear to be reasonable here? 


19.52. Refer to the SENIC data set in Appendix C.1 and Project 19.51. Assume that ANOVA model
(19.23) is applicable.

a. Prepare an estimated treatment means plot. Does it appear that any factor effects are
present? Explain.

b. Obtain the analysis of variance table. Does any one source account for most of the total
variability in the study? Explain.

c. Test whether or not interaction effects are present; use α = .05. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

d. Test whether or not region and age main effects are present. In each case, use α = .05 and
state the alternatives, decision rule, and conclusion. What is the P-value of each test? Is it
meaningful here to test for main factor effects? Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)?

19.53. Refer to the CDI data set in Appendix C.2. The following metropolitan areas are to be
considered in a study of the effects of region (factor A: variable 17) and percent below poverty
level (factor B: variable 13) on the crime rate (variable 10 ÷ variable 5):


1-5 7 10-17 19-29 32-34 36-42 44 46 49
51-52 54 57 75 84 87 94 136 151
164 178 182 202 218 410 421 434


For purposes of this ANOVA study, percent of population below poverty level is to be classified
into two categories: less than 8 percent, 8 percent or more.

a. Assemble the required data and obtain the fitted values for ANOVA model (19.23).

b. Obtain the residuals.

c. Prepare aligned residual dot plots for the treatments. What departures from ANOVA model
(19.23) can be studied from these plots? What are your findings?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of
correlation between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

19.54. Refer to the CDI data set in Appendix C.2 and Project 19.53. Assume that ANOVA model
(19.23) is applicable.


a. Prepare an estimated treatment means plot. Does it appear that any factor effects are
present? Explain.

b. Set up the analysis of variance table. Does any one source account for most of the total
variability in the study? Explain.

c. Test whether or not interaction effects are present; use α = .01. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

d. Test whether or not region and percent of population below poverty level main effects are
present. In each case, use α = .01 and state the alternatives, decision rule, and conclusion.
What is the P-value of each test? Is it meaningful here to test for main factor effects?
Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)?

19.55. Refer to the Market share data set in Appendix C.3. A balanced ANOVA study of the effects
of discount price (factor A: variable 5) and package promotion (factor B: variable 6) on the
average monthly market share (variable 2) is to be conducted. Order the observations in the
four factor-level combination cells from smallest to largest observation number and retain the
first 7 observations in each cell for a total of 28 observations. (This process omits cases with
identification numbers (variable 1) equal to 24, 25, 27, 28, 30, 33, 34, and 36.)

a. Assemble the required data and obtain the fitted values for ANOVA model (19.23).

b. Obtain the residuals.

c. Plot the residuals against the fitted values. What departures from ANOVA model (19.23)
can be studied from this plot? What are your findings?

d. Prepare a normal probability plot of the residuals. Also obtain the coefficient of
correlation between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

19.56. Refer to the Market share data set in Appendix C.3 and Project 19.55. Assume that ANOVA
model (19.23) is applicable.

a. Prepare an estimated treatment means plot. Does it appear that any factor effects are
present? Explain.

b. Obtain the analysis of variance table. Does any one source account for most of the total
variability in the study? Explain.

c. Test whether or not interaction effects are present; use α = .05. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

d. Test whether or not discount price and package promotion main effects are present. In each
case, use α = .05 and state the alternatives, decision rule, and conclusion. What is the
P-value of each test? Is it meaningful here to test for main factor effects? Explain.

e. Obtain an upper bound on the family level of significance for the tests in parts (c) and (d);
use the Kimball inequality (19.53).

f. Do the results in parts (c) and (d) confirm your graphic analysis in part (a)?

19.57. Refer to the SENIC data set in Appendix C.1 and Projects 19.51 and 19.52.

a. Prepare a bar graph of the estimated factor level means $\bar{Y}_{i..}$. What does this plot suggest
regarding the region main effects?

b. Analyze the effects of region on mean length of hospital stay by making all pairwise
comparisons between regions; use the Tukey procedure and a 90 percent family confidence
coefficient. State your findings and present a graphic summary. Are your findings consistent
with those in part (a)?

19.58. Refer to the CDI data set in Appendix C.2 and Projects 19.53 and 19.54.

a. Prepare a bar graph of the estimated factor level means $\bar{Y}_{i..}$. What does this plot suggest
regarding the region main effects?

b. Analyze the effects of region on crime rate by making all pairwise comparisons between
regions; use the Tukey procedure and a 95 percent family confidence coefficient. State
your findings and present a graphic summary. Are your findings consistent with those in
part (a)?


Case Studies


19.59. Refer to the Real estate sales data set in Appendix C.7. Carry out a balanced two-way analysis
of variance of this data set where the response of interest is sales price (variable 2) and the
two crossed factors are quality (variable 10) and style (variable 11). Style is recoded as either
1 or not 1. Order the observations in the six factor-level-combination cells from smallest
to largest observation number and retain the first 25 observations in each cell for a total
of 150 observations. The analysis should consider transformations of the response variable.
Document the steps taken in your analysis and justify your conclusions.

19.60. Refer to the Ischemic heart disease data set in Appendix C.9. Carry out a balanced two-way
analysis of variance of this data set where the response of interest is total cost (variable 2) and
the two crossed factors are number of interventions (variable 5) and number of comorbidities
(variable 9). Recode the number of interventions into six categories: 0, 1, 2, 3-4, 5-7, and
greater than or equal to 8. Recode the number of comorbidities into two categories: 0-1, and
greater than or equal to 2. Order the observations in the twelve factor-level-combination cells
from smallest to largest observation number and retain the first 43 observations in each cell
for a total of 516 observations. The analysis should consider transformations of the response
variable. Document the steps taken in your analysis and justify your conclusions.


Chapter 20


Two-Factor Studies—One Case per Treatment


In many studies, constraints on cost, time, and materials severely limit the number of 
observations that can be obtained. For example, a process engineer in a manufacturing 
company may have only a limited time to experiment with the production line. If the line 
is available for one day and only eight batches of product can be produced in a day, the 
experiment may have to be limited to eight observations. If the study involves one factor at 
four levels and a second factor at two levels so that there are eight factor level combinations, 
only one replication of the experiment is then possible for each treatment. 

Another reason why some studies contain only one case per treatment is that the response 
of interest is a single aggregate measure of performance. For example, in a marketing 
research study of alternative package designs, evaluation of each alternative may require a 
separate market test. The response of interest is the observed market share, and this results 
in a single response for each treatment combination. 

A modification of the ANOVA model is required for the analysis of two-factor studies
containing only one replication per treatment because no degrees of freedom are available
for estimation of the experimental error with the standard two-factor ANOVA model (19.23).
In this chapter, we describe a modification of the ANOVA model that permits the two-factor
analysis of variance to be conducted with only one case per treatment. This modification
requires the assumption that the two factors do not interact. We then discuss inference
procedures with this additive model. We conclude the chapter by considering the Tukey test,
a test for examining the reasonableness of the assumption of additivity of the two factors.
This test is important not only when there is just a single case for each treatment in a
two-factor study; it is also useful for a variety of experimental designs to be discussed
in later chapters.


20.1 No-Interaction Model 


When there is only one case for each treatment, we no longer can work with two-factor
ANOVA model (19.23) because no estimate of the error variance $\sigma^2$ will be available.
Recall from (19.37c) that SSE is a sum of squares made up of components measuring the
variability within each treatment, $\sum_k (Y_{ijk} - \bar{Y}_{ij.})^2$. With only one case per treatment, there
is no variability within a treatment, and SSE will then always be zero.




A way out of this difficulty is to change the model. Formula (19.42d) indicates that if
the two factors do not interact so that $(\alpha\beta)_{ij} = 0$, the interaction mean square MSAB has
expectation $\sigma^2$. Thus, if it is possible to assume that the two factors do not interact, we
may use MSAB as the estimator of the error variance $\sigma^2$ and proceed with the analysis of
factor effects as usual. If it is unreasonable to assume that the two factors do not interact,
transformations may be tried to remove the interaction effects. We shall say more about this
in the next section.




The two-factor ANOVA model with fixed factor levels in (19.23), when all interactions are
zero so that $(\alpha\beta)_{ij} = 0$, becomes for n = 1, the case considered here:


$$Y_{ij} = \mu_{..} + \alpha_i + \beta_j + \varepsilon_{ij} \tag{20.1}$$


Note that the third subscript has been dropped from the $Y$ and $\varepsilon$ terms because there is now
only one case per treatment.


Analysis of Variance 
The factor effects sums of squares SSA and SSB are calculated as before from (19.39a) and
(19.39b), respectively, with n = 1. The interaction sum of squares in (19.39c) with n = 1
now is expressed as follows:


$$SSAB = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2 \qquad n = 1 \tag{20.2}$$


Note that SSAB in (20.2) is identical to SSAB in (19.39c) with n = 1; the third subscript has
been dropped because there is only one case per treatment, and the mean $\bar{Y}_{ij.}$ is replaced by
the observation $Y_{ij}$ for the same reason. The number of degrees of freedom associated with
SSAB in (20.2) is the same as before, namely, (a − 1)(b − 1). The analysis of variance table
for the case n = 1 for no-interaction model (20.1) is shown in Table 20.1.


TABLE 20.1 ANOVA Table for No-Interaction Two-Factor Model (20.1) with Fixed Factor Levels, n = 1.

Source of
Variation   SS                                                                                      df               MS                                E{MS}

Factor A    $SSA = b\sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$                                         $a - 1$          $MSA = \dfrac{SSA}{a - 1}$        $\sigma^2 + b\,\dfrac{\sum_i \alpha_i^2}{a - 1}$

Factor B    $SSB = a\sum_j (\bar{Y}_{.j} - \bar{Y}_{..})^2$                                         $b - 1$          $MSB = \dfrac{SSB}{b - 1}$        $\sigma^2 + a\,\dfrac{\sum_j \beta_j^2}{b - 1}$

Error       $SSAB = \sum_i\sum_j (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2$           $(a - 1)(b - 1)$ $MSAB = \dfrac{SSAB}{(a - 1)(b - 1)}$   $\sigma^2$

Total       $SSTO = \sum_i\sum_j (Y_{ij} - \bar{Y}_{..})^2$                                         $ab - 1$


Inference Procedures


No new problems arise in the tests for factor A and factor B main effects, nor in estimating
these effects. Since the expected value of MSAB is $\sigma^2$ for no-interaction model (20.1), as
shown in the last column of Table 20.1, the F* test statistics for testing factor A and factor B
main effects will now utilize MSAB in the denominator, instead of MSE as before:


$$\text{Factor A main effects:} \quad F^* = \frac{MSA}{MSAB} \tag{20.3a}$$

$$\text{Factor B main effects:} \quad F^* = \frac{MSB}{MSAB} \tag{20.3b}$$


Similarly, for estimating comparisons of factor A and factor B level means, we simply
replace MSE in all of the earlier results with MSAB as the estimator of the error variance
$\sigma^2$ and modify the degrees of freedom accordingly.

A special problem exists in estimating treatment means. We shall explain how to handle
this problem after presenting an example.


Example


An analyst in an insurance commissioner's office studied the premiums for automobile
insurance charged by an insurance company in six cities. The six cities were selected to
represent different regions of the state and different sizes of cities. Table 20.2a shows the
amounts of three-month premiums charged by the automobile insurance firm for a specific
type and amount of coverage in a given risk category for each of the six cities, classified by
size of city (factor A) and geographic region (factor B). Note there is only one observation
per cell, namely, the amount of the premium charged in the city for each factor level
combination. The analyst wished to evaluate the effects of city size and geographic region
on the amount of the premium.


TABLE 20.2 Two-Factor Study with n = 1: Insurance Premium Example.

(a) Premiums for Automobile Insurance Policy (in dollars)

                         Region (factor B)
Size of City          East        West
(factor A)            (j = 1)     (j = 2)     Average

Small (i = 1)         140         100         120
Medium (i = 2)        210         180         195
Large (i = 3)         220         200         210

Average               190         160         175

(b) ANOVA Table

Source of
Variation             SS          df      MS

Size of city (A)      9,300       2       4,650
Region (B)            1,350       1       1,350
Error                 100         2       50
Total                 10,750      5


[FIGURE 20.1 (plot not reproduced): premium on the vertical axis plotted against region (East, West) on the horizontal axis, with one response line for each city size.]


Figure 20.1 contains a plot of the observations $Y_{ij}$. Note that, since n = 1 here, the
observations $Y_{ij}$ constitute estimates of the treatment means $\mu_{ij}$. It appears from Figure 20.1
that there could be a slight interaction between region and size of city in their effects on the
premium. However, since there is only one observation per treatment, the moderate lack of
parallelism in the response lines could simply be the result of random effects within each
treatment cell. The analyst conducted the Tukey test for interactions (to be discussed in
Section 20.2), which indicated that no interaction effects are present. Hence, the analyst
adopted the no-interaction model (20.1).
The analyst obtained the required sums of squares as follows, using (19.37a) and (19.39)
for n = 1:

$$\begin{aligned}
SSA &= 2[(120 - 175)^2 + (195 - 175)^2 + (210 - 175)^2] = 9{,}300 \\
SSB &= 3[(190 - 175)^2 + (160 - 175)^2] = 1{,}350 \\
SSAB &= (140 - 120 - 190 + 175)^2 + \cdots + (200 - 210 - 160 + 175)^2 = 100 \\
SSTO &= (140 - 175)^2 + \cdots + (200 - 175)^2 = 10{,}750
\end{aligned}$$

The ANOVA table is given in Table 20.2b. For the test of city size (factor A) effects, the
alternative conclusions are:

$$H_0\colon \alpha_1 = \alpha_2 = \alpha_3 = 0$$

$$H_a\colon \text{not all } \alpha_i \text{ equal zero}$$

The F* test statistic is given by (20.3a):

$$F^* = \frac{MSA}{MSAB} = \frac{4{,}650}{50} = 93$$

and the decision rule for α = .05 is [remember that the denominator of F* here involves
(a − 1)(b − 1) degrees of freedom]:

$$\text{If } F^* \le F[1 - \alpha;\, a - 1,\, (a - 1)(b - 1)] = F(.95;\, 2,\, 2) = 19.0, \text{ conclude } H_0$$

$$\text{If } F^* > F[1 - \alpha;\, a - 1,\, (a - 1)(b - 1)] = F(.95;\, 2,\, 2) = 19.0, \text{ conclude } H_a$$

Since F* = 93 > 19.0, we conclude $H_a$, that city size effects are present. The P-value of
the test is .011.




The test for geographic region (factor B) effects proceeds similarly, the alternative
conclusions being:

$$H_0\colon \beta_1 = \beta_2 = 0$$

$$H_a\colon \text{not all } \beta_j \text{ equal zero}$$

For α = .05 the decision rule is:

$$\text{If } F^* \le F(.95;\, 1,\, 2) = 18.5, \text{ conclude } H_0$$

$$\text{If } F^* > F(.95;\, 1,\, 2) = 18.5, \text{ conclude } H_a$$

Test statistic (20.3b) here is:

$$F^* = \frac{MSB}{MSAB} = \frac{1{,}350}{50} = 27$$

Since F* = 27 > 18.5, we conclude $H_a$, that geographic region effects are present. The
P-value of this test is .035.
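The calculations for the insurance premium example can be checked with a few lines of code. The following is a minimal sketch, not part of the original analysis: the function name `additive_anova` is hypothetical, and the P-values use closed-form tail probabilities that hold only for the particular degrees of freedom of this example, F(2, 2) and F(1, 2). A general F distribution routine would be needed for other table sizes.

```python
import math

# Table 20.2a premiums: rows = size of city (factor A), columns = region (factor B).
Y = [[140, 100],
     [210, 180],
     [220, 200]]

def additive_anova(Y):
    """Sums of squares, mean squares, and F* statistics for model (20.1), n = 1."""
    a, b = len(Y), len(Y[0])
    grand = sum(sum(row) for row in Y) / (a * b)
    row_means = [sum(row) / b for row in Y]
    col_means = [sum(Y[i][j] for i in range(a)) / a for j in range(b)]
    ssa = b * sum((m - grand) ** 2 for m in row_means)
    ssb = a * sum((m - grand) ** 2 for m in col_means)
    ssab = sum((Y[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(a) for j in range(b))
    ssto = sum((Y[i][j] - grand) ** 2 for i in range(a) for j in range(b))
    msa, msb = ssa / (a - 1), ssb / (b - 1)
    msab = ssab / ((a - 1) * (b - 1))  # plays the role of MSE when n = 1
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSTO": ssto,
            "MSA": msa, "MSB": msb, "MSAB": msab,
            "F_A": msa / msab, "F_B": msb / msab}

result = additive_anova(Y)

# Special-case tail probabilities (valid only for these degrees of freedom):
#   F(2, 2): P(F > f) = 1 / (1 + f)
#   F(1, 2): P(F > f) = 1 - t / sqrt(2 + t^2), with t = sqrt(f)
p_a = 1 / (1 + result["F_A"])
t = math.sqrt(result["F_B"])
p_b = 1 - t / math.sqrt(2 + t * t)

print(result)
print(round(p_a, 3), round(p_b, 3))
```

The sums of squares, the test statistics 93 and 27, and the P-values .011 and .035 agree with Table 20.2b and the tests above.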

The analysis of the magnitudes of the geographic region and city size main effects
involves no new problems. The analyst employed three pairwise comparisons of the factor
level means $\mu_{i.}$ for city size effects and a pairwise comparison of the geographic region
factor level means $\mu_{.j}$. The methods described in Section 19.8 are entirely applicable here;
the error variance $\sigma^2$ is now estimated by MSAB, and the degrees of freedom associated
with the estimate of the error variance now are (a − 1)(b − 1). Since no new issues are
involved in the analysis, we do not present further details.


Estimation of Treatment Mean 


Occasionally when no-interaction model (20.1) is employed with n = 1, there is interest
in estimating a treatment mean $\mu_{ij}$. We could estimate treatment mean $\mu_{ij}$ in the usual
fashion with the sample mean $\bar{Y}_{ij.}$, here simply the single observation $Y_{ij}$. However, we can
obtain an improved estimate by making use of the model assumption of no interactions. We
know from (19.7a) that when the factor effects are additive, the treatment mean $\mu_{ij}$ can be
expressed as follows:


$$\mu_{ij} = \mu_{i.} + \mu_{.j} - \mu_{..} \tag{20.4}$$

Hence, we can estimate $\mu_{ij}$ for additive model (20.1) by substituting the estimators
$\hat{\mu}_{i.} = \bar{Y}_{i.}$, $\hat{\mu}_{.j} = \bar{Y}_{.j}$, and $\hat{\mu}_{..} = \bar{Y}_{..}$ into (20.4):

$$\hat{\mu}_{ij} = \bar{Y}_{i.} + \bar{Y}_{.j} - \bar{Y}_{..} \tag{20.5}$$

The estimator of the treatment mean $\mu_{ij}$ in (20.5) is an improved estimator because it has
minimum variance in the class of unbiased linear estimators according to an extension of
the Gauss-Markov theorem (1.11).


Example


For the insurance premium example in Table 20.2a, we shall use (20.5) to obtain improved
estimates of the treatment means $\mu_{ij}$. We obtain, for instance:


$$\hat{\mu}_{11} = 120 + 190 - 175 = 135$$

$$\hat{\mu}_{12} = 120 + 160 - 175 = 105$$


The other treatment mean estimates are:

$$\hat{\mu}_{21} = 210 \qquad \hat{\mu}_{22} = 180 \qquad \hat{\mu}_{31} = 225 \qquad \hat{\mu}_{32} = 195$$


Note that these improved estimates differ only slightly from the simpler estimates $Y_{ij}$ in
Table 20.2a.
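As a quick check of (20.5), the short sketch below computes all six estimated treatment means for the insurance premium data at once. The function name `additive_fitted_means` is a hypothetical helper introduced only for this illustration.

```python
# Table 20.2a premiums: rows = size of city (factor A), columns = region (factor B).
Y = [[140, 100],
     [210, 180],
     [220, 200]]

def additive_fitted_means(Y):
    """Estimate each treatment mean by (20.5): row mean + column mean - overall mean."""
    a, b = len(Y), len(Y[0])
    grand = sum(sum(row) for row in Y) / (a * b)
    row_means = [sum(row) / b for row in Y]
    col_means = [sum(Y[i][j] for i in range(a)) / a for j in range(b)]
    return [[row_means[i] + col_means[j] - grand for j in range(b)]
            for i in range(a)]

mu_hat = additive_fitted_means(Y)
for row in mu_hat:
    print(row)
```

The output rows are (135, 105), (210, 180), and (225, 195), matching the estimates given above.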


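Precision of Estimated Treatment Mean. To set up a confidence interval for a treatment
mean $\mu_{ij}$, we require the estimated variance of $\hat{\mu}_{ij}$ in (20.5). One simple method of
estimating this variance is by means of the regression model equivalent to ANOVA model (20.1).
For the insurance premium example, this equivalent regression model is:


$$Y_{ij} = \mu_{..} + \alpha_1 X_{ij1} + \alpha_2 X_{ij2} + \beta_1 X_{ij3} + \varepsilon_{ij}$$


where:

$$X_{ij1} = \begin{cases} 1 & \text{if small city} \\ -1 & \text{if large city} \\ 0 & \text{if medium city} \end{cases}$$

$$X_{ij2} = \begin{cases} 1 & \text{if medium city} \\ -1 & \text{if large city} \\ 0 & \text{if small city} \end{cases}$$

$$X_{ij3} = \begin{cases} 1 & \text{if region East} \\ -1 & \text{if region West} \end{cases}$$


Note that the fitted value for observation $Y_{ij}$ will be:

$$\hat{Y}_{ij} = \bar{Y}_{..} + \hat{\alpha}_i + \hat{\beta}_j$$

which is identical to $\hat{\mu}_{ij}$ in (20.5):

$$\hat{\mu}_{ij} = \bar{Y}_{..} + (\bar{Y}_{i.} - \bar{Y}_{..}) + (\bar{Y}_{.j} - \bar{Y}_{..}) = \bar{Y}_{i.} + \bar{Y}_{.j} - \bar{Y}_{..}$$


Hence, the estimated variance of $\hat{Y}_{ij}$ is also the estimated variance of $\hat{\mu}_{ij}$. The estimated
variance $s^2\{\hat{Y}_{ij}\}$ is furnished by most computer regression packages or can be calculated
by means of (6.58).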
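For a balanced layout with n = 1 the variance can also be worked out by hand. The derivation below is a sketch added for illustration and is not taken from the text; it assumes independent error terms with constant variance $\sigma^2$, as in model (20.1), and uses the fact that each pair of the three means in (20.5) has covariance $\sigma^2/(ab)$.

```latex
\begin{aligned}
\sigma^2\{\hat{\mu}_{ij}\}
  &= \sigma^2\{\bar{Y}_{i.}\} + \sigma^2\{\bar{Y}_{.j}\} + \sigma^2\{\bar{Y}_{..}\}
     + 2\sigma\{\bar{Y}_{i.}, \bar{Y}_{.j}\}
     - 2\sigma\{\bar{Y}_{i.}, \bar{Y}_{..}\}
     - 2\sigma\{\bar{Y}_{.j}, \bar{Y}_{..}\} \\
  &= \frac{\sigma^2}{b} + \frac{\sigma^2}{a} + \frac{\sigma^2}{ab}
     + \frac{2\sigma^2}{ab} - \frac{2\sigma^2}{ab} - \frac{2\sigma^2}{ab}
   = \sigma^2\,\frac{a + b - 1}{ab} \\[4pt]
s^2\{\hat{\mu}_{ij}\}
  &= MSAB\,\frac{a + b - 1}{ab}
   = 50 \cdot \frac{3 + 2 - 1}{3(2)} \approx 33.3
\end{aligned}
```

Under these assumptions the estimated standard deviation of each $\hat{\mu}_{ij}$ in the insurance premium example is about 5.8, compared with $\sqrt{MSAB} \approx 7.1$ for a single observation $Y_{ij}$, which is consistent with (20.5) being the more precise estimator.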


Comments


1. The analysis of two-factor studies with n = 1 just outlined depends on the assumption that the
two factors do not interact. If this analysis is used when in fact interactions are present, the result is
that the actual level of significance for testing factor A and factor B main effects is below the specified
level and the actual power of the tests is lower than the expected power. Correspondingly, confidence
intervals for contrasts of factor level means will tend to be too wide. This means that when interactions
are present, the analysis is more likely to fail to disclose real effects than anticipated. However, when
the analysis based on the no-interaction model does indicate the presence of factor A or factor B main
effects, they may be taken as real effects even though interactions are actually present.

2. Sometimes, the case n = 1 is encountered when the observations $Y_{ij}$ are proportions. For
instance, the data may consist of the proportion of employees in a plant absent during the past week,
with the plants classified by size and geographic area. As noted earlier, the arcsine transformation can
be used for such data to stabilize the error variance. The transformed data then can be analyzed using
no-interaction model (20.1), provided that each proportion is based on roughly the same number of
cases. If the number of cases differs greatly, weighted least squares or logistic regression should be
utilized.
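As a small illustration of Comment 2, the sketch below applies an arcsine transformation to a table of proportions before the no-interaction analysis is carried out. It assumes the common form Y' = 2 arcsin(√Y); the proportions shown are hypothetical values made up for this illustration and do not come from the text.

```python
import math

def arcsine_transform(p):
    """Variance-stabilizing transformation for a proportion (assumed form 2*arcsin(sqrt(p)))."""
    return 2 * math.asin(math.sqrt(p))

# Hypothetical absence proportions: rows = plant size, columns = geographic area.
proportions = [[0.04, 0.07],
               [0.05, 0.10],
               [0.08, 0.12]]

transformed = [[arcsine_transform(p) for p in row] for row in proportions]
for row in transformed:
    print([round(v, 3) for v in row])
```

The transformed values would then take the place of the $Y_{ij}$ in model (20.1), subject to the caveat in Comment 2 that the proportions rest on roughly equal numbers of cases.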


20.2 Tukey Test for Additivity


We describe now the Tukey test that may be used for examining, when n = 1, whether or
not the two factors in a two-factor study interact. This test is also useful for a variety of
experimental designs to be discussed in later chapters.


Development of Test Statistic 


As noted in Section 20.1, we considered no-interaction model (20.1) when n = 1 to enable
us to obtain an estimate of the error variance in this case. It would have been possible,
however, to impose less severe restrictions on the interaction effects $(\alpha\beta)_{ij}$ and include the
restricted interaction effects in the ANOVA model. Suppose we assume that:


$$(\alpha\beta)_{ij} = D\alpha_i\beta_j \tag{20.6}$$


where D is some constant. One motivation for this restriction is that if $(\alpha\beta)_{ij}$ is any
second-degree polynomial function of $\alpha_i$ and $\beta_j$, then it must be of the form (20.6) because of the
restrictions on the $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$ in (19.23) that the sums over each subscript be zero.

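Using (20.6) in a regular two-factor ANOVA model with interactions for the case n = 1,
we obtain:


$$Y_{ij} = \mu_{..} + \alpha_i + \beta_j + D\alpha_i\beta_j + \varepsilon_{ij} \tag{20.7}$$


where each term has the usual meaning. Remember there is no third subscript here because
n = 1. The interaction sum of squares $\sum_i\sum_j D^2\alpha_i^2\beta_j^2$ now needs to be obtained. Assuming
the other parameters are known, the least squares and maximum likelihood estimator of D
turns out to be:

$$\hat{D} = \frac{\sum_i\sum_j \alpha_i\beta_j Y_{ij}}{\sum_i \alpha_i^2 \sum_j \beta_j^2} \tag{20.8}$$

The usual estimator of $\alpha_i$ is $\bar{Y}_{i.} - \bar{Y}_{..}$ and that of $\beta_j$ is $\bar{Y}_{.j} - \bar{Y}_{..}$. Replacing the parameters in
$\hat{D}$ by these estimators, we obtain:


$$\hat{D} = \frac{\sum_i\sum_j (\bar{Y}_{i.} - \bar{Y}_{..})(\bar{Y}_{.j} - \bar{Y}_{..})Y_{ij}}{\sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 \sum_j (\bar{Y}_{.j} - \bar{Y}_{..})^2} \tag{20.8a}$$

The sample counterpart of the interaction sum of squares $\sum_i\sum_j D^2\alpha_i^2\beta_j^2$ will be denoted by
SSAB* to remind us that this interaction sum of squares is for the special form of interaction in
model (20.7). Substituting the sample estimates into $\sum_i\sum_j D^2\alpha_i^2\beta_j^2$, we obtain the interaction
sum of squares:


$$SSAB^* = \sum_i\sum_j \hat{D}^2 (\bar{Y}_{i.} - \bar{Y}_{..})^2 (\bar{Y}_{.j} - \bar{Y}_{..})^2
= \frac{\left[\sum_i\sum_j (\bar{Y}_{i.} - \bar{Y}_{..})(\bar{Y}_{.j} - \bar{Y}_{..})Y_{ij}\right]^2}{\sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 \sum_j (\bar{Y}_{.j} - \bar{Y}_{..})^2} \tag{20.9}$$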
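The two steps that lead to (20.8) and (20.9) are stated without proof. The following sketch of the algebra is supplied for illustration, under the stated assumption that the other parameters are treated as known and that $\sum_i \alpha_i = \sum_j \beta_j = 0$; it is not quoted from the text.

```latex
% Least squares criterion for D in model (20.7), other parameters treated as known:
Q(D) = \sum_i \sum_j \left( Y_{ij} - \mu_{..} - \alpha_i - \beta_j - D\alpha_i\beta_j \right)^2

% Setting dQ/dD = 0 and solving for D:
\hat{D} = \frac{\sum_i\sum_j \alpha_i\beta_j \left( Y_{ij} - \mu_{..} - \alpha_i - \beta_j \right)}
               {\sum_i\sum_j \alpha_i^2\beta_j^2}
        = \frac{\sum_i\sum_j \alpha_i\beta_j Y_{ij}}{\sum_i \alpha_i^2 \sum_j \beta_j^2}

% The terms in mu, alpha_i, and beta_j drop out because sum_i alpha_i = sum_j beta_j = 0,
% and the double sum in the denominator factors into a product of single sums.

% Writing a_i and b_j for the estimated effects and substituting the sample version of D:
\sum_i\sum_j \hat{D}^2 a_i^2 b_j^2
   = \hat{D}^2 \sum_i a_i^2 \sum_j b_j^2
   = \frac{\left[ \sum_i\sum_j a_i b_j Y_{ij} \right]^2}{\sum_i a_i^2 \sum_j b_j^2},
\qquad a_i = \bar{Y}_{i.} - \bar{Y}_{..}, \quad b_j = \bar{Y}_{.j} - \bar{Y}_{..}
```

The last line is the ratio form of SSAB* in (20.9): one power of the denominator of $\hat{D}^2$ cancels against the factor $\sum_i a_i^2 \sum_j b_j^2$.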


The analysis of variance decomposition for the special interaction model (20.7)
therefore is:


$$SSTO = SSA + SSB + SSAB^* + SSRem^* \tag{20.10}$$

where SSRem* is the remainder sum of squares:

$$SSRem^* = SSTO - SSA - SSB - SSAB^* \tag{20.10a}$$


It can be shown that if D = 0 (that is, if no interactions of the type $D\alpha_i\beta_j$ exist),
$SSAB^*/\sigma^2$ and $SSRem^*/\sigma^2$ are independently distributed as chi-square random variables with 1
and ab − a − b degrees of freedom, respectively. Hence, if D = 0, the test statistic:


$$F^* = \frac{SSAB^*}{1} \div \frac{SSRem^*}{ab - a - b} \tag{20.11}$$

is distributed as F(1, ab − a − b).
Thus, for testing:

$$\begin{aligned}
H_0\colon\ & D = 0 \quad \text{(no interactions present)} \\
H_a\colon\ & D \ne 0 \quad \text{(interactions } D\alpha_i\beta_j \text{ present)}
\end{aligned} \tag{20.12a}$$


we use test statistic F* defined in (20.11). Large values of F* lead to conclusion $H_a$. The
appropriate decision rule for controlling the risk of a Type I error at α is:

$$\begin{aligned}
&\text{If } F^* \le F(1 - \alpha;\, 1,\, ab - a - b), \text{ conclude } H_0 \\
&\text{If } F^* > F(1 - \alpha;\, 1,\, ab - a - b), \text{ conclude } H_a
\end{aligned} \tag{20.12b}$$


The power of this test has been studied, and it appears that if interactions of approximately
the type postulated in (20.6) are present and the factor A and factor B main effects are large,
the test is effective in detecting the interactions. The test is usually called the Tukey one
degree of freedom test. This test also may be used for testing for the presence of general
interactions.


Example


We shall conduct the Tukey test for the insurance premium example. The data are presented
in Table 20.2a. First, we obtain the elements of SSAB*:


$$\sum_i\sum_j (\bar{Y}_{i.} - \bar{Y}_{..})(\bar{Y}_{.j} - \bar{Y}_{..})Y_{ij} = (120 - 175)(190 - 175)(140) + \cdots + (210 - 175)(160 - 175)(200) = -13{,}500$$

$$\sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 = \frac{SSA}{b} = \frac{9{,}300}{2} = 4{,}650$$

$$\sum_j (\bar{Y}_{.j} - \bar{Y}_{..})^2 = \frac{SSB}{a} = \frac{1{,}350}{3} = 450$$

Hence, the special interaction sum of squares is:

$$SSAB^* = \frac{(-13{,}500)^2}{4{,}650(450)} = 87.1$$


Using the ANOVA sums of squares in Table 20.2b, we have by (20.10a):


$$SSRem^* = 10{,}750 - 9{,}300 - 1{,}350 - 87.1 = 12.9$$

Finally, we obtain the test statistic by (20.11):

$$F^* = \frac{87.1}{1} \div \frac{12.9}{3(2) - 3 - 2} = 6.8$$


For α = .10, we require F(.90; 1, 1) = 39.9. Since F* = 6.8 ≤ 39.9, we conclude $H_0$,
that region and size of city do not interact. The P-value of this test is .23. Use of the
no-interaction model for the data in Table 20.2a therefore appears to be reasonable.
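The Tukey test statistic is simple to compute directly from the data table. The sketch below follows (20.9), (20.10a), and (20.11); the function name `tukey_additivity` is hypothetical, and the P-value line relies on a closed form that holds only for F(1, 1), the reference distribution in this 3 × 2 example. For other table sizes a general F distribution routine would be needed.

```python
import math

# Table 20.2a premiums: rows = size of city (factor A), columns = region (factor B).
Y = [[140, 100],
     [210, 180],
     [220, 200]]

def tukey_additivity(Y):
    """Return (SSAB*, SSRem*, F*) for the Tukey one degree of freedom test."""
    a, b = len(Y), len(Y[0])
    grand = sum(sum(row) for row in Y) / (a * b)
    row_dev = [sum(row) / b - grand for row in Y]                      # estimated alpha_i
    col_dev = [sum(Y[i][j] for i in range(a)) / a - grand
               for j in range(b)]                                      # estimated beta_j
    cross = sum(row_dev[i] * col_dev[j] * Y[i][j]
                for i in range(a) for j in range(b))
    ssab_star = cross ** 2 / (sum(d * d for d in row_dev) * sum(d * d for d in col_dev))
    # For n = 1, SSTO - SSA - SSB equals the interaction sum of squares SSAB in (20.2).
    ssab = sum((Y[i][j] - grand - row_dev[i] - col_dev[j]) ** 2
               for i in range(a) for j in range(b))
    ssrem_star = ssab - ssab_star
    f_star = ssab_star / (ssrem_star / (a * b - a - b))
    return ssab_star, ssrem_star, f_star

ssab_star, ssrem_star, f_star = tukey_additivity(Y)

# Special case only: for F(1, 1), P(F > f) = 1 - (2 / pi) * atan(sqrt(f)).
p_value = 1 - (2 / math.pi) * math.atan(math.sqrt(f_star))

print(round(ssab_star, 1), round(ssrem_star, 1), round(f_star, 2), round(p_value, 2))
```

With unrounded intermediate values the statistic is 6.75, which the text reports as 6.8, and the P-value agrees with the reported .23.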


Remedial Actions If Interaction Effects Are Present


When the Tukey test indicates the presence of interaction effects in an analysis of variance
application where n = 1, efforts should be made to remove the interactions so that the
analysis described in Section 20.1 can be utilized. As we described in Chapter 19,
transformations of the data can often be used to remove interaction effects or to make them
unimportant.

One possibility is to try simple transformations of the response variable, such as a
square root or a logarithmic transformation. Another possibility is to search in the family
of power transformations on Y described in Chapter 3 in connection with the Box-Cox
transformations. The procedure is to make transformations on Y according to (3.36) for
selected values of λ. For each value of λ, the Tukey test statistic (20.11) is then obtained.
If a λ value leads to a nonsignificant F* test statistic, a transformation will then have
been found that removes the interaction effect. Frequently, a range of λ values will yield
nonsignificant test statistics, in which case a simple λ value in this range, such as λ = .5,
may be chosen.

If no transformation can be found to make the interactions unimportant, an approximate
method of analysis can be employed; see, for instance, Reference 20.1.
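The search over λ described above can be sketched as a short loop. This is an illustration under stated assumptions rather than the procedure of (3.36) verbatim: it assumes the simple power family Y' = Y^λ with the logarithm used at λ = 0, the helper names `power_transform` and `tukey_f` are hypothetical, and the insurance premium data are reused only to have concrete numbers (the Tukey test did not flag interactions for those data).

```python
import math

# Table 20.2a premiums, reused purely as example input.
Y = [[140, 100],
     [210, 180],
     [220, 200]]

def power_transform(y, lam):
    """Assumed power family: log(y) at lambda = 0, otherwise y ** lambda."""
    return math.log(y) if lam == 0 else y ** lam

def tukey_f(Y):
    """Tukey one degree of freedom test statistic (20.11) for a table with n = 1."""
    a, b = len(Y), len(Y[0])
    grand = sum(sum(row) for row in Y) / (a * b)
    row_dev = [sum(row) / b - grand for row in Y]
    col_dev = [sum(Y[i][j] for i in range(a)) / a - grand for j in range(b)]
    cross = sum(row_dev[i] * col_dev[j] * Y[i][j]
                for i in range(a) for j in range(b))
    ssab_star = cross ** 2 / (sum(d * d for d in row_dev) * sum(d * d for d in col_dev))
    ssab = sum((Y[i][j] - grand - row_dev[i] - col_dev[j]) ** 2
               for i in range(a) for j in range(b))
    return ssab_star / ((ssab - ssab_star) / (a * b - a - b))

# Evaluate F* on the transformed data for a small grid of lambda values.
lambdas = (-1.0, -0.5, 0.0, 0.5, 1.0)
scan = {lam: tukey_f([[power_transform(y, lam) for y in row] for row in Y])
        for lam in lambdas}

for lam, f_star in scan.items():
    print(f"lambda = {lam:5.1f}   F* = {f_star:.2f}")
```

Each F* would be compared with F(1 − α; 1, ab − a − b) as in (20.12b), and a convenient λ from the nonsignificant range chosen. At λ = 1 the data are unchanged, so the loop reproduces the untransformed statistic.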


Comment


If one or both factors are quantitative, a test for interaction effects can be obtained by regression
methods. For example, consider a study in which both factors are quantitative, each has three levels,
and n = 1 so that $n_T = 9$. Let $X_{ij1}$ denote the value of the first factor for the treatment for which
factor A is at the ith level and factor B is at the jth level. $X_{ij2}$ is defined similarly for the second
factor. Second-order regression model (8.7) may then be used:


$$Y_{ij} = \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{ij1}^2 + \beta_3 x_{ij2} + \beta_4 x_{ij2}^2 + \beta_5 x_{ij1}x_{ij2} + \varepsilon_{ij}$$

where:

$$x_{ij1} = X_{ij1} - \bar{X}_1$$

$$x_{ij2} = X_{ij2} - \bar{X}_2$$


With this model, there would be $n_T - p = 9 - 6 = 3$ degrees of freedom for estimating the error
variance $\sigma^2$, and the test for the presence of an interaction effect would be the usual test in (6.51) for
testing whether $\beta_5 = 0$.

Still other tests for interactions could be conducted since additional cross-product terms could be
incorporated into the regression model. However, this would not be desirable here since the number
of degrees of freedom available for estimating the error variance $\sigma^2$ is already very small.



Chapter 20  Two-Factor Studies—One Case per Treatment 889 


Cited Reference

20.1. Johnson, D. E., and F. A. Graybill. “Estimation of $\sigma^2$ in a Two-Way Classification Model with
Interaction,” Journal of the American Statistical Association 67 (1972), pp. 388-94.


Problems

20.1. Suppose that two-factor analysis of variance model (19.23) were to be employed with $n = 1$
for each factor level combination. How many degrees of freedom would be associated with
SSE in (19.37c)? What does this imply?

*20.2. Coin-operated terminals. A university computer service conducted an experiment in which
one coin-operated computer graphics terminal was placed at each of four different locations
on the campus last semester during the midterm week and again during the final week of
classes. The data that follow show the number of hours each terminal was not in use during
the week at the four locations (factor A) and for the two different weeks (factor B).

                    Factor B (week)
Factor A          j = 1        j = 2
(location)       Midterm       Final
i = 1              16.5         21.4
i = 2              11.8         17.3
i = 3              12.3         16.9
i = 4              16.6         21.0

Assume that no-interaction ANOVA model (20.1) is appropriate.

a. Plot the data in the format of Figure 20.1. Does it appear that interaction effects are present?
Does it appear that factor A and factor B main effects are present? Discuss.

b. Conduct separate tests for location and week main effects. In each test, use level of
significance $\alpha = .05$ and state the alternatives, decision rule, and conclusion. Give an upper
bound for the family level of significance; use the Kimball inequality (19.53). What is the
P-value for each test?

c. Make all pairwise comparisons among location means and estimate the difference between
the means for the two weeks; use the Bonferroni procedure with a 90 percent family
confidence coefficient. State your findings.

*20.3. Refer to Coin-operated terminals Problem 20.2. It is desired to estimate $\mu_{32}$.

a. Obtain a point estimate of $\mu_{32}$ using (20.5).

b. Obtain the estimated variance of $\hat{\mu}_{32}$ by fitting the equivalent regression model.

c. Construct a 95 percent confidence interval for $\mu_{32}$. Interpret your interval estimate. Is your
interval estimate applicable if next year two graphics terminals will be placed at location
3? Explain.

*20.4. Refer to Coin-operated terminals Problem 20.2. Conduct the Tukey test for additivity; use
$\alpha = .025$. State the alternatives, decision rule, and conclusion. If the additive model is not
appropriate, what might you do?

20.5. Brainstorming. A researcher investigated whether brainstorming is more effective for larger
groups than for smaller ones by setting up four groups of agribusiness executives, the group
sizes being two, three, four, and five, respectively. He also set up four groups of agribusiness
scientists, the group sizes being the same as for the agribusiness executives. The researcher
gave each group the same problem: “How can Canada increase the value of its agricultural


890 PartFive Multi-Factor Studies 


exports?” Each group was allowed 30 minutes to generate ideas. The variable of interest was
the number of different ideas proposed by the group. The results, classified by type of group
(factor A) and size of group (factor B), were:

                                        Factor B (size of group)
Factor A                             j = 1    j = 2    j = 3    j = 4
(type of group)                       Two     Three    Four     Five
i = 1  Agribusiness executives         18       22       31       32
i = 2  Agribusiness scientists         15       23       29       33

Assume that no-interaction ANOVA model (20.1) is appropriate.

a. Plot the data in the format of Figure 20.1. Does it appear that interaction effects are present?
Does it appear that factor A and factor B main effects are present? Discuss.

b. Conduct separate tests for type of group and size of group main effects. In each test, use
level of significance $\alpha = .01$ and state the alternatives, decision rule, and conclusion. Give
an upper bound for the family level of significance; use the Kimball inequality (19.53).
What is the P-value for each test?

c. Obtain confidence intervals for $D_1 = \mu_{.2} - \mu_{.1}$, $D_2 = \mu_{.3} - \mu_{.2}$, and $D_3 = \mu_{.4} - \mu_{.3}$;
use the Bonferroni procedure with a 95 percent family confidence coefficient. State your
findings.

d. Is the Bonferroni procedure used in part (c) the most efficient one here? Explain.

20.6. Refer to Brainstorming Problem 20.5. It is desired to estimate $\mu_{14}$.

a. Obtain a point estimate of $\mu_{14}$ using (20.5).

b. Obtain the estimated variance of $\hat{\mu}_{14}$ by fitting the equivalent regression model.

c. Construct a 99 percent confidence interval for $\mu_{14}$. Interpret your interval estimate. Is your
interval estimate applicable if the two factors interact?

20.7. Refer to Brainstorming Problem 20.5. Conduct the Tukey test for additivity; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion. If the additive model is not appropriate,
what might you do?


20.8. Soybean sausage. A food technologist, testing storage capabilities for a newly developed
type of imitation sausage made from soybeans, conducted an experiment to test the effects
of humidity level (factor A) and temperature level (factor B) in the freezer compartment on
color change in the sausage. Three humidity levels and four temperature levels were considered.
Five hundred sausages were stored at each of the 12 humidity-temperature combinations
for 90 days. At the end of the storage period, the researcher determined the proportion of
sausages for each humidity-temperature combination that exhibited color changes. The
researcher transformed the data by means of the arcsine transformation (18.24) to stabilize the
variances. The transformed data $Y' = 2 \arcsin \sqrt{Y}$ follow.

Factor A                 Factor B (temperature level)
(humidity level)      j = 1    j = 2    j = 3    j = 4
i = 1                  13.9     14.2     20.5     24.8
i = 2                  15.7     16.3     21.7     23.6
i = 3                  15.1     15.4     19.9     26.1

Assume that no-interaction ANOVA model (20.1) is appropriate.

a. Plot the data in the format of Figure 20.1. Does it appear that interaction effects are present?
Does it appear that factor A and factor B main effects are present? Discuss.

b. Conduct separate tests for humidity and temperature main effects. In each test, use level
of significance $\alpha = .025$ and state the alternatives, decision rule, and conclusion. What
is the P-value for each test?

c. Obtain confidence intervals for $D_1 = \mu_{.2} - \mu_{.1}$, $D_2 = \mu_{.3} - \mu_{.2}$, and $D_3 = \mu_{.4} - \mu_{.3}$;
use the Bonferroni procedure with a 95 percent family confidence coefficient. State your
findings.

d. Is the Bonferroni procedure used in part (c) the most efficient one here? Explain.

20.9. Refer to Soybean sausage Problem 20.8. It is desired to estimate $\mu_{23}$.

a. Obtain a point estimate of $\mu_{23}$ using (20.5).

b. Obtain the estimated variance of $\hat{\mu}_{23}$ by fitting the equivalent regression model.

c. Construct a 98 percent confidence interval for $\mu_{23}$ and transform it back to the original
units. Interpret your interval estimate. Is your interval estimate applicable if the two factors
interact?

20.10. Refer to Soybean sausage Problem 20.8. Conduct the Tukey test for additivity; use $\alpha = .005$.
State the alternatives, decision rule, and conclusion. If the additive model is not appropriate,
what might you do?


Exercises

20.11. Modify formulas (19.39a) and (19.39b) to apply to ANOVA model (20.1), where $n = 1$.

20.12. Show that (20.7) is the only second-degree polynomial function of $\alpha_i$ and $\beta_j$ such that
$\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0$.

Case Study

20.13. Refer to Soybean sausage Problem 20.8. Assume that the humidity levels and temperature
levels employed are equally spaced—that is, actual humidity increases linearly with i, and
actual temperature increases linearly with j so that i and j are coded levels of humidity and
temperature. Use techniques discussed in Chapter 8 to develop a polynomial regression model
to predict the transformed percentage of sausages exhibiting color change as a function of
coded humidity and temperature levels. Your model should consider, at most, second-order
terms in coded humidity level, and third-order terms in coded temperature level. What does
your model suggest concerning the presence or absence of interactions? Use appropriate
graphics to summarize your fitted regression model.


Chapter 21

Randomized Complete Block Designs


In Chapter 15, we introduced the concept of blocking. We noted there that when the available 
experimental units are not homogeneous, grouping the experimental units into blocks of 
homogeneous units will reduce the experimental error variance and also increase the range 
of validity for inferences about the treatment effects. 

In this chapter, we shall take up the design and analysis of randomized complete block 
experiments in detail. We discuss when and how to conduct a randomized complete block 
design, the analysis of a randomized complete block design, and planning of sample sizes 
for blocked experiments. 

For complete block designs, each block consists of one complete replication of the set 
of treatments. When the number of experimental units available in a block is less than the 
number of treatments, incomplete block designs may at times be useful. We shall consider 
incomplete block designs in Chapters 28 and 29. 


21.1 Elements of Randomized Complete Block Designs 


Description of Designs 




In a randomized complete block design, the experimental units are first sorted into homogeneous
groups, called blocks, and all treatment combinations are then assigned at random to
experimental units within the blocks. Note that this requires a series of separate, restricted 
randomizations—one for each block. In effect, separate experiments are conducted within 
each block, which leads to greater homogeneity of experimental units, reduced experimental 
error, and more precise estimates of treatment effects. We illustrate the use of randomized 
block designs by considering three examples. 


1. In an experiment on the effects of four levels of newspaper advertising saturation on
sales volume, the experimental unit is a city, and 16 cities are available for the study. Size
of city usually is highly correlated with the response variable, sales volume. Hence, it is
desirable to block the 16 cities into four groups of four cities each, according to population
size. Thus, the four largest cities will constitute block 1, and so on. Within each block, the
four treatments are then assigned at random to the four cities, and the four randomizations,
one for each block, are conducted independently.




2. In an experiment on the effects of three different incentive pay schemes on employee
productivity of electronic assemblies, the experimental unit is an employee, and 30 employees
are available for the study. Since productivity here is highly correlated with manual
dexterity, it is desirable to block the 30 employees into 10 groups of three according to their
manual dexterity. Thus, the three employees with the highest manual dexterity ratings are
grouped into one block, and so on for the other employees. Within each block, the three
incentive pay schemes are then assigned randomly to the three employees.

3. A chemist is studying the reaction rate of five chemical agents. Only five agents can
be analyzed effectively per day. Since day-to-day differences may affect the reaction rate,
each day is used as a block, and all five chemical agents are tested each day in independently
randomized orders.
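The grouping step in the first example can be sketched in Python as follows. The city names, the population figures, and the helper `form_blocks` are hypothetical; the sketch only shows how units sorted on the blocking variable are cut into blocks of equal size, with the largest units in block 1.

```python
def form_blocks(units, key, block_size):
    """Sort units by the blocking variable and cut the sorted list into blocks."""
    if len(units) % block_size != 0:
        raise ValueError("number of units must be a multiple of the block size")
    ordered = sorted(units, key=key, reverse=True)  # largest values first
    return [ordered[k:k + block_size] for k in range(0, len(ordered), block_size)]


# Hypothetical populations (in thousands) for the 16 cities of the first example.
populations = [820, 95, 410, 1300, 60, 230, 675, 150,
               48, 990, 310, 520, 75, 185, 1120, 265]
cities = [(f"city{k:02d}", pop) for k, pop in enumerate(populations, start=1)]

blocks = form_blocks(cities, key=lambda city: city[1], block_size=4)
for number, block in enumerate(blocks, start=1):
    print(f"Block {number}: {block}")
```

The treatments would then be assigned at random within each of the four blocks, with a separate randomization for each block.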


As these examples imply, the key objective in blocking the experimental units is to make
them as homogeneous as possible within blocks with respect to the response variable under 
study, and to make the different blocks as heterogeneous as possible with respect to the 
response variable. As noted earlier, the design in which each treatment is included once in 
each block is called a randomized complete block design. Often, we shall drop the term 
"complete" because the context makes it clear that all treatments are included in each block. 


Comments 

1. In a complete block design, each block constitutes a replication of the experiment. For that 
reason, it is highly desirable that the experimental units within a block be processed together whenever 
this will help to reduce experimental error variability. As an example, an experimenter may tend to
make changes in experimental techniques over time (e.g., in the administration of the experiment to
subjects) without being aware of it. Consecutive processing of the experimental units block by block 
will tend to exclude such sources of variation from the within-blocks analysis and thereby make the 
experimental results more precise. 

2. In factorial experiments, some of the factors of interest may be characteristics of the experi- 
mental units, such as gender, age, and amount of experience on the job. Even though these factors 
are not introduced to reduce experimental error variability but rather are included for their intrinsic 
interest, we shall nevertheless consider such experiments to be randomized block designs because the 
randomization of the experimental factors to the experimental units is restricted by the nature of the 
observational factors considered.


Criteria for Blocking 

As noted earlier, the purpose of blocking is to sort experimental units into groups within each 
of which the elements are homogeneous with respect to the response variable, such that the 
differences between groups are as great as possible. To help recognize some of the character- 
istics of experimental units that are useful criteria for blocking, we need a precise definition 
of an experimental unit. Any aspect of the experimental setting that changes from treatment 
application to treatment application—excluding the treatment changes themselves—is a 
characteristic of the experimental unit. For example, suppose the treatment in a taste-testing 
experiment consists of a vegetable containing a particular additive. The experimental unit 
might then be defined as a homemaker of a given age, evaluated by a given observer on a 
specified day at a particular time, and served food from a given batch of cooked vegetable. 
Still other elements of the experimental setting might be included in the definition of the 
experimental unit, and should be if they contribute to variability in the responses. 


A full definition of the experimental unit such as the one just given suggests two types
of blocking criteria:

1. Characteristics associated with the unit—for persons: gender, age, income, intelligence,
education, job experience, attitudes, etc.; for geographic areas: population size, average
income, etc.

2. Characteristics associated with the experimental setting—observer, time of processing,
machine, batch of material, measuring instrument, etc.

Use of time as a blocking variable frequently captures a number of different sources of 
variability, such as learning by observer, changes in equipment, and drifts in environmental 
conditions (e.g., weather). Blocking by observers often eliminates a substantial amount 
of interobserver variability; similarly, blocking by batches of material frequently is very 
effective. There is no need to use only a single blocking criterion: several may be employed 
if the experimental error can be further reduced by doing so. 

The design of an effective randomized block experiment requires the ability to anticipate
potential sources of variation—the blocking variables—in advance of experimentation.
These variables are then held constant within blocks as the experiment is conducted in order
to reduce the experimental error variability. Often, past experience in the subject matter field
enables the experimenter to select good blocking variables. If some experiments have been
run in the past in which blocking has been employed, these results can be analyzed to
determine the effectiveness of the blocking variables. In the absence of any information
on the effectiveness of potential blocking variables, uniformity trials can be run where all
experimental units are assigned the same treatment. From these trials, information can be
obtained on the effectiveness of different blocking variables.


Comment 


As noted in Chapter 15, when subjects are used as a blocking variable, the resulting design is sometimes
called a repeated measures design. Since these designs involve some special problems, we will discuss
them separately in Chapter 27.


Advantages and Disadvantages 
The advantages of a randomized complete block design are: 


1. It can, with effective grouping, provide substantially more precise results than a
completely randomized design of comparable size.

2. It can accommodate any number of treatments and replications.

3. Different treatments need not have equal sample sizes. For instance, if the control is to
have twice as large a sample size as each of three treatments, blocks of size five would
be used; three units in a block are then assigned at random to the three treatments and
two to the control.

4. The statistical analysis is relatively simple.

5. If an entire treatment or a block needs to be dropped from the analysis for some reason,
such as spoiled results, the analysis is not complicated thereby.

6. Variability in experimental units can be deliberately introduced to widen the range of
validity of the experimental results without sacrificing the precision of the results.


Disadvantages include:

1. If observations are missing within a block, a more complex analysis is required.

2. The degrees of freedom for experimental error are not as large as with a completely 
randomized design. One degree of freedom is lost for each block after the first. 

3. More assumptions are required for the model (e.g., no interactions between treatments 
and blocks, constant variance from block to block) than for a completely randomized 
design model. 

4. Because the blocking variable is an observational factor and not an experimental factor,
cause-and-effect inferences concerning the relationship between the blocking variable
and the response variable are problematic. This is not a serious disadvantage, because
investigators usually are not concerned with estimating or making inferences about block
effects. Blocking is primarily a device for reducing experimental variation and thereby
increasing the precision of the estimates of the treatment effects.
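The loss of error degrees of freedom noted in the second disadvantage can be made explicit with a short calculation. The notation is an assumption borrowed from the model section of this chapter: $n_b$ blocks and $r$ treatments, with the completely randomized design using the same $n_b r$ experimental units.

```latex
\begin{aligned}
df_E(\text{completely randomized}) &= n_b r - r = r(n_b - 1) \\
df_E(\text{randomized block}) &= (n_b - 1)(r - 1) \\
r(n_b - 1) - (n_b - 1)(r - 1) &= n_b - 1
\end{aligned}
```

For instance, with $n_b = 5$ blocks and $r = 3$ treatments, the error degrees of freedom are 12 for the completely randomized design and 8 for the randomized block design, a loss of $n_b - 1 = 4$.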


How to Randomize

The randomization procedure for a randomized block design is straightforward. Within
each block a random permutation is used to assign treatments to experimental units, just as
in a completely randomized design. Independent permutations are selected for the several
blocks.
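A minimal sketch of this procedure in Python follows. The function name `randomize_blocks`, the treatment labels, and the seed are illustrative assumptions; `random.Random.shuffle` supplies an independent random permutation for each block.

```python
import random


def randomize_blocks(treatments, n_blocks, seed=None):
    """Return one independent random permutation of the treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # separate randomization for this block
        layout.append(order)
    return layout


# Three treatments assigned to three experimental units in each of five blocks.
layout = randomize_blocks(["U", "W", "C"], n_blocks=5, seed=21)
for block, order in enumerate(layout, start=1):
    print(f"Block {block}: units 1-3 receive {order}")
```

Position k in each permutation gives the treatment for experimental unit k of that block. Fixing the seed only makes the layout reproducible; it is not part of the design.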


Illustration

In an experiment on decision making, executives were exposed to one of three methods of
quantifying the maximum risk premium they would be willing to pay to avoid uncertainty
in a business decision. The three methods are the utility method, the worry method, and the
comparison method. After using the assigned method, the subjects were asked to state their
degree of confidence in the method of quantifying the risk premium on a scale from 0 (no
confidence) to 20 (highest confidence).

Fifteen subjects were used in the study. They were grouped into five blocks of three
executives, according to age. Block 1 contained the three oldest executives, and so on.
The design layout, after five independent random permutations of three were employed, is
shown in Figure 21.1. Table 21.1 contains the results of the experiment, and Figure 21.2
presents graphically the confidence ratings for each method by block. It appears from
Figure 21.2 that there is much variation between blocks, but that in all blocks the comparison
method has the highest confidence rating and the utility method the lowest. It also appears
that there are no important interaction effects between blocks and treatments on the
responses; the response curves do not seem to deviate too much from being parallel. We
discuss next a widely used model for randomized complete block designs and the analysis
of variance for this model before undertaking a formal analysis of the results in our
example.

FIGURE 21.1 Layout for Randomized Complete Block Design—Risk Premium Example.
[Figure not reproduced: rows are blocks 1 (oldest executives) through 5 (youngest executives), columns are experimental units 1 to 3, and each cell shows the assigned method (C: comparison method, W: worry method, U: utility method).]

TABLE 21.1 Data on Confidence Ratings (ratings on scale from 0 to 20)—Risk Premium Example.

                              Method (j)
Block i          Utility    Worry    Comparison    Average
1 (oldest)          1         5          8           4.7
2                   2         8         14           8.0
3                   7         9         16          10.7
4                   6        13         18          12.3
5 (youngest)       12        14         17          14.3
Average            5.6       9.8       14.6         10.0

FIGURE 21.2 Plot of Confidence Ratings by Blocks—Risk Premium Example.
[Figure not reproduced: confidence rating (vertical axis) plotted against method (utility, worry, comparison), with one line for each block.]


21.2 Model for Randomized Complete Block Designs


Table 21.1 is similar in appearance to Table 20.2a, which shows the data for a two-factor 
study with one observation in each cell. In fact, a randomized complete block design may 
be viewed as corresponding to a two-factor study (blocks and treatments are the factors), 
with one observation in each cell. As we noted in Section 20.1, the assumption of no 
interactions between the two factors permits an analysis of factor effects when there is only 
one observation in each cell and the factors have fixed effects. 

The model for a randomized complete block design containing the assumption of no
interaction effects, when both the block and treatment effects are fixed and there are $n_b$
blocks (replications) and $r$ treatments, is as follows:

$Y_{ij} = \mu_{..} + \rho_i + \tau_j + \varepsilon_{ij}$    (21.1)

where:

$\mu_{..}$ is a constant
$\rho_i$ are constants for the block (row) effects, subject to the restriction $\sum \rho_i = 0$
$\tau_j$ are constants for the treatment effects, subject to the restriction $\sum \tau_j = 0$
$\varepsilon_{ij}$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, n_b$; $j = 1, \ldots, r$

The responses $Y_{ij}$ with randomized block model (21.1) are independent and normally
distributed, with mean:

$E\{Y_{ij}\} = \mu_{..} + \rho_i + \tau_j$    (21.2a)

and constant variance:

$\sigma^2\{Y_{ij}\} = \sigma^2$    (21.2b)


Randomized block model (21.1) is identical to the two-factor, no-interaction model
(20.1), except that we now use $\rho_i$ for the block effect, $\tau_j$ for the treatment effect, and $n_b$ to
designate the total number of blocks. Note that $Y_{ij}$ here stands for the response for the jth
treatment in the ith block.


Comments 


1. When the experimental units are grouped according to specified categories, such as into
particular age groups, income groups, and order-of-processing groups, the block effects $\rho_i$ are usually
considered to be fixed. Sometimes the block effects are viewed as random. For instance, when observers
or subjects are used as blocks, the particular observers or subjects in the study may be considered
to be a sample from a population of observers or subjects. The case of random block effects will be
taken up in Chapter 25.

2. If the treatment effects are random, the only changes in model (21.1) are that the $\tau_j$ now
represent independent normal variables with expectation zero and variance $\sigma_\tau^2$, and that the $\tau_j$ are
independent of the $\varepsilon_{ij}$. Random treatment effects are also considered in Chapter 25.

3. The additive model (21.1) implies that the expected values of observations in different blocks
for the same treatment may differ (e.g., older executives may tend to have lower confidence ratings
for any of the methods of quantifying the risk premium than younger executives), but the treatment
effects (e.g., how much higher the confidence rating for one method is over that for another) are the
same for all blocks. We shall consider the possibility of interactions between blocks and treatments
later in Section 21.7.
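The point of the third comment can be written out directly from the mean response in (21.2a). The following short derivation is a sketch in the notation of model (21.1), with $j'$ and $i'$ denoting a second treatment and a second block:

```latex
\begin{aligned}
E\{Y_{ij}\} - E\{Y_{ij'}\} &= (\mu_{..} + \rho_i + \tau_j) - (\mu_{..} + \rho_i + \tau_{j'}) = \tau_j - \tau_{j'} \\
E\{Y_{ij}\} - E\{Y_{i'j}\} &= (\mu_{..} + \rho_i + \tau_j) - (\mu_{..} + \rho_{i'} + \tau_j) = \rho_i - \rho_{i'}
\end{aligned}
```

The first difference does not involve the block index $i$, so the comparison between two treatments is the same in every block; the second shows that blocks may still differ in level.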


21.3 Analysis of Variance and Tests


Fitting of Randomized Complete Block Model 
The least squares and maximum likelihood estimators of the parameters in randomized block 
model (21.1) are obtained in the customary fashion and again are the same. Employing our 
usual notation, they are: 


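Parameter       Estimator
$\mu_{..}$      $\hat{\mu}_{..} = \bar{Y}_{..}$    (21.3a)
$\rho_i$        $\hat{\rho}_i = \bar{Y}_{i.} - \bar{Y}_{..}$    (21.3b)
$\tau_j$        $\hat{\tau}_j = \bar{Y}_{.j} - \bar{Y}_{..}$    (21.3c)

The fitted values therefore are:

$\hat{Y}_{ij} = \bar{Y}_{..} + (\bar{Y}_{i.} - \bar{Y}_{..}) + (\bar{Y}_{.j} - \bar{Y}_{..}) = \bar{Y}_{i.} + \bar{Y}_{.j} - \bar{Y}_{..}$    (21.4)

and the residuals are:

$e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..}$    (21.5)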
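As an illustration, the estimators (21.3), fitted values (21.4), and residuals (21.5) can be computed with a few lines of Python. The ratings below are meant to be the risk premium data of Table 21.1 (rows are blocks, columns are the utility, worry, and comparison methods); they should be treated as an assumption and checked against the table, and the variable names are hypothetical.

```python
# Confidence ratings: rows are blocks 1..5, columns are utility, worry, comparison.
# Assumed to match Table 21.1; verify against the table before relying on them.
ratings = [
    [1, 5, 8],
    [2, 8, 14],
    [7, 9, 16],
    [6, 13, 18],
    [12, 14, 17],
]

n_b, r = len(ratings), len(ratings[0])
grand_mean = sum(sum(row) for row in ratings) / (n_b * r)              # (21.3a)
block_means = [sum(row) / r for row in ratings]
treatment_means = [sum(row[j] for row in ratings) / n_b for j in range(r)]

rho_hat = [m - grand_mean for m in block_means]                         # (21.3b)
tau_hat = [m - grand_mean for m in treatment_means]                     # (21.3c)

# Fitted values (21.4) and residuals (21.5).
fitted = [[block_means[i] + treatment_means[j] - grand_mean for j in range(r)]
          for i in range(n_b)]
residuals = [[ratings[i][j] - fitted[i][j] for j in range(r)]
             for i in range(n_b)]

print("grand mean:", grand_mean)
print("treatment effects:", [round(t, 2) for t in tau_hat])
print("block effects:", [round(p, 2) for p in rho_hat])
```

The residuals sum to zero within every block and within every treatment, which is a convenient check on the arithmetic.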


Analysis of Variance 


The analysis of variance for a randomized complete block design is identical to that for a
two-factor, no-interaction model with one observation per cell, as described in Section 20.1:

$SSBL = r \sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$    (21.6a)

$SSTR = n_b \sum_j (\bar{Y}_{.j} - \bar{Y}_{..})^2$    (21.6b)

$SSBL.TR = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2 = \sum_i \sum_j e_{ij}^2$    (21.6c)

Here, SSBL denotes the sum of squares for blocks, SSTR denotes, as usual, the treatment
sum of squares, and SSBL.TR denotes the interaction sum of squares between blocks and
treatments [note from (21.5) that this sum of squares here is the same as the sum of the
squared residuals]; $n_b r$ is the total number of experimental units in the study.

A summary of the analysis of variance, including the expected mean squares for fixed
treatment effects, is given in Table 21.2. Note that since there are no interaction terms in the
model, the expected mean squares contain only $\sigma^2$ and, as appropriate, the treatment or block
main effects term. Also note from the E{MS} column in Table 21.2 that the appropriate
denominator in the F* test statistic for testing treatment effects is the interaction mean
square, here denoted by MSBL.TR. This is the same as in Section 20.1 for the two-factor
no-interaction model with $n = 1$. Hence, to test for treatment effects:

$H_0$: all $\tau_j = 0$
$H_a$: not all $\tau_j$ equal zero    (21.7a)

we use the same test statistic:

$F^* = \frac{MSTR}{MSBL.TR}$    (21.7b)

and the decision rule for controlling the Type I error at $\alpha$ is:

If $F^* \le F[1 - \alpha; r - 1, (n_b - 1)(r - 1)]$, conclude $H_0$
If $F^* > F[1 - \alpha; r - 1, (n_b - 1)(r - 1)]$, conclude $H_a$    (21.7c)

TABLE 21.2 ANOVA Table for Randomized Complete Block Design, Block Effects Fixed, Treatment Effects Fixed.

Source of Variation    SS         df                      MS         E{MS}
Blocks                 SSBL       $n_b - 1$               MSBL       $\sigma^2 + r \frac{\sum \rho_i^2}{n_b - 1}$
Treatments             SSTR       $r - 1$                 MSTR       $\sigma^2 + n_b \frac{\sum \tau_j^2}{r - 1}$
Error                  SSBL.TR    $(n_b - 1)(r - 1)$      MSBL.TR    $\sigma^2$
Total                  SSTO       $n_b r - 1$

Example


Table 21.3 contains the analysis of variance for the risk premium example in Table 21.1.
The calculations are straightforward and were carried out by a computer package. To test
for treatment effects:

$H_0$: $\tau_1 = \tau_2 = \tau_3 = 0$
$H_a$: not all $\tau_j$ equal zero

we use the results in Table 21.3:

$F^* = \frac{MSTR}{MSBL.TR} = \frac{101.4}{2.99} = 33.9$

For level of significance $\alpha = .01$, we require F(.99; 2, 8) = 8.65. Since F* = 33.9 > 8.65,
we conclude $H_a$, that the mean confidence ratings for the three methods differ. The P-value
of the test is .0001.

TABLE 21.3 ANOVA Table for Randomized Complete Block Design—Risk Premium Example of Table 21.1.

Source of Variation                         SS       df      MS
Blocks                                     171.3      4     42.8
Methods for risk premium specification     202.8      2    101.4
Error                                       23.9      8     2.99
Total                                      398.0     14
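The analysis of variance for this example can be reproduced with a short Python sketch based on (21.6) and (21.7b). The ratings are assumed to be those of Table 21.1 and the function name `rcbd_anova` is hypothetical. Because the sketch carries unrounded mean squares (MSBL.TR is about 2.983 rather than 2.99), the computed F* is close to 34.0 instead of the 33.9 obtained above from the rounded values.

```python
def rcbd_anova(y):
    """Sums of squares, mean squares, and F* statistics for a randomized
    complete block design; y[i][j] is the response for treatment j in block i."""
    n_b, r = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (n_b * r)
    block_means = [sum(row) / r for row in y]
    trt_means = [sum(row[j] for row in y) / n_b for j in range(r)]
    ssbl = r * sum((m - grand) ** 2 for m in block_means)      # (21.6a)
    sstr = n_b * sum((m - grand) ** 2 for m in trt_means)      # (21.6b)
    ssto = sum((v - grand) ** 2 for row in y for v in row)
    ssbl_tr = ssto - ssbl - sstr                               # equals (21.6c)
    msbl = ssbl / (n_b - 1)
    mstr = sstr / (r - 1)
    msbl_tr = ssbl_tr / ((n_b - 1) * (r - 1))
    return {
        "SSBL": ssbl, "SSTR": sstr, "SSBL.TR": ssbl_tr, "SSTO": ssto,
        "MSBL": msbl, "MSTR": mstr, "MSBL.TR": msbl_tr,
        "F_treatments": mstr / msbl_tr,   # test statistic (21.7b)
        "F_blocks": msbl / msbl_tr,
    }


# Rows are blocks 1..5, columns are utility, worry, comparison
# (assumed to match Table 21.1).
ratings = [[1, 5, 8], [2, 8, 14], [7, 9, 16], [6, 13, 18], [12, 14, 17]]
result = rcbd_anova(ratings)
for name, value in result.items():
    print(f"{name:>13}: {value:8.3f}")

F_CRITICAL = 8.65  # F(.99; 2, 8), as quoted in the text
print("conclude Ha" if result["F_treatments"] > F_CRITICAL else "conclude H0")
```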


Comments 
1. Sometimes one may also wish to conduct a test for block effects:

$H_0$: all $\rho_i = 0$
$H_a$: not all $\rho_i$ equal zero    (21.8a)

Usually, however, the treatments are of primary interest, and blocks are chiefly the means for reducing
the experimental error variability. Table 21.2 indicates that the test for fixed block effects uses the test
statistic:

$F^* = \frac{MSBL}{MSBL.TR}$    (21.8b)

For the risk premium example, this test statistic is:

$F^* = \frac{42.8}{2.99} = 14.3$

For level of significance $\alpha = .01$, we require F(.99; 4, 8) = 7.01. Since F* = 14.3 > 7.01, we conclude
that the mean confidence ratings (averaged over treatments) differ for the various blocks.

Since blocks correspond to an observational factor, care needs to be used in interpreting the 
implications of block effects. In our risk premium example, for instance, the block effects might not 
be due to age, even though age was the grouping variable. Education could be the pivotal explanatory 
variable, the effect by age appearing if older executives have less formal education than younger ones. 

2. If only two treatments are investigated in a randomized complete block design, it can be shown
that the F test for treatment effects based on test statistic (21.7b) is equivalent to the two-sided t test
for paired observations based on test statistic (A.69).
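The equivalence can be checked numerically. The sketch below assumes that (A.69) is the usual paired-observations statistic, the mean difference divided by its estimated standard deviation, and uses only the worry and comparison ratings of the risk premium example (values assumed to match Table 21.1). It then compares $(t^*)^2$ with the F* statistic (21.7b) computed for the same two treatments.

```python
import math

# Worry and comparison ratings for blocks 1..5 (assumed to match Table 21.1).
worry = [5, 8, 9, 13, 14]
comparison = [8, 14, 16, 18, 17]
n_b, r = len(worry), 2

# Paired t statistic on the within-block differences.
diffs = [c - w for c, w in zip(comparison, worry)]
d_bar = sum(diffs) / n_b
s2_d = sum((d - d_bar) ** 2 for d in diffs) / (n_b - 1)
t_star = d_bar / math.sqrt(s2_d / n_b)

# Randomized block F* for the same two treatments, from (21.6) and (21.7b).
y = [list(pair) for pair in zip(worry, comparison)]
grand = sum(sum(row) for row in y) / (n_b * r)
block_means = [sum(row) / r for row in y]
trt_means = [sum(row[j] for row in y) / n_b for j in range(r)]
sstr = n_b * sum((m - grand) ** 2 for m in trt_means)
ssbl = r * sum((m - grand) ** 2 for m in block_means)
ssto = sum((v - grand) ** 2 for row in y for v in row)
msbl_tr = (ssto - ssbl - sstr) / ((n_b - 1) * (r - 1))
f_star = (sstr / (r - 1)) / msbl_tr

print(f"t* = {t_star:.4f}, (t*)^2 = {t_star ** 2:.4f}, F* = {f_star:.4f}")
```

The two statistics agree because, with two treatments, the error sum of squares is half the sum of squared deviations of the differences about their mean, and the treatment sum of squares is $n_b \bar{D}^2 / 2$.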

3. When the responses $Y_{ij}$ in a randomized complete block design are far from normally distributed
and transformations of the data are not successful in bringing them within the robustness properties of the standard
inference procedures, a nonparametric test of treatment effects may be useful. The nonparametric
rank F test introduced in Section 18.7 for single-factor studies is easily adapted for use in studies
based on randomized complete block designs. The r observations in each block are ranked from 1
to r in ascending order and the usual F* test in (21.7b) for testing treatment effects in a randomized
block design is carried out, but now based on the ranked data. We use $F_R^*$ to denote the F* test statistic
when the test is based on the ranked data.

The rank F test statistic is equivalent to the statistic for the Friedman test, a widely used
nonparametric rank procedure for testing the equality of treatment means in randomized complete block
designs. The Friedman test is also based on the within-block ranks $R_{ij}$ of the data. The Friedman test
statistic is:

$X_F^2 = \frac{SSTR}{\dfrac{SSTR + SSBL.TR}{n_b(r - 1)}}$

which can be reduced to (when no ties are present):

$X_F^2 = \left[ \frac{12}{n_b r (r + 1)} \sum_j R_{.j}^2 \right] - 3 n_b (r + 1)$

where $R_{.j}$ denotes the sum of the ranks for the jth treatment.

Instead of using the F distribution, the Friedman test approximates the distribution of $X_F^2$ when $H_0$
holds by the chi-square distribution with $r - 1$ degrees of freedom, provided that the number of blocks
is not too small. The decision rule is therefore:

If $X_F^2 \le \chi^2(1 - \alpha; r - 1)$, conclude $H_0$
If $X_F^2 > \chi^2(1 - \alpha; r - 1)$, conclude $H_a$

The rank F test statistic $F_R^*$ and the $X_F^2$ test statistic are related as follows:

$F_R^* = \frac{(n_b - 1) X_F^2}{n_b (r - 1) - X_F^2}$
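The two forms of the Friedman statistic and its relation to the rank F statistic can be verified on a small example. The data below are hypothetical (four blocks, three treatments, no ties), and the helper `within_block_ranks` is an illustrative name; the sketch ranks the observations within each block and then evaluates each formula in turn.

```python
def within_block_ranks(row):
    """Rank the observations in one block from 1 to r (assumes no ties)."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0] * len(row)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


# Hypothetical responses: rows are blocks, columns are treatments.
data = [[10, 12, 15], [9, 14, 11], [8, 13, 16], [11, 10, 17]]
ranks = [within_block_ranks(row) for row in data]
n_b, r = len(ranks), len(ranks[0])

rank_sums = [sum(row[j] for row in ranks) for j in range(r)]
grand = (r + 1) / 2  # mean rank in every block, so SSBL = 0 for the ranks
sstr = n_b * sum((s / n_b - grand) ** 2 for s in rank_sums)
ssto = sum((v - grand) ** 2 for row in ranks for v in row)
ssbl_tr = ssto - sstr

# Friedman statistic: sums-of-squares form and no-ties shortcut form.
x2_from_ss = sstr / ((sstr + ssbl_tr) / (n_b * (r - 1)))
x2_shortcut = (12 / (n_b * r * (r + 1))) * sum(s * s for s in rank_sums) \
    - 3 * n_b * (r + 1)

# Rank F statistic computed directly and through its relation to X_F^2.
f_rank = (sstr / (r - 1)) / (ssbl_tr / ((n_b - 1) * (r - 1)))
f_from_x2 = (n_b - 1) * x2_from_ss / (n_b * (r - 1) - x2_from_ss)

print(f"X2_F = {x2_from_ss:.3f} (shortcut {x2_shortcut:.3f})")
print(f"F*_R = {f_rank:.4f} (from X2_F: {f_from_x2:.4f})")
```

If every block ranks the treatments identically, SSBL.TR is zero for the ranks, $X_F^2$ attains its maximum $n_b(r - 1)$, and $F_R^*$ is undefined; the sketch does not guard against that case.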


21.4 Evaluation of Appropriateness of Randomized Complete Block Model


The importance of examining the appropriateness of a statistical model for a given set of 
data has been mentioned many times. Since the techniques of examination are similar, we 
shall make only a few points of special relevance to randomized complete block designs 
here. 


Diagnostic Plots 


Some of the chief ways in which the data may not fit randomized complete block model (21.1)
are:


1. Unequal error variability by blocks 

2. Unequal error variability by treatments 
3. Time effects 

4. Block-treatment interactions 


Use of residual plots in connection with points 2 and 3 has been considered in Section 18.1
with reference to a completely randomized design. The discussion there applies also to the
residuals of a randomized complete block design, given in (21.5):

$$e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot}$$

We simply add here that if treatments do have unequal error variability in a randomized
complete block design, the difference between any two treatments can always be estimated
by working with the differences between the paired observations, $Y_{ij} - Y_{ij'}$, which are
unaffected by any unequal treatment variances.
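As a rough illustration of the residuals for the additive randomized block model and of the paired-difference approach, here is a minimal Python sketch. The data are hypothetical (4 blocks, 3 treatments) and the function names are illustrative assumptions, not part of the text.

```python
import math
import statistics


def rcbd_fit(y):
    """Fitted values and residuals for the additive model; y[i][j] is block i, treatment j."""
    n_b, r = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n_b * r)
    block_means = [sum(row) / r for row in y]
    trt_means = [sum(y[i][j] for i in range(n_b)) / n_b for j in range(r)]
    fitted = [[block_means[i] + trt_means[j] - grand for j in range(r)] for i in range(n_b)]
    resid = [[y[i][j] - fitted[i][j] for j in range(r)] for i in range(n_b)]
    return fitted, resid


def paired_difference_summary(y, j, j_prime):
    """Mean and standard error of the within-block differences Y_ij - Y_ij'."""
    diffs = [row[j] - row[j_prime] for row in y]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, std_err


# Hypothetical data: rows are blocks, columns are treatments.
y = [
    [10, 14, 19],
    [8, 11, 15],
    [12, 15, 22],
    [6, 10, 14],
]
fitted, resid = rcbd_fit(y)
mean_diff, std_err = paired_difference_summary(y, 2, 0)  # treatment 3 minus treatment 1
print(resid[0], mean_diff, std_err)
```

The residuals sum to zero within every block and within every treatment, which is a quick check on the computation. The paired-difference summary uses only the two treatments being compared, so it does not rely on a common error variance across all treatments.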

Unequal error variability by blocks can be studied by aligned residual dot plots for 
each block, as shown in Figure 21.3 for a randomized block study with 10 treatments run 
in three blocks. The residual dot plots in Figure 21.3 are suggestive of increasing error 
variability with increasing block number. If, for instance, the blocks were processed in 
block number order, some modifications in procedures may have taken place leading to 
larger experimental error variability over time. Tests concerning the equality of variances,
such as those described in Section 18.2, may be employed for a more formal determination, 
provided that the sample sizes are reasonably large so that the residuals can be treated as if 
they were independent. 


FIGURE 21.3 Residual Dot Plots Suggesting Unequal Error Variances by Blocks.
[Aligned dot plots of the residuals, on a scale from -10 to 10, for Blocks 1, 2, and 3.]

FIGURE 21.4 Residual Dot Plots Suggesting Block-Treatment Interactions.
[Aligned dot plots of the residuals, on a scale from -10 to 10, for Blocks 1 through 4, with separate plotting symbols for Treatment 1 and Treatment 2.]


Interactions between treatments and blocks are somewhat more difficult to detect from
residual plots. Figure 21.4 contains the residuals for an experiment with two treatments run
in four blocks. The reversal in pattern of the residuals is suggestive of an interaction effect.
There are, however, many other possible types of interaction patterns that would appear
very much different from that in Figure 21.4.

Another diagnostic plot that may be helpful to detect interaction effects is a plot of the
residuals $e_{ij}$ against the fitted values $\hat{Y}_{ij}$. A curvilinear pattern of the residuals in such a plot
often suggests the presence of interaction effects between blocks and treatments. This plot
also provides information about the constancy of the error variance.

Still another diagnostic plot for interactions, which is often more effective than a residual
plot, is a plot of the responses $Y_{ij}$ by blocks. Figure 21.2 illustrates this type of plot. A severe
lack of parallelism in such a plot is a strong indication that blocks and treatments interact
in their effects on the response.


Example

We already noted that the plot of responses by block in Figure 21.2 for the risk premium
example does not exhibit a severe lack of parallelism, thus suggesting that blocks and
treatments do not interact in any major fashion. Figure 21.5a, which presents a plot of the
residuals against the fitted values, leads to a similar conclusion. There is no strong evidence
of a curvilinear pattern here. In addition, Figure 21.5a does not indicate the existence of
substantially unequal error variances.

FIGURE 21.5 Diagnostic Residual Plots: Risk Premium Example.
[(a) Residual Plot against $\hat{Y}$; (b) Normal Probability Plot of the residuals against their expected values.]

Figure 21.5b contains a normal probability plot of the residuals. This plot does not 
suggest any strong departures from a normal error distribution. The coefficient of correlation 
between the ordered residuals and their expected values under normality is .959 and supports 
this conclusion. Residual dot plots for each treatment and for each block were also prepared 
(they are not shown here). They suggested that the error variances did not differ substantially 
between treatments and between blocks. These results, in addition to a formal test that found 
no interactions between block and treatment effects (to be discussed next), led the analyst 
to conclude that randomized block model (21.1) is appropriate for the data. 


Tukey Test for Additivity 


The Tukey test for additivity, discussed in Section 20.2, may be employed for a formal test
of interaction effects between blocks and treatments for a randomized block design. The
special interaction sum of squares in (20.9) will be denoted here by SSBL.TR*.

Example

To test for interaction effects between blocks and treatments in the risk premium example,
we calculate the special interaction sum of squares in (20.9) as follows, using the data in
Tables 21.1 and 21.3:

$$\sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) Y_{ij} = 24.80$$

$$\sum_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = \frac{SSBL}{r} = \frac{171.3}{3} = 57.10$$

$$\sum_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 = \frac{SSTR}{n_b} = \frac{202.8}{5} = 40.56$$

Hence:

$$SSBL.TR^* = \frac{(24.80)^2}{57.10(40.56)} = .27$$




Using the results from Table 21.3, we can now obtain the remainder sum of squares
for the special interaction model (20.7):

$$SSRem^* = SSTO - SSBL - SSTR - SSBL.TR^* = 398.0 - 171.3 - 202.8 - .27 = 23.63$$


(20.10a) 


Hence, test statistic (20.11) is:

$$F^* = \frac{SSBL.TR^*}{1} \div \frac{SSRem^*}{rn_b - r - n_b} = \frac{.27}{1} \div \frac{23.63}{7} = .08$$

For level of significance $\alpha = .05$, we need $F(.95; 1, 7) = 5.59$. Since $F^* = .08 \le 5.59$, we
conclude that no block-treatment interaction effects are present. The P-value of this test
is .79.
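The steps of the Tukey one-degree-of-freedom test can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the small data table are hypothetical, and the last lines simply redo the arithmetic of the risk premium example from the summary figures quoted above rather than from the raw data in Table 21.1.

```python
def tukey_additivity(y):
    """Tukey test for additivity for an n_b x r table with one observation per cell.

    Returns (SSBL.TR*, SSRem*, F*), where F* has 1 and n_b*r - r - n_b degrees of freedom.
    """
    n_b, r = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n_b * r)
    block_dev = [sum(row) / r - grand for row in y]
    trt_dev = [sum(y[i][j] for i in range(n_b)) / n_b - grand for j in range(r)]
    cross = sum(block_dev[i] * trt_dev[j] * y[i][j]
                for i in range(n_b) for j in range(r))
    sum_b2 = sum(d * d for d in block_dev)   # equals SSBL / r
    sum_t2 = sum(d * d for d in trt_dev)     # equals SSTR / n_b
    ss_inter = cross ** 2 / (sum_b2 * sum_t2)
    ssto = sum((v - grand) ** 2 for row in y for v in row)
    ss_rem = ssto - r * sum_b2 - n_b * sum_t2 - ss_inter
    df_rem = n_b * r - r - n_b
    return ss_inter, ss_rem, ss_inter / (ss_rem / df_rem)


# Hypothetical data: rows are blocks, columns are treatments.
table = [
    [10, 14, 19],
    [8, 11, 15],
    [12, 15, 22],
    [6, 10, 14],
]
ss_inter, ss_rem, f_star = tukey_additivity(table)

# Arithmetic of the risk premium example, from the quoted summary values.
ss_star_text = round(24.80 ** 2 / (57.10 * 40.56), 2)
ss_rem_text = round(398.0 - 171.3 - 202.8 - ss_star_text, 2)
f_star_text = round(ss_star_text / (ss_rem_text / 7), 2)
print(ss_star_text, ss_rem_text, f_star_text)
```

The function leaves the comparison with the F percentile to the reader, since the standard library has no F distribution; any statistical table or package can supply $F(1 - \alpha; 1, rn_b - r - n_b)$.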


Comment 


When interaction effects are present, transformations of the data should be attempted to remove at
least the important interaction effects. The discussion in Section 20.2 is relevant to this point.


21.5 Analysis of Treatment Effects


Once the existence of fixed treatment effects has been established through the analysis of
variance, the analysis of these effects proceeds as described in Chapter 17 for single-factor
studies. Often, a useful preliminary view of the treatment effects can be obtained from a
bar-interval plot of the estimated treatment means $\bar{Y}_{\cdot j}$. The formal analysis of the treatment
effects usually involves estimation of one or more contrasts of the treatment means $\mu_{\cdot j}$,
where $\mu_{\cdot j}$ is the mean response for treatment $j$ averaged over all blocks. The formulas in
Chapter 17 for estimating contrasts of the treatment means apply here, with the treatment
means now denoted by $\mu_{\cdot j}$ and the estimated treatment means by $\bar{Y}_{\cdot j}$. The appropriate mean
square term to be used in the estimated variance of the contrast is MSBL.TR, obtained from
(21.6c), since it is the denominator of the $F^*$ statistic for testing fixed treatment effects. The
multiples for the estimated standard deviation of the contrast are now as follows:


Single comparison                              $t[1 - \alpha/2; (n_b - 1)(r - 1)]$                          (21.9a)
Tukey procedure (for pairwise comparisons)     $T = \frac{1}{\sqrt{2}}\, q[1 - \alpha; r, (n_b - 1)(r - 1)]$    (21.9b)
Scheffé procedure                              $S^2 = (r - 1)F[1 - \alpha; r - 1, (n_b - 1)(r - 1)]$        (21.9c)
Bonferroni procedure                           $B = t[1 - \alpha/2g; (n_b - 1)(r - 1)]$                     (21.9d)
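To show how the Tukey multiple in (21.9b) enters a set of pairwise confidence intervals, here is a minimal Python sketch. The inputs are the figures quoted in this section for the risk premium study (MSBL.TR = 2.99, five blocks, $q(.95; 3, 8) = 4.04$, and estimated treatment means 14.6, 9.8, and 5.6); the function and variable names are hypothetical, and the studentized range percentile is supplied by the user rather than computed.

```python
import math
from itertools import combinations


def tukey_pairwise_intervals(means, ms_error, n_b, q_crit):
    """Tukey intervals for all pairwise differences of treatment means in a block design.

    means: dict mapping treatment name to estimated treatment mean
    ms_error: MSBL.TR, the interaction (error) mean square
    n_b: number of blocks (observations per treatment mean)
    q_crit: studentized range percentile q[1 - alpha; r, (n_b - 1)(r - 1)]
    """
    s_l = math.sqrt(2 * ms_error / n_b)   # estimated standard deviation of a difference
    t_mult = q_crit / math.sqrt(2)        # Tukey multiple T in (21.9b)
    half_width = t_mult * s_l
    intervals = {}
    for first, second in combinations(means, 2):
        diff = means[first] - means[second]
        intervals[(first, second)] = (diff - half_width, diff + half_width)
    return half_width, intervals


means = {"comparison": 14.6, "worry": 9.8, "utility": 5.6}
half_width, intervals = tukey_pairwise_intervals(means, ms_error=2.99, n_b=5, q_crit=4.04)
for pair, (low, high) in intervals.items():
    print(pair, round(low, 1), round(high, 1))
```

The same function covers the single comparison and Bonferroni cases if `q_crit` is replaced by $\sqrt{2}$ times the corresponding t percentile, since only the multiple changes.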
Example

The researcher who conducted the risk premium study was satisfied, on the basis of the
residual analyses and tests, that randomized complete block model (21.1) is appropriate for
the experiment. To analyze the treatment effects formally, the researcher wished to obtain all

pairwise comparisons with a 95 percent family confidence coefficient, utilizing the Tukey
procedure. Using (17.30b), with MSE replaced by MSBL.TR and the results in Table 21.3,
we obtain:

$$s^2\{\hat{L}\} = MSBL.TR\left(\frac{1}{n_b} + \frac{1}{n_b}\right) = \frac{2\,MSBL.TR}{n_b} = \frac{2(2.99)}{5} = 1.20$$

Remember that each estimated treatment mean $\bar{Y}_{\cdot j}$ consists of $n_b$ observations (one from
each of $n_b$ blocks). Using (21.9b), we find for a 95 percent family confidence coefficient:

$$T = \frac{1}{\sqrt{2}}\, q(.95; 3, 8) = \frac{1}{\sqrt{2}}(4.04) = 2.86$$

Hence:

$$T s\{\hat{L}\} = 2.86\sqrt{1.20} = 3.1$$

We now obtain for the pairwise comparisons using (17.30) and Table 21.1 for the $\bar{Y}_{\cdot j}$:

$$1.7 = (14.6 - 9.8) - 3.1 \le \mu_{\cdot 3} - \mu_{\cdot 2} \le (14.6 - 9.8) + 3.1 = 7.9$$

$$5.9 = (14.6 - 5.6) - 3.1 \le \mu_{\cdot 3} - \mu_{\cdot 1} \le (14.6 - 5.6) + 3.1 = 12.1$$

$$1.1 = (9.8 - 5.6) - 3.1 \le \mu_{\cdot 2} - \mu_{\cdot 1} \le (9.8 - 5.6) + 3.1 = 7.3$$

Here, $\mu_{\cdot 1}$ is the mean confidence rating, averaged over all blocks, for the utility method,
and $\mu_{\cdot 2}$ and $\mu_{\cdot 3}$ are the mean confidence ratings for the worry and comparison methods,
respectively.

We conclude, just as Figure 21.2 suggests, that the comparison method has a higher
mean confidence rating than the worry method, which in turn has a higher mean confidence
rating than the utility method. The family confidence coefficient of .95 applies to this entire
set of comparisons. A line plot of the estimated treatment means summarizes the results:

[Line plot of the estimated treatment means on the Confidence Rating scale, in the order Utility, Worry, Comparison.]


21.6 Use of More than One Blocking Variable


Sometimes, a substantial reduction in the experimental error variability can only be obtained 
by utilizing more than one variable for determining blocks. For instance, both age and gender 
might be needed for designating blocks: 


Block Characteristics of Experimental Units 
1 Male, aged 20-29 
2 Female, aged 20-29 
3 Male, aged 30-39 
4 Female, aged 30-39 


etc. etc. 


As another example, both observer and day of treatment application may be helpful as
blocking variables:


Block Characteristics of Experimental Units
1 Observer 1, day 1
2 Observer 2, day 1
3 Observer 1, day 2
4 Observer 2, day 2


etc. etc.


Unless the separate effects of each of the blocking variables need to be studied, no new
problems arise when the blocks are defined by two or more variables. The $n_b$ blocks are
simply treated as ordinary blocks, and the usual block sum of squares is calculated.

When the effect of each of the blocking variables is to be isolated and the blocks are
defined in a complete factorial fashion, the analysis simply treats each of the blocking
variables as a factor and utilizes the methods developed in Chapter 19 for two-factor studies.
For example, if twelve blocks are used when four observers and three days are employed
for blocking, the analysis of variance would decompose the 12 - 1 = 11 degrees of freedom
for blocks into 4 - 1 = 3 degrees of freedom for observer main effects, 3 - 1 = 2 degrees
of freedom for day main effects, and 3 × 2 = 6 degrees of freedom for observer × day
interactions.

A problem that sometimes arises when two or more blocking variables are to be used is
the large number of blocks called for. Suppose an experiment is to be conducted where the
experimental units are stores. In order to reduce the experimental error variability to a
reasonable level, it would be desirable to group the stores into six sales volume classes and also
into six location classes (suburban shopping center, suburban other, etc.). Thirty-six blocks
result from combining these two blocking variables. If six treatments were to be studied,
216 stores would be required for the experiment. Often, use of this many stores would be
much too costly. Latin square designs, to be discussed in Chapter 28, permit in this type of
study the use of a much smaller number of experimental units while still preserving the full
benefits of error variance reduction by using both blocking variables in six classes each.


21.7 Use of More than One Replicate in Each Block


When block effects are fixed, use of an additive model in the presence of interactions between
blocks and treatments has the effect of reducing the power of the test and increasing the
width of interval estimates of treatment effects, thus making the experiment less sensitive.
In addition, there are occasions when the nature of the interactions between blocks and
treatments is of interest. It is possible to use a design that permits an interaction term in the
model even when the block effects are fixed, and that allows the nature of the interaction
effects to be investigated. This design is called a generalized randomized block design. It is
the same as a randomized block design except that $d$ experimental units are assigned to each
treatment within a block. This generalized design increases the size of a block from $r$ units
for a randomized block design to $dr$ units. The increase in block size will often have the
effect of increasing experimental error variability when the total number of experimental
units is fixed. In the social sciences, however, increasing the size of the block moderately
may cause little loss in efficiency. For instance, having one block of 10 persons aged 20-29
instead of two blocks of five persons of ages 20-24 and 25-29, respectively, will for many
types of experiments involve little loss of efficiency.

As we shall demonstrate by an example, a generalized randomized block design is
analyzed like an ordinary two-factor study where blocks are one factor. Hence, no new problems
are encountered with a generalized randomized block design in testing for treatment effects
or in estimating them. In particular, we can now calculate MSE and use it as an estimator
of the error variance $\sigma^2$.


Example

Table 21.4 contains the data for a single-factor experiment in which the effects of distraction
level (factor A: low distraction, high distraction) on the time required to complete a task
were studied, using eight men and eight women. Four men were assigned at random to each
of the $r = 2$ treatments, and independently four women were assigned at random to each
treatment. Here gender is the blocking variable. Each block contains eight persons, with four
randomly assigned to each treatment within the block. The layout in Table 21.4 corresponds
to the layout in Table 19.7 for a two-factor study; to stress the correspondence, we have
placed the blocks in columns rather than in rows as usual. Since blocks and distraction
levels are considered to be fixed, we utilize the fixed effects two-factor model (19.23), with
notation modified to fit the present context:

$$Y_{ijk} = \mu_{\cdot\cdot} + \rho_i + \tau_j + (\rho\tau)_{ij} + \varepsilon_{ijk} \tag{21.10}$$

where:

$\mu_{\cdot\cdot}$ is a constant
$\rho_i$, $\tau_j$ are constants subject to the restrictions $\sum \rho_i = \sum \tau_j = 0$
$(\rho\tau)_{ij}$ are constants subject to the restrictions that the sums over any subscript are zero
$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, n_b$; $j = 1, \ldots, r$; $k = 1, \ldots, d$

We shall refer to model (21.10) as the generalized randomized block model.


TABLE 21.4 Data on Completion Times for Generalized Randomized Block Design with d = 4: Task Completion Example.

                       Block (gender)
                       Male    Female
Low Distraction:        12        3
                         8        9
                         7        5
                         5        9
High Distraction:       14       11
                        16        9
                        15       10
                        13       14




FIGURE 21.6 Portion of SAS GLM ANOVA Output for Data in Table 21.4: Task Completion Example ($n_b = 2$, $r = 2$, $d = 4$).

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3      150.0000000    50.0000000      8.33   0.0029
Error              12       72.0000000     6.0000000
Corrected Total    15      222.0000000

R-Square   Coeff Var   Root MSE     y Mean
0.675676    24.49490   2.449490   10.00000

Source         DF     Type I SS   Mean Square   F Value   Pr > F
Distraction     1   121.0000000   121.0000000     20.17   0.0007
Gender          1    25.0000000    25.0000000      4.17   0.0639
Dist*Gender     1     4.0000000     4.0000000      0.67   0.4301


The analysis of variance for generalized randomized block model (21.10) is the ordinary
two-factor ANOVA of Table 19.8, with slight modifications in notation. The SAS GLM
procedure was employed to obtain Figure 21.6 for the data in Table 21.4. We know from
Table 19.8 that all test statistics use MSE in the denominator. These $F^*$ statistics are shown
in Figure 21.6. For $\alpha = .01$, we require $F(.99; 1, 12) = 9.33$ for each of the tests. It
is evident from the results in Figure 21.6 (see also the P-values given there) that blocks
(gender) do not interact with treatments (distraction level) and that high distraction level
increases the time required to complete the task, compared to the low distraction level.
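The sums of squares and F statistics reported in Figure 21.6 can be reproduced with a short balanced two-factor ANOVA. The Python sketch below is illustrative only: the function name and dictionary layout are assumptions, and the fourth pair of high-distraction observations (13 for the male block, 14 for the female block) is taken as the pair of values consistent with the sums of squares and overall mean reported in Figure 21.6.

```python
def two_factor_anova(cells):
    """Balanced two-factor fixed effects ANOVA.

    cells maps (treatment level, block) to the list of d observations in that cell.
    Returns sums of squares, MSE, and the F* statistics, all with MSE in the denominator.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    d = len(next(iter(cells.values())))
    n = len(a_levels) * len(b_levels) * d
    grand = sum(sum(v) for v in cells.values()) / n
    cell_mean = {k: sum(v) / d for k, v in cells.items()}
    a_mean = {a: sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels) for a in a_levels}
    b_mean = {b: sum(cell_mean[(a, b)] for a in a_levels) / len(a_levels) for b in b_levels}

    ss_a = d * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = d * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_ab = d * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    sse = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)
    mse = sse / (n - len(a_levels) * len(b_levels))
    return {
        "SS_treatment": ss_a, "SS_block": ss_b, "SS_interaction": ss_ab, "SSE": sse, "MSE": mse,
        "F_treatment": ss_a / (len(a_levels) - 1) / mse,
        "F_block": ss_b / (len(b_levels) - 1) / mse,
        "F_interaction": ss_ab / ((len(a_levels) - 1) * (len(b_levels) - 1)) / mse,
    }


# Task completion data: distraction level is the treatment, gender is the block.
cells = {
    ("low", "male"): [12, 8, 7, 5],
    ("low", "female"): [3, 9, 5, 9],
    ("high", "male"): [14, 16, 15, 13],
    ("high", "female"): [11, 9, 10, 14],
}
result = two_factor_anova(cells)
for name, value in result.items():
    print(name, round(value, 2))
```

Because the design is balanced, the sequential (Type I) sums of squares printed by a package do not depend on the order in which the terms are entered, so a direct computation from cell, row, and column means gives the same values.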


21.8 Factorial Treatments


Randomized complete block designs can also be used when the treatments have a factorial 
structure. For example, Figure 21.7 displays the layout for a randomized block design for 
a two-factor study, where each factor has two levels. Because the number of treatments is
$r = ab = 4$, the block size here is four.

When factorial treatments are employed, the ANOVA model can be modified by showing
the component factor effects in place of the treatment effect. For a two-factor study, we have:

$$Y_{ijk} = \mu_{\cdot\cdot\cdot} + \rho_i + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{ijk} \tag{21.11}$$


where the terms in the model have the usual meaning and $(j, k)$ corresponds to the treatment
mean $\mu_{\cdot jk}$. In the analysis of variance, we proceed as always by decomposing the treatment
sum of squares SSTR into sums of squares for the factor main effects and interactions. This 
is shown in Table 21.5 for a two-factor study, the factors having a and b levels, respectively. 
The decomposition is done in the usual fashion, as explained in Section 19.4, utilizing the 
relation in (19.39): 


SSTR = SSA + SSB + SSAB 


FIGURE 21.7 Layout for a Two-Factor Study in a Randomized Complete Block Design.

                  $A_1$                    $A_2$
            $B_1$       $B_2$        $B_1$       $B_2$
Block 1   $Y_{111}$   $Y_{112}$    $Y_{121}$   $Y_{122}$
      2   $Y_{211}$   $Y_{212}$    $Y_{221}$   $Y_{222}$
      3   $Y_{311}$   $Y_{312}$    $Y_{321}$   $Y_{322}$


TABLE 21.5 ANOVA Table for a Two-Factor Study in a Randomized Complete Block Design

Source of Variation      SS         df                     MS
Blocks                   SSBL       $n_b - 1$              MSBL
Treatments               SSTR       $r - 1$                MSTR
  Factor A               SSA        $a - 1$                MSA
  Factor B               SSB        $b - 1$                MSB
  AB interactions        SSAB       $(a - 1)(b - 1)$       MSAB
Error                    SSBL.TR    $(n_b - 1)(r - 1)$     MSBL.TR
Total                    SSTO       $n_b r - 1$

Note: $r = ab$


Formulas (19.39a, b, c) are still appropriate for calculating the component sums of squares, 
remembering that (i, j) subscripts are there used to identify the treatments in terms of 
the factor level combinations. Tests for factor effects are conducted as usual, and no new 
problems are encountered in the estimation of fixed factor effects. 
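The decomposition of SSTR into factor A, factor B, and AB interaction components within a block design can be illustrated with a small computation. The following Python sketch uses hypothetical data for three blocks and a 2 × 2 factorial; the function name and data layout are assumptions made for the illustration.

```python
def rcbd_factorial_anova(y):
    """Sums of squares for a two-factor study run in a randomized complete block design.

    y[i][j][k] is the response in block i for level j of factor A and level k of factor B.
    """
    n_b, a, b = len(y), len(y[0]), len(y[0][0])
    r = a * b  # number of treatments, one experimental unit per treatment in each block
    values = [v for blk in y for row in blk for v in row]
    grand = sum(values) / (n_b * r)
    block_mean = [sum(v for row in blk for v in row) / r for blk in y]
    trt_mean = [[sum(y[i][j][k] for i in range(n_b)) / n_b for k in range(b)]
                for j in range(a)]
    a_mean = [sum(trt_mean[j]) / b for j in range(a)]
    b_mean = [sum(trt_mean[j][k] for j in range(a)) / a for k in range(b)]

    ssto = sum((v - grand) ** 2 for v in values)
    ssbl = r * sum((m - grand) ** 2 for m in block_mean)
    sstr = n_b * sum((trt_mean[j][k] - grand) ** 2 for j in range(a) for k in range(b))
    ssa = n_b * b * sum((m - grand) ** 2 for m in a_mean)
    ssb = n_b * a * sum((m - grand) ** 2 for m in b_mean)
    ssab = n_b * sum((trt_mean[j][k] - a_mean[j] - b_mean[k] + grand) ** 2
                     for j in range(a) for k in range(b))
    return {"SSBL": ssbl, "SSTR": sstr, "SSA": ssa, "SSB": ssb, "SSAB": ssab,
            "SSBL.TR": ssto - ssbl - sstr, "SSTO": ssto,
            "df_error": (n_b - 1) * (r - 1)}


# Hypothetical data: 3 blocks, factor A with 2 levels (rows), factor B with 2 levels (columns).
y = [
    [[10, 12], [14, 19]],
    [[8, 11], [13, 17]],
    [[12, 13], [16, 22]],
]
ss = rcbd_factorial_anova(y)
print(ss["SSTR"], ss["SSA"] + ss["SSB"] + ss["SSAB"], ss["SSBL.TR"], ss["df_error"])
```

The error term remains the block-treatment interaction sum of squares SSBL.TR, so each factor effect mean square is tested against MSBL.TR with $(n_b - 1)(r - 1)$ degrees of freedom.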


21.9 Planning Randomized Complete Block Experiments


The planning of sample sizes for a randomized complete block design is very similar to that
for a completely randomized design. The needed number of blocks $n_b$ may be determined
either by specifying protection needed against making Type I and Type II errors or by
specifying precision required for key contrasts of the treatment means. With either approach,
it is necessary to assess in advance the magnitude of the experimental error variance $\sigma^2$.


Power Approach 


Power of F Test. The power of the F test for treatment effects for a randomized complete
block design involves the same noncentrality parameter as for a completely randomized
design. Formula (16.88) gives the appropriate measure. Despite the same form of the
noncentrality parameter, the two designs generally lead to different power levels even when
based on the same sample sizes, for two reasons. First, the experimental error variance
$\sigma^2$ will differ for the two designs. Second, the degrees of freedom associated with the
denominator of the $F^*$ statistic differ for the two designs.


Use of Table B.12. As when planning the sample sizes for a completely randomized
design, an easy way to implement the power approach for planning the sample sizes for
a randomized complete block design is to use Table B.12. This table may be used for
planning randomized complete block designs provided that the number of treatments and
blocks are not very small, specifically provided that $r(n_b - 1) > 20$. If this condition is not
met, Table B.11 must be used iteratively to implement the power approach.


Example

In the risk premium example, suppose that the number of blocks had not yet been determined
and the experimenter desired the following risk protections:


1. Type I error is to be controlled at $\alpha = .05$.

2. If any two treatment means differ by three or more rating points, i.e., if the minimum
range of the treatment means is $\Delta = 3$, the risk of concluding that there are no treatment
effects should not exceed $\beta = .20$.


The experimenter anticipates that the experimental error standard deviation when
executives are grouped by age will be approximately $\sigma = 2$. Thus, the specifications can be
summarized as follows:

$r = 3$          $\alpha = .05$         $\Delta = 3$
$\beta = .20$    $1 - \beta = .80$      $\sigma = 2$

Using (16.91) we find:

$$\frac{\Delta}{\sigma} = \frac{3}{2} = 1.5$$

Entering Table B.12 for power $1 - \beta = .80$, $r = 3$, $\Delta/\sigma = 1.5$, and $\alpha = .05$, we find
$n_b = 10$. Thus, the experimenter requires approximately 10 blocks of three executives each
in order to obtain the desired protection against incorrect decisions.


Estimation Approach 


For planning the number of blocks $n_b$ by means of the estimation approach, we evaluate the
anticipated standard deviations of key contrasts for different sample sizes until the desired
precision is attained. Often, a multiple comparison procedure will be used for encompassing
the different estimates under a family confidence coefficient.


Example

For the risk premium example, all pairwise comparisons are of interest. The desired width
of the confidence intervals is ±1.5. The Tukey procedure is to be used with a 95 percent
family confidence coefficient. A planning value of $\sigma = 2$ is reasonable. Using $n_b = 10$ as
a starting point, the anticipated variance of any pairwise difference is:

$$\sigma^2\{\hat{L}\} = \sigma^2\left(\frac{1}{n_b} + \frac{1}{n_b}\right) = (2)^2\left(\frac{1}{10} + \frac{1}{10}\right) = .80$$

or $\sigma\{\hat{L}\} = .89$. Further:

$$T = \frac{1}{\sqrt{2}}\, q[.95; r, (n_b - 1)(r - 1)] = \frac{1}{\sqrt{2}}\, q(.95; 3, 18) = \frac{1}{\sqrt{2}}(3.61) = 2.55$$

Thus, the anticipated half-width of the confidence interval is $T\sigma\{\hat{L}\} = 2.55(.89) = 2.3$.
Since this precision is not adequate, a larger number of blocks should be tried next.

Continuing this iterative process, we find that $n_b = 21$ blocks are anticipated to meet
the precision specification.
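The iterative search over the number of blocks can be sketched as follows. This is a minimal Python illustration with hypothetical function names; the studentized range percentile is passed in by the caller (for example from a table, or from `scipy.stats.studentized_range.ppf` if SciPy happens to be available, which is an assumption and is not used here). Only the $n_b = 10$ step uses a value quoted in the text, $q(.95; 3, 18) = 3.61$.

```python
import math


def tukey_half_width(n_b, sigma, q_crit):
    """Anticipated half-width T * sigma{L-hat} of a pairwise interval with n_b blocks."""
    sd_diff = sigma * math.sqrt(2 / n_b)   # anticipated standard deviation of a difference
    t_mult = q_crit / math.sqrt(2)         # Tukey multiple
    return t_mult * sd_diff


def blocks_needed(r, sigma, target, q_func, start=2, limit=1000):
    """Smallest n_b in [start, limit] whose anticipated half-width is at most target.

    q_func(r, df) must return q[1 - alpha; r, df] for the chosen family confidence level.
    Returns None if no n_b in the range meets the target.
    """
    for n_b in range(start, limit + 1):
        df = (n_b - 1) * (r - 1)
        if tukey_half_width(n_b, sigma, q_func(r, df)) <= target:
            return n_b
    return None


# First step of the example: n_b = 10, sigma = 2, q(.95; 3, 18) = 3.61.
hw_10 = tukey_half_width(n_b=10, sigma=2.0, q_crit=3.61)
print(round(hw_10, 1))
```

Because the text rounds intermediate values, a search that uses unrounded percentiles and a strict comparison may stop one block away from the value reported in the example.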


Efficiency of Blocking Variable 

Once a randomized complete block experiment has been run, we often wish to estimate
the efficiency of the blocking variable for guidance in future experimentation. This can be
done readily. Let $\sigma_b^2$ stand for the experimental error variance for the randomized complete
block design. Up to this point, we have used $\sigma^2$ for this error variance, but now that we will
compare two designs we need to be more specific. Let $\sigma_r^2$ denote the experimental error
variance for a completely randomized design. The relative efficiency of blocking, compared
to a completely randomized design, is then defined as follows:

$$E = \frac{\sigma_r^2}{\sigma_b^2} \tag{21.12}$$

The measure $E$ indicates how many times larger the number of replications needs to be with a
completely randomized design as compared to a randomized complete block design in order that the
variance of any estimated treatment contrast be the same for the two designs.

We know that for the randomized block design, MSBL.TR is an unbiased estimator of
$\sigma_b^2$. The question is how to estimate $\sigma_r^2$ from the data for the randomized block design.
Since the same experimental units are involved in either case and there are assumed to be
no interactions between treatments and blocks, it can be shown that an unbiased estimator
of $\sigma_r^2$ is:

$$s_r^2 = \frac{(n_b - 1)MSBL + n_b(r - 1)MSBL.TR}{n_b r - 1} \tag{21.13}$$

Hence, we estimate $E$ as follows:

$$\hat{E} = \frac{s_r^2}{MSBL.TR} = \frac{(n_b - 1)MSBL + n_b(r - 1)MSBL.TR}{(n_b r - 1)MSBL.TR} \tag{21.14}$$

Since the number of degrees of freedom for experimental error for a randomized block
design is not as great as for a completely randomized design, $\hat{E}$ overstates the efficiency a
little because it considers only the error variances. Several modified measures of efficiency
have been suggested to take this overstatement into account. Unless the degrees of freedom
for experimental error with both designs are very small, these modifications have little
effect. One frequently used modification, applicable for assessing any design relative to
another, is:

$$\hat{E}' = \frac{(df_2 + 1)(df_1 + 3)}{(df_2 + 3)(df_1 + 1)}\,\hat{E} \tag{21.15}$$

where $df_1$ denotes the degrees of freedom for the experimental error in the base design
(completely randomized design, in our case) and $df_2$ denotes the degrees of freedom for the
experimental error in the design whose efficiency is being assessed (randomized complete
block design, in our case).




Example 


We shall evaluate the efficiency of blocking by age of executives in the risk premium
example. Placing the appropriate results from Table 21.3 in efficiency measure (21.14), we
obtain:

$$\hat{E} = \frac{4(42.8) + 5(2)(2.99)}{14(2.99)} = 4.8$$

Thus, we would have required almost five times as many replications per treatment with a
completely randomized design to achieve the same variance of any estimated contrast as is
obtained with blocking by age. Clearly, blocking by age was highly effective here.

If we had used modified efficiency measure (21.15), we would have found:

$$\hat{E}' = \frac{(8 + 1)(12 + 3)}{(8 + 3)(12 + 1)}(4.8) = 4.5$$

This result does not differ greatly from that obtained by using (21.14).
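The two efficiency measures are simple enough to compute directly. The Python sketch below uses hypothetical function names and the figures quoted for the risk premium example (MSBL = 42.8, MSBL.TR = 2.99, five blocks, three treatments, and error degrees of freedom 12 for the completely randomized design and 8 for the block design).

```python
def blocking_efficiency(msbl, msbl_tr, n_b, r):
    """Estimated relative efficiency of blocking: s_r^2 / MSBL.TR."""
    s2_r = ((n_b - 1) * msbl + n_b * (r - 1) * msbl_tr) / (n_b * r - 1)
    return s2_r / msbl_tr


def modified_efficiency(e_hat, df_base, df_assessed):
    """Efficiency adjusted for the difference in error degrees of freedom.

    df_base: error degrees of freedom in the base (completely randomized) design
    df_assessed: error degrees of freedom in the design being assessed (block design)
    """
    adjustment = (df_assessed + 1) * (df_base + 3) / ((df_assessed + 3) * (df_base + 1))
    return adjustment * e_hat


n_b, r = 5, 3
e_hat = blocking_efficiency(msbl=42.8, msbl_tr=2.99, n_b=n_b, r=r)
e_mod = modified_efficiency(e_hat, df_base=r * (n_b - 1), df_assessed=(n_b - 1) * (r - 1))
print(round(e_hat, 1), round(e_mod, 1))
```

The error degrees of freedom are derived from $n_b$ and $r$ rather than typed in, which makes it harder to mix up the two designs when the function is reused for another study.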


Comment 


The efficiency measure $\hat{E}$ in (21.14) equals 1 if MSBL = MSBL.TR; it is greater than 1 if MSBL >
MSBL.TR; and it is less than 1 if MSBL < MSBL.TR. Since the test statistic for block effects in (21.8b)
is $F^*$ = MSBL/MSBL.TR, it follows that good blocking is achieved when this $F^*$ value exceeds 1 by
a considerable margin.


Problems 


21.1. A student commented in a discussion group: “Random permutations are used to assign treat- 
ments to experimental units with a randomized block design just as with a completely ran- 
domized design. Hence, there is no basic difference between these two designs.” Comment. 

21.2. a. What might be some useful blocking variables for an experiment about the effects of 

different price levels on sales of a product, using stores as experimental units? 

b. What might be some useful blocking variables for an experiment about the effects of 
different flight crew schedules on the morale of crews, using flight crews as experimental 
units? 

c. What might be some useful blocking variables for an experiment about the effects of 
different drugs on the speed of a response to a stimulus, using laboratory animals as 
experimental units? 

21.3. Five treatments are studied in an experiment with a randomized complete block design using 
four blocks. Obtain randomized assignments of treatments to experimental units. 

21.4. Two treatments and a control are studied in an experiment with a randomized block design. Five
blocks are employed, each containing four experimental units. In each block, each treatment is 
to be assigned to one experimental unit, and the control is to be assigned to two experimental 
units. Obtain randomized assignments of treatments to experimental units. 

*21.5. Auditor training. An accounting firm, prior to introducing in the firm widespread training
in statistical sampling for auditing, tested three training methods: (1) study at home with
programmed training materials, (2) training sessions at local offices conducted by local staff,
and (3) training sessions in Chicago conducted by national staff. Thirty auditors were grouped
into 10 blocks of three, according to time elapsed since college graduation, and the auditors
in each block were randomly assigned to the three training methods. At the end of the
training, each auditor was asked to analyze a complex case involving statistical applications; a
proficiency measure based on this analysis was obtained for each auditor. The results were
(block 1 consists of auditors graduated most recently, block 10 consists of those graduated
most distantly):

Block    Training Method (j)        Block    Training Method (j)
  i       1      2      3             i       1      2      3
  1      73     81     92             6      73     75     86
  2      76     78     89             7      68     72     88
  3      75     76     87             8      64     74     82
  4      74     77     90             9      65     73     81
  5      76     71     88            10      62     69     78

a. Why do you think the blocking variable "time elapsed since college graduation" was
employed?

b. Obtain the residuals for randomized block model (21.1) and plot them against the fitted
values. Also prepare a normal probability plot of the residuals. What are your findings?

c. Plot the responses $Y_{ij}$ by blocks in the format of Figure 21.2. What does this plot suggest
about the appropriateness of the no-interaction assumption here?

d. Conduct the Tukey test for additivity of block and treatment effects; use $\alpha = .01$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

*21.6. Refer to Auditor training Problem 21.5. Assume that randomized block model (21.1) is
appropriate.

a. Obtain the analysis of variance table.

b. Prepare a bar graph of the estimated treatment means. Does it appear that the treatment
means differ substantially here?

c. Test whether or not the mean proficiency is the same for the three training methods. Use
level of significance $\alpha = .05$. State the alternatives, decision rule, and conclusion. What
is the P-value of the test?

d. Make all pairwise comparisons between the training method means; use the Tukey
procedure with a 90 percent family confidence coefficient. State your findings.

e. Test whether or not blocking effects are present; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

21.7. Fat in diets. A researcher studied the effects of three experimental diets with varying fat
contents on the total lipid (fat) level in plasma. Total lipid level is a widely used
predictor of coronary heart disease. Fifteen male subjects who were within 20 percent of their
ideal body weight were grouped into five blocks according to age. Within each block, the
three experimental diets were randomly assigned to the three subjects. Data on reduction in
lipid level (in grams per liter) after the subjects were on the diet for a fixed period of time
follow.


                            Fat Content of Diet

Block               j=1               j=2            j=3
  i                 Extremely Low     Fairly Low     Moderately Low
  1  Ages 15-24          .73              .67             .15
  2  Ages 25-34          .86              .75             .21
  3  Ages 35-44          .94              .81             .26
  4  Ages 45-54         1.40             1.32             .75
  5  Ages 55-64         1.62             1.41             .78

a. Why do you think that age of subject was used as a blocking variable?

b. Obtain the residuals for randomized block model (21.1) and plot them against the
fitted values. Also prepare a normal probability plot of the residuals. What are your
findings?

c. Plot the responses $Y_{ij}$ by blocks in the format of Figure 21.2. What does this plot suggest
about the appropriateness of the no-interaction assumption here?

d. Conduct the Tukey test for additivity of block and treatment effects; use $\alpha = .01$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

21.8. Refer to Fat in diets Problem 21.7. Assume that randomized block model (21.1) is appropriate.

a. Obtain the analysis of variance table.

b. Prepare a bar-interval graph of the estimated treatment means, using 95 percent confidence
intervals. Does it appear that the treatment means differ substantially here?

c. Test whether or not the mean reductions in lipid level differ for the three diets; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Estimate $L_1 = \mu_{\cdot 1} - \mu_{\cdot 2}$ and $L_2 = \mu_{\cdot 2} - \mu_{\cdot 3}$ using the Bonferroni procedure with a
95 percent family confidence coefficient. State your findings.

e. Test whether or not blocking effects are present; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

f. A standard diet was not used in this experiment as a control. What justification do you
think the experimenters might give for not having a control treatment here for comparative
purposes?

21.9. Dental pain. An anesthesiologist made a comparative study of the effects of acupuncture and
codeine on postoperative dental pain in male subjects. The four treatments were: (1) placebo
treatment: a sugar capsule and two inactive acupuncture points ($A_1B_1$), (2) codeine
treatment only: a codeine capsule and two inactive acupuncture points ($A_2B_1$), (3) acupuncture
treatment only: a sugar capsule and two active acupuncture points ($A_1B_2$), and (4) codeine
and acupuncture treatment: a codeine capsule and two active acupuncture points ($A_2B_2$).
Thirty-two subjects were grouped into eight blocks of four according to an initial evaluation
of their level of pain tolerance. The subjects in each block were then randomly assigned to
the four treatments. Pain relief scores were obtained for all subjects two hours after dental
treatment. Data were collected on a double-blind basis. The data on pain relief scores follow
(the higher the pain relief score, the more effective the treatment).


                       Treatment (j, k)
Block i          $A_1B_1$   $A_2B_1$   $A_1B_2$   $A_2B_2$
1 (Lowest)        0.0        .5         .6        1.2
2                  .3        .6         .7        1.3
...               ...       ...        ...        ...
7                 1.0       1.8        1.7        2.1
8 (Highest)       1.2       1.7        1.6        2.4


a. Why do you think that pain tolerance of the subjects was used as a blocking variable? 
b. Which of the assumptions involved in randomized block model (21.11) are you most
concerned with here? 


c. Obtain the residuals for randomized block model (21.11) and plot them against the fitted 
values. Also prepare a normal probability plot of the residuals. What are your findings? 


d. Plot the responses $Y_{ijk}$ by blocks in the format of Figure 21.2, ignoring the factorial
structure of the treatments. What does this plot suggest about the appropriateness of the
no-interaction assumption here?

e. Conduct the Tukey test for additivity of block and treatment effects, ignoring the factorial
structure of the treatments; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.
What is the P-value of the test?

21.10. Refer to Dental pain Problem 21.9. Assume that randomized block model (21.11) is
appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not the two factors interact; use $\alpha = .01$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

c. Prepare separate bar-interval graphs for each set of estimated factor level means using
95 percent confidence intervals. Does it appear that substantial main effects are present
here?

d. Test separately whether main effects are present for each of the factors; use $\alpha = .01$ for
each test. State the alternatives, decision rule, and conclusion for each test. What is the
P-value of each test?

e. Estimate:

$$L_1 = \mu_{\cdot 1 \cdot} - \mu_{\cdot 2 \cdot} = \alpha_1 - \alpha_2$$

$$L_2 = \mu_{\cdot\cdot 1} - \mu_{\cdot\cdot 2} = \beta_1 - \beta_2$$

Use the Bonferroni procedure with a 95 percent family confidence coefficient. State your
findings.

f. Test whether or not blocking effects are present; use $\alpha = .01$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

21.11. A social scientist, after learning about generalized randomized block designs, asked: “Why
would anyone use a randomized complete block design that requires the assumption that block
and treatment effects do not interact when this assumption can be avoided with a generalized
randomized block design?” Comment.

*21.12. Refer to the task completion example in Table 21.4.

a. Verify the analysis of variance in Figure 21.6.

b. Estimate the difference in mean effects for the two motivation levels using a 99 percent
confidence interval.

*21.13. Refer to Auditor training Problem 21.5. The accounting firm repeated the experiment with
another group of 30 auditors, but this time grouped them into five blocks of six each. In each
block, each treatment was randomly assigned to two auditors. The results were:


            Training Method (j)                  Training Method (j)
Block i      1     2     3         Block i        1     2     3
1           74    84    91         4             65    73    84
            71    78    95                       70    78    87
2           73    75    93         5             64    71    81
            69    83    98                       61    74    74
3           75    81    89
            67    74    86


Assume that generalized randomized block model (21.10) is appropriate.

a. State the generalized randomized block model for this application.

b. Obtain the analysis of variance table.

c. Test whether or not the mean proficiency scores for the three training methods differ; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

d. Make all pairwise comparisons between the three training methods; use the Tukey
procedure with a 95 percent family confidence coefficient. Summarize your findings.

e. Obtain the residuals and plot them against the fitted values. Also prepare a normal
probability plot of the residuals. State your findings.

f. Test whether or not blocks interact with treatments; use $\alpha = .01$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

*21.14. Refer to Auditor training Problems 21.5 and 21.6. Assume that $\sigma = 2.5$. What is the power
of the test for training method effects in Problem 21.6c if $\mu_{\cdot 1} = 70$, $\mu_{\cdot 2} = 73$, and $\mu_{\cdot 3} = 76$?

21.15. Refer to Fat in diets Problems 21.7 and 21.8. Assume that $\sigma = .04$. What is the power of the
test for diet effects in Problem 21.8c if $\mu_{\cdot 1} = 1.1$, $\mu_{\cdot 2} = 1.0$, and $\mu_{\cdot 3} = .9$?

*21.16. Refer to Auditor training Problem 21.5. Another accounting firm wishes to conduct the
same experiment with some of its auditors, using the same design and model. How many
blocks would you recommend that this firm employ if it wishes to make all pairwise treatment
comparisons with precision ±1.5 with a 99 percent family confidence coefficient? Assume
that a reasonable planning value for the error standard deviation in model (21.1) is $\sigma = 2.5$.

21.17. Refer to Fat in diets Problem 21.7. Suppose that the number of blocks to be used in the
study, to consist of male subjects of similar ages, has not yet been determined. Assume that a
reasonable planning value for the error standard deviation in model (21.1) is $\sigma = .04$.

a. What would be the required number of blocks if it is desired to make all pairwise diet
comparisons with precision ±.03 with a 95 percent family confidence coefficient?

b. What would be the required number of blocks if: (1) differences in lipid level reduction
means for the three diets are to be detected with probability .95 or more when the range of
the treatment means is .12, and (2) the $\alpha$ risk is to be controlled at .01?

*21.18. Refer to Auditor training Problems 21.5 and 21.6. According to the estimated efficiency
measure (21.13), how effective was the use of the blocking variable as compared to a completely
randomized design?

21.19. Refer to Fat in diets Problems 21.7 and 21.8. According to the estimated efficiency measure
(21.14), how effective was the use of the blocking variable as compared to a completely
randomized design?

21.20. Refer to Dental pain Problems 21.9 and 21.10. According to the estimated efficiency measure
(21.13), how effective was the use of the blocking variable as compared to a completely
randomized design?


Exercises 


21.21. (Calculus needed.) State the likelihood function for the randomized block fixed effects model
(21.1) when $n_b = 3$ and $r = 2$. Find the maximum likelihood estimators of the parameters.

21.22. For randomized block fixed effects model (21.1), derive $E\{MSTR\}$.

21.23. Show that when two treatments are studied in a randomized complete block design, the F
test statistic (21.7b) for treatment effects is equivalent to the square of the two-sided t test
statistic for paired observations based on (A.69).

21.24. Show that the two expressions for $X_F^2$ on page 900 are equivalent when no ties are present.


Analysis of Covariance 


Analysis of covariance (ANCOVA) is a technique that combines features of analysis of vari- 
ance and regression. It can be used for either observational studies or designed experiments. 
The basic idea is to augment the analysis of variance model containing the factor effects 
with one or more additional quantitative variables that are related to the response variable. 
This augmentation is intended to reduce the variance of the error terms in the model, i.e., to 
make the analysis more precise. We considered covariance models briefly in Chapter 8 on 
page 329, and noted there that they are linear models containing both qualitative and quan- 
titative predictor variables. Thus, covariance models are just a special type of regression 
model. 

In this chapter, we shall first consider how a covariance model can be more effective than 
an ordinary ANOVA model. Then we shall discuss how to use a single-factor covariance 
model for making inferences. We conclude by taking up analysis of covariance models for 
two-factor studies and some additional considerations for the use of covariance analysis. 


22.1 Basic Ideas 


How Covariance Analysis Reduces Error Variability 
Covariance analysis may be helpful in reducing large error term variances that sometimes are 
present in analysis of variance models. Consider a study in which the effects of three different 
films promoting travel in a state are studied. A subject receives an initial questionnaire to 
elicit information about the subject's attitudes toward the state. The subject is then shown
one of the three five-minute films, and immediately afterwards is questioned about the film, 
about desire to travel in the state, and so on. 

In this type of situation, covariance analysis can be utilized. To see why it might be 
highly effective, consider Figure 22.1a. Here are plotted the desire-to-travel scores, obtained
after each of the three promotional films was shown to a different group of five subjects.
Three different symbols are used to distinguish the different treatments. It is evident from
Figure 22.1a that the error terms, as shown by the scatter around the estimated treatment
means $\bar{Y}_{i\cdot}$, are fairly large, indicating a large error term variance.

Suppose now that we were to utilize also the subjects' initial attitude scores. We plot in
Figure 22.1b the desire-to-travel score (obtained after exposure to the film) against the initial
attitude score for each of the 15 subjects. Note that the three treatment regression relations
happen to be linear (this need not be so). Also note that the scatter around the treatment
regression lines is much less than the scatter in Figure 22.1a around the treatment means $\bar{Y}_{i\cdot}$,
as a result of the desire-to-travel scores being highly linearly related to the initial attitude
scores. The relatively large scatter in Figure 22.1a reflects the large error term variability
that would be encountered with an analysis of variance model for this single-factor study.
The smaller scatter in Figure 22.1b reflects the smaller error term variability that would be
involved in an analysis of covariance model.

FIGURE 22.1 Illustration of Error Variability Reduction by Covariance Analysis.
(a) Error Variability with Single-factor Analysis of Variance Model [desire-to-travel score by
subject, shown separately for Treatments 1, 2, and 3]
(b) Error Variability with Covariance Analysis Model [desire-to-travel score against prestudy
attitude score for Treatments 1, 2, and 3]

Covariance analysis, it is thus seen, utilizes the relationship between the response variable 
(desire-to-travel score, in our example) and one or more quantitative variables for which 
observations are available (prestudy attitude score, in our example) in order to reduce the 
error term variability and make the study a more powerful one for comparing treatment 
effects. 
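The variance reduction can be made concrete with a small simulation. The sketch below is not from the text: the group size, the treatment effects, the slope, and the error standard deviation are all hypothetical values chosen for illustration. It generates one synthetic three-treatment study in which the response is strongly linearly related to a prestudy score, and then compares the error mean square of the single-factor ANOVA model with that of a covariance model having a common slope.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def simulate(n_per_group=50, gamma=1.0, sigma=2.0):
    """Return (MSE_anova, MSE_ancova) for one synthetic three-treatment study.

    All settings are hypothetical; they are not taken from the travel-film study.
    """
    effects = [-5.0, 0.0, 5.0]  # hypothetical treatment effects
    groups = []
    for tau in effects:
        xs = [random.gauss(50, 10) for _ in range(n_per_group)]
        ys = [60 + tau + gamma * (x - 50) + random.gauss(0, sigma) for x in xs]
        groups.append((xs, ys))

    r = len(groups)
    n_total = r * n_per_group
    sxx = sxy = syy = 0.0  # within-treatment sums of squares and cross products
    for xs, ys in groups:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx += sum((x - x_bar) ** 2 for x in xs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        syy += sum((y - y_bar) ** 2 for y in ys)

    # ANOVA: scatter around the treatment means.
    mse_anova = syy / (n_total - r)
    # Covariance model: scatter around parallel treatment regression lines.
    mse_ancova = (syy - sxy ** 2 / sxx) / (n_total - r - 1)
    return mse_anova, mse_ancova


mse_anova, mse_ancova = simulate()
print(f"ANOVA error mean square:      {mse_anova:.2f}")
print(f"Covariance error mean square: {mse_ancova:.2f}")
```

With these settings the covariance error mean square should be near the assumed error variance, while the ANOVA error mean square also absorbs the variability due to the prestudy score.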


Concomitant Variables

In covariance analysis terminology, each quantitative variable added to the ANOVA model is
called a concomitant variable. We already encountered concomitant variables in Chapter 9,
though not by that name. We mentioned in Chapter 9 that supplemental or uncontrolled
variables are sometimes used in regression models for controlled experiments to reduce the
variance of the experimental error terms. We also noted in that chapter that control variables
may be added to the regression model in confirmatory observational studies to reflect the
effects of previously identified explanatory variables as the effects of the new, primary
explanatory variables on the response variable are being tested. Both the supplemental or
uncontrolled variables in a controlled experiment and the control variables in a confirmatory
observational study are concomitant variables that are added to the model primarily to reduce
the variance of the error terms. Concomitant variables are sometimes also called covariates.


Choice of Concomitant Variables. The choice of concomitant variables is an important 
one. If such variables have no relation to the response variable, nothing is to be gained 
by covariance analysis, and one might as well use a simpler analysis of variance model. 
Concomitant variables frequently used with human subjects include prestudy attitudes, age, 
socioeconomic status, and aptitude. When retail stores are used as study units, concomitant 
variables might be last period's sales or number of employees. 


Concomitant Variables Unaffected by Treatments. For a clear interpretation of the 
results, a concomitant variable should be observed before the study; or if observed during 
the study, it should not be influenced by the treatments in any way. A prestudy attitude 
score meets this requirement. Also, if a subject's age is ascertained during the study, it 
would be reasonable in many instances to expect that the information about age provided 
by the subject will not be affected by the treatment. The reason for this requirement can be 
seen readily from the following example. A company was conducting a training school for 
engineers to teach them accounting and budgeting principles. Two teaching methods were 
used, and engineers were assigned at random to one of the two. At the end of the program, a 
score was obtained for each engineer reflecting the amount of learning. The analyst decided 
to use as a concomitant variable in covariance analysis the amount of time devoted to study 
(which the engineers were required to record). After conducting the analysis of covariance, 
the analyst found that training method had virtually no effect. The analyst was baffled by 
this finding until it was pointed out that the amount of study time probably was also affected 
by the treatments, and analysis indeed confirmed this. One of the training methods involved 
computer-assisted learning which appealed to the engineers so that they spent more time 
studying and also learned more. In other words, both the learning score and the amount of 
study time were influenced by the treatment in this case. As a result of the high correlation
between the amount of study time and the learning score, the marginal treatment effect of 
the teaching methods on amount of learning was small and the test for treatment effects 
showed no significant difference between the two teaching methods. 

Whenever a concomitant variable is affected by the treatments, covariance analysis will 
fail to show some (or much) of the effects that the treatments had on the response variable, 
so that an uncritical analysis may be badly misleading. 
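The masking effect described above can be reproduced with synthetic data. The following sketch is an illustration under stated assumptions, not a reconstruction of the engineer training study: the group size (200 per method rather than 12), the study-time distributions, and the learning-score relation are all hypothetical. Method 1 is assumed to raise study time, and the learning score is assumed to depend on the method only through study time.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible


def mean(values):
    return sum(values) / len(values)


n = 200  # hypothetical group size, chosen large to keep sampling noise small

# Hypothetical mechanism: method 1 leads to more study time,
# and the learning score depends on study time only.
time1 = [random.gauss(16, 3) for _ in range(n)]
time2 = [random.gauss(10, 3) for _ in range(n)]
score1 = [40 + 2.0 * t + random.gauss(0, 3) for t in time1]
score2 = [40 + 2.0 * t + random.gauss(0, 3) for t in time2]

# Unadjusted comparison (ordinary ANOVA): difference between treatment means.
unadjusted = mean(score1) - mean(score2)

# Covariance analysis: pooled within-treatment slope on study time.
sxx = sxy = 0.0
for times, scores in ((time1, score1), (time2, score2)):
    t_bar, s_bar = mean(times), mean(scores)
    sxx += sum((t - t_bar) ** 2 for t in times)
    sxy += sum((t - t_bar) * (s - s_bar) for t, s in zip(times, scores))
gamma_hat = sxy / sxx

# Difference between the two parallel regression lines (tau_1 - tau_2).
adjusted = unadjusted - gamma_hat * (mean(time1) - mean(time2))

print(f"Unadjusted difference in mean learning score: {unadjusted:.2f}")
print(f"Difference after adjusting for study time:    {adjusted:.2f}")
```

Under these assumptions the unadjusted difference is large, while the adjusted difference is close to zero, because the adjustment removes exactly the part of the treatment effect that operates through study time.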

A symbolic scatter plot can provide evidence as to whether the concomitant variable is
affected by the treatments. Figure 22.2 shows a scatter plot of learning score and amount of
study time for the engineer training example. Treatment 1 is the one using computer-assisted
learning. Note that most persons with this treatment devoted large amounts of time to study.
On the other hand, persons receiving treatment 2 tended to devote smaller amounts of time
to study. As a result, the observations for the two treatments tend to be concentrated over
different intervals on the X scale.

FIGURE 22.2 Illustration of Treatments Affecting the Concomitant Variable: Engineer Training
Example. [Symbolic scatter plot of Y (learning score) against X (amount of study time) for
teaching methods 1 and 2]

Contrast this situation with the one seen in Figure 22.1b for the study on promotional 
films. Figure 22.1b illustrates how the concomitant variable observations should be scattered
in a randomized experiment if the treatments have no effect on the concomitant variable. 
Here, the distribution of subjects along the X scale by prestudy attitude scores is roughly 
similar for all treatments, subject only to chance variation. 


Comment 


Covariance analysis is concerned with quantitative concomitant variables. When qualitative
concomitant variables need to be added (e.g., gender, geographic region), the model remains an
analysis of variance model where some of the factors are of primary interest and the others
represent concomitant variables that are included for the purpose of error variance reduction.


22.2 Single-Factor Covariance Model 



The covariance models to be presented in this chapter are applicable to observational studies 
and to experimental studies based on a completely randomized design. In the earlier engineer 
training example, the 24 engineers participating in the study were randomly assigned to 
the two teaching methods, with 12 engineers assigned to each teaching method. Thus, this
experimental study was based on a completely randomized design. 

The covariance models to be taken up in this chapter are also applicable to observational 
studies, such as an investigation of the salary increases of a company's employees in the 
accounting department by gender, where age is utilized as a concomitant variable.


We shall employ the notation for single-factor analysis of variance. The number of cases
for the ith factor level is denoted by $n_i$, the total number of cases by $n_T = \sum n_i$, and the
jth observation on the response variable for the ith factor level is denoted by $Y_{ij}$. We shall
initially consider a single-factor covariance model with only one concomitant variable.
Later we shall take up models with more than one concomitant variable. We shall denote
the value of the concomitant variable associated with the jth case for the ith factor level
by $X_{ij}$.


Development of Covariance Model

The single-factor ANOVA model in terms of fixed factor effects was given in (16.62):

$$Y_{ij} = \mu_{\cdot} + \tau_i + \varepsilon_{ij} \tag{22.1}$$

The covariance model starts with this ANOVA model and adds another term (or several),
reflecting the relationship between the response variable and the concomitant variable.
Usually, a linear relation is utilized as a first approximation:

$$Y_{ij} = \mu_{\cdot} + \tau_i + \gamma X_{ij} + \varepsilon_{ij} \tag{22.2}$$

Here $\gamma$ is a regression coefficient for the relation between Y and X. The constant $\mu_{\cdot}$ now
is no longer an overall mean. We can, however, make this constant an overall mean, and
incidentally simplify some computations, if we center the concomitant variable around the
overall mean $\bar{X}_{\cdot\cdot}$. The resulting model is the usual covariance model for a single-factor
study with fixed factor levels:

$$Y_{ij} = \mu_{\cdot} + \tau_i + \gamma(X_{ij} - \bar{X}_{\cdot\cdot}) + \varepsilon_{ij} \tag{22.3}$$

where:

$\mu_{\cdot}$ is an overall mean
$\tau_i$ are the fixed treatment effects subject to the restriction $\sum \tau_i = 0$
$\gamma$ is a regression coefficient for the relation between Y and X
$X_{ij}$ are constants
$\varepsilon_{ij}$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, r$; $j = 1, \ldots, n_i$

Covariance model (22.3) corresponds to ANOVA model (22.1) except for the term
$\gamma(X_{ij} - \bar{X}_{\cdot\cdot})$, which is added to reflect the relationship between Y and X. Note that the
concomitant observations $X_{ij}$ are assumed to be constants. Since $\varepsilon_{ij}$ is the only random
variable on the right side of (22.3), it follows at once that:

$$E\{Y_{ij}\} = \mu_{\cdot} + \tau_i + \gamma(X_{ij} - \bar{X}_{\cdot\cdot}) \tag{22.4a}$$

$$\sigma^2\{Y_{ij}\} = \sigma^2 \tag{22.4b}$$

In view of the independence of the $\varepsilon_{ij}$, the $Y_{ij}$ are also independent. Hence, an alternative
statement of covariance model (22.3) is:

$$Y_{ij} \text{ are independent } N(\mu_{ij}, \sigma^2) \tag{22.5}$$

where:

$$\mu_{ij} = \mu_{\cdot} + \tau_i + \gamma(X_{ij} - \bar{X}_{\cdot\cdot})$$

$$\sum \tau_i = 0$$

Properties of Covariance Model

Some of the properties of covariance model (22.3) are identical to those of ANOVA
model (22.1). For instance, the error terms $\varepsilon_{ij}$ are independent and have constant variance.
There are also some new properties, and we discuss these now.

Comparisons of Treatment Effects. With the analysis of variance model, all observations
for the ith treatment have the same mean response; i.e., $E\{Y_{ij}\} = \mu_i$ for all j. This is not
so with the covariance model, since the mean response $E\{Y_{ij}\}$ here depends not only on
the treatment but also on the value of the concomitant variable $X_{ij}$ for the study unit. Thus,
the expected response for the ith treatment with covariance model (22.3) is given by a
regression line:

$$\mu_{ij} = \mu_{\cdot} + \tau_i + \gamma(X_{ij} - \bar{X}_{\cdot\cdot}) \tag{22.6}$$

This regression line indicates, for any value of X, the mean response with treatment i.
Figure 22.3 illustrates for a study with three treatments how these treatment regression
lines might appear. Note that $\mu_{\cdot} + \tau_i$ is the ordinate of the line for the ith treatment when
$X - \bar{X}_{\cdot\cdot} = 0$, that is, when $X = \bar{X}_{\cdot\cdot}$, and that $\gamma$ is the slope of each line. Since all treatment
regression lines have the same slope, they are parallel.

FIGURE 22.3 Example of Treatment Regression Lines with Covariance Model (22.3).
[Regression lines for Treatments 1, 2, and 3]

While we no longer can speak of the mean response with the ith treatment since it varies
with X, we can still measure the effect of any treatment compared with any other by a
single number. In Figure 22.3, for instance, treatment 1 leads to a higher mean response than
treatment 2 by an amount that is the same no matter what is the value of X. The difference
between the two mean responses is the same for all values of X because the slopes of the
regression lines are equal. Hence, we can measure the difference at any convenient X, say,
at $X = \bar{X}_{\cdot\cdot}$:

$$\mu_{\cdot} + \tau_1 - (\mu_{\cdot} + \tau_2) = \tau_1 - \tau_2 \tag{22.7}$$

Thus, $\tau_1 - \tau_2$ measures how much higher the mean response is with treatment 1 than with
treatment 2 for any value of X. We can compare any other two treatments similarly. It follows
directly from this discussion that when all treatments have the same mean responses for
any X (i.e., the treatments have no differential effects), the treatment regression lines must
be identical; and hence, $\tau_1 - \tau_2 = 0$, $\tau_1 - \tau_3 = 0$, etc. Indeed, all $\tau_i$ equal zero in that case.


Constancy of Slopes. The assumption in covariance model (22.3) that all treatment
regression lines have the same slope is a crucial one. Without it, the difference between the
effects of two treatments cannot be summarized by a single number based on the main
effects, such as $\tau_2 - \tau_1$. Figure 22.4 illustrates the case of nonparallel slopes for two
treatments. Here, treatment 1 leads to higher mean responses than treatment 2 for smaller values
of X, and the reverse holds for larger values of X. When the treatments interact with the
concomitant variable X, resulting in nonparallel slopes, covariance analysis is not appropriate.
Instead, separate treatment regression lines need to be estimated and then compared.


Generalizations of Covariance Model


Covariance model (22.3) for single-factor studies can be generalized in several respects. 
We mention briefly three ways in which this model can be generalized. 


Nonconstant Xs. Covariance model (22.3) assumes that the observations $X_{ij}$ on the
concomitant variable are constants. At times, it might be more reasonable to consider the
concomitant observations as random variables. In that case, if covariance model (22.3) can
be interpreted as a conditional one, applying for any X values that might be observed, the
covariance analysis to be presented is still appropriate.

Nonlinearity of Relation. The linear relation between Y and X assumed in covariance
model (22.3) is not essential to covariance analysis. Any other relation could be used. For
instance, the model for a quadratic relation is as follows:

$$Y_{ij} = \mu_{\cdot} + \tau_i + \gamma_1(X_{ij} - \bar{X}_{\cdot\cdot}) + \gamma_2(X_{ij} - \bar{X}_{\cdot\cdot})^2 + \varepsilon_{ij} \tag{22.8}$$

Linearity of the relation leads to simpler analysis and is often a sufficiently good
approximation to provide meaningful results. If a linear relation is not a good approximation,
however, a more adequate description of the relation should be utilized in the covariance
model. Covariance analysis does require, however, that the treatment response functions be
parallel; in other words, there must not be any interaction effects between the treatment and
concomitant variables.


Several Concomitant Variables. Covariance model (22.3) uses a single concomitant
variable. This is often sufficient to reduce the error variability substantially. However, the
model can be extended in a straightforward fashion to include two or more concomitant
variables. The single-factor covariance model for two concomitant variables, $X_1$ and $X_2$, to
the first order is as follows:

$$Y_{ij} = \mu_{\cdot} + \tau_i + \gamma_1(X_{ij1} - \bar{X}_{\cdot\cdot 1}) + \gamma_2(X_{ij2} - \bar{X}_{\cdot\cdot 2}) + \varepsilon_{ij} \tag{22.9}$$


Regression Formulation of Covariance Model 


An easy way to estimate the parameters of covariance model (22.3) and make inferences
is through the regression approach. Computational formulas for manual calculation were
developed before the advent of computers, making use of the special structure of the X
matrix for covariance models. Today, however, covariance calculations can be carried out
readily by means of standard regression packages.

As for the regression formulation of analysis of variance models, we shall employ r − 1
indicator variables taking on the values 1, −1, or 0 to represent the r treatments in a
covariance analysis model:

$$I_1 = \begin{cases} 1 & \text{if case from treatment 1} \\ -1 & \text{if case from treatment } r \\ 0 & \text{otherwise} \end{cases}$$

$$\vdots \tag{22.10}$$

$$I_{r-1} = \begin{cases} 1 & \text{if case from treatment } r - 1 \\ -1 & \text{if case from treatment } r \\ 0 & \text{otherwise} \end{cases}$$

Note that we now denote the indicator variables by the symbol I to clearly distinguish
the treatment effects from the concomitant variable X.

In expressing covariance model (22.3) in regression form, we shall, as in the regression
chapters, denote the centered observations $X_{ij} - \bar{X}_{\cdot\cdot}$ by $x_{ij}$. Covariance model (22.3) can
then be expressed as follows:

$$Y_{ij} = \mu_{\cdot} + \tau_1 I_{ij1} + \cdots + \tau_{r-1} I_{ij,r-1} + \gamma x_{ij} + \varepsilon_{ij} \tag{22.11}$$

Here, $I_{ij1}$ is the value of indicator variable $I_1$ for the jth case from treatment i, and
similarly for the other indicator variables. Note that the treatment effects $\tau_1, \ldots, \tau_{r-1}$ are the
regression coefficients for the indicator variables.

Now that we have formulated covariance model (22.3) as a regression model, our discus- 
sion of regression analysis in previous chapters applies. We therefore consider only briefly 
how to examine the appropriateness of the covariance model and how to make relevant 
inferences before turning to an example to illustrate the procedures. 


Appropriateness of Covariance Model


Some of the key issues concerning the appropriateness of covariance model (22.3) and the 
equivalent regression model (22.11) deal with: 


1. Normality of error terms. 

2. Equality of error variances for different treatments. 

3. Equality of slopes of the different treatment regression lines. 

4. Linearity of regression relation with concomitant variable. 

5. Uncorrelatedness of error terms. 

The third issue, concerning the equality of the slopes of the different treatment regression
lines, is particularly important in evaluating the appropriateness of covariance model (22.3).
The test in Section 8.7 to compare several regression lines is applicable for determining
whether the condition of equal slopes in the covariance model is met. We shall illustrate
this test in the example in Section 22.3.


Inferences of Interest


The key statistical inferences of interest in covariance analysis are the same as with analysis 
of variance models, namely, whether the treatments have any effects, and if so what these 
effects are. Testing for fixed treatment effects involves the same alternatives as for analysis 
of variance models: 


$$H_0\colon \tau_1 = \tau_2 = \cdots = \tau_r = 0$$
$$H_a\colon \text{not all } \tau_i \text{ equal zero} \tag{22.12}$$

As we can see by referring to the equivalent regression model (22.11), this test involves
testing whether several regression coefficients equal zero. The appropriate test statistic
therefore is (7.27).

If the treatment effects are found to differ, the next step usually is to investigate the
nature of these effects. Pairwise comparisons of treatment effects $\tau_i - \tau_{i'}$ (the vertical
distance between the two treatment regression lines) may be of interest, or more general
contrasts of the $\tau_i$ may be relevant. In either case, linear combinations of the regression
coefficients $\tau_1, \ldots, \tau_{r-1}$ are to be estimated.

Occasionally, the nature of the regression relationship between Y and X is of interest, but 
usually the concomitant variable X is only employed in ANCOVA models to help reduce 
the error variability. 
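As an illustration of the test of (22.12), the sketch below carries out the full-versus-reduced model comparison on the cracker promotion data of Table 22.1 in Section 22.3. It is a minimal sketch, not the text's own computation: instead of calling a regression package, it uses the closed-form error sums of squares for a straight-line fit, and the helper name `centered_sums` is introduced here for illustration only. The full model is covariance model (22.3) with a common slope; the reduced model drops the treatment effects and regresses Y on x alone.

```python
# Cracker promotion data from Table 22.1: (Y, X) pairs for the stores of each treatment.
data = {
    1: [(38, 21), (39, 26), (36, 22), (45, 28), (33, 19)],
    2: [(43, 34), (38, 26), (38, 29), (27, 18), (34, 25)],
    3: [(24, 23), (32, 29), (31, 30), (21, 16), (28, 29)],
}


def centered_sums(pairs):
    """Return (Sxx, Sxy, Syy) about the means for a list of (Y, X) pairs."""
    n = len(pairs)
    y_bar = sum(y for y, _ in pairs) / n
    x_bar = sum(x for _, x in pairs) / n
    sxx = sum((x - x_bar) ** 2 for _, x in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for y, x in pairs)
    syy = sum((y - y_bar) ** 2 for y, _ in pairs)
    return sxx, sxy, syy


r = len(data)
all_pairs = [pair for pairs in data.values() for pair in pairs]
n_total = len(all_pairs)

# Reduced model (all tau_i = 0): one regression line for all stores.
sxx_t, sxy_t, syy_t = centered_sums(all_pairs)
sse_reduced = syy_t - sxy_t ** 2 / sxx_t
df_reduced = n_total - 2

# Full model (22.3): separate intercepts, common slope from the pooled within-treatment sums.
within = [centered_sums(pairs) for pairs in data.values()]
sxx_w = sum(s[0] for s in within)
sxy_w = sum(s[1] for s in within)
syy_w = sum(s[2] for s in within)
sse_full = syy_w - sxy_w ** 2 / sxx_w
df_full = n_total - r - 1

# General linear test statistic for H0: all tau_i equal zero.
f_star = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

print(f"SSE(R) = {sse_reduced:.3f}, SSE(F) = {sse_full:.3f}, F* = {f_star:.2f}")
```

The statistic is referred to the F distribution with r − 1 and $n_T - r - 1$ degrees of freedom (here 2 and 11). The full-model error sum of squares agrees with the SSE reported in Table 22.3.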


Comment 


In covariance analysis there is usually no concern with whether the regression coefficient $\gamma$ is zero,
that is, whether there is indeed a regression relation between Y and X. If there is no relation, no bias
results in the covariance analysis. The error mean square would simply be the same as for the analysis
of variance model (allowing for sampling variation), and one degree of freedom would be lost for the
error mean square.


22.3 Example of Single-Factor Covariance Analysis


A company studied the effects of three different types of promotions on sales of its crackers:

Treatment 1: Sampling of product by customers in store and regular shelf space
Treatment 2: Additional shelf space in regular location
Treatment 3: Special display shelves at ends of aisle in addition to regular shelf space

Fifteen stores were selected for the study, and a completely randomized experimental design
was utilized. Each store was randomly assigned one of the promotion types, with five stores
assigned to each type of promotion. Other relevant conditions under the control of the
company, such as price and advertising, were kept the same for all stores in the study. Data
on the number of cases of the product sold during the promotional period, denoted by Y, are
presented in Table 22.1, as are also data on the sales of the product in the preceding period,
denoted by X. Sales in the preceding period are to be used as the concomitant variable.


Development of Model 


Figure 22.5 presents the data of Table 22.1 in the form of a symbolic scatter plot. Linear 
regression and parallel slopes for the treatment regression lines appear to be reasonable. 
Therefore, the following regression model was tentatively selected: 


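$$Y_{ij} = \mu_{\cdot} + \tau_1 I_{ij1} + \tau_2 I_{ij2} + \gamma x_{ij} + \varepsilon_{ij} \qquad \text{Full model} \tag{22.13}$$

where:

$$I_1 = \begin{cases} 1 & \text{if store received treatment 1} \\ -1 & \text{if store received treatment 3} \\ 0 & \text{otherwise} \end{cases}$$

$$I_2 = \begin{cases} 1 & \text{if store received treatment 2} \\ -1 & \text{if store received treatment 3} \\ 0 & \text{otherwise} \end{cases}$$

$$x_{ij} = X_{ij} - \bar{X}_{\cdot\cdot}$$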
TABLE 22.1 Data: Cracker Promotion Example (number of cases sold).

                                          Store (j)
Treatment         1               2               3               4               5
    i       $Y_{i1}$ $X_{i1}$   $Y_{i2}$ $X_{i2}$   $Y_{i3}$ $X_{i3}$   $Y_{i4}$ $X_{i4}$   $Y_{i5}$ $X_{i5}$
    1         38   21         39   26         36   22         45   28         33   19
    2         43   34         38   26         38   29         27   18         34   25
    3         24   23         32   29         31   30         21   16         28   29

FIGURE 22.5 Symbolic Scatter Plot: Cracker Promotion Example. [Sales in promotion period
plotted with separate symbols for Treatments 1, 2, and 3]

TABLE 22.2 Regression Variables for Single-Factor Covariance Analysis: Cracker Promotion
Example.


Table 22.2 repeats a portion of the data on the responses Y and the concomitant variable
X in columns 1 and 2. The centered concomitant variable x is presented in column 3 and
the indicator variables for the treatments in columns 4 and 5. Note that the centering of
the concomitant variable is around the overall mean $\bar{X}_{\cdot\cdot} = 25$. Regressing Y in column 1
of Table 22.2 on x, $I_1$, and $I_2$ in columns 3-5 by a computer package led to the results
summarized in Table 22.3.
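
The regression just described can be reproduced in a few lines. The following is a minimal sketch rather than the package run used for Table 22.3: it assumes the Table 22.1 data, codes the 1, -1, 0 indicators of model (22.13), and solves the normal equations with a small hand-written elimination routine. The helper names `indicators`, `solve`, and `fit_covariance_model` are illustrative and do not come from the text.

```python
# Cracker promotion data from Table 22.1: (Y, X) pairs for five stores per treatment.
data = {
    1: [(38, 21), (39, 26), (36, 22), (45, 28), (33, 19)],
    2: [(43, 34), (38, 26), (38, 29), (27, 18), (34, 25)],
    3: [(24, 23), (32, 29), (31, 30), (21, 16), (28, 29)],
}


def indicators(treatment):
    """1, -1, 0 coding of model (22.13): returns (I1, I2) for a treatment."""
    if treatment == 3:
        return (-1.0, -1.0)
    return (1.0, 0.0) if treatment == 1 else (0.0, 1.0)


def solve(a, b):
    """Solve the square system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_covariance_model(data):
    """Least squares fit of Y on (1, I1, I2, x), with x centered at the overall mean of X."""
    all_x = [x for obs in data.values() for _, x in obs]
    x_bar = sum(all_x) / len(all_x)
    rows, ys = [], []
    for treatment, obs in data.items():
        i1, i2 = indicators(treatment)
        for y, x in obs:
            rows.append([1.0, i1, i2, x - x_bar])
            ys.append(float(y))
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve(xtx, xty)
    sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2 for r, y in zip(rows, ys))
    return coef, sse


coef, sse = fit_covariance_model(data)
mu_hat, tau1_hat, tau2_hat, gamma_hat = coef
print([round(c, 3) for c in coef], round(sse, 3))
```

The four coefficients and the error sum of squares printed here can be compared directly with parts (a) and (b) of Table 22.3.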

Various residual plots were obtained to examine the appropriateness of regression model 
(22.13). Figure 22.6 contains two of these. Figure 22.6a contains aligned residual dot plots 
for the three treatments. These do not suggest any major differences in the variances of the 
error terms. Figure 22.6b contains a normal probability plot of the residuals, which shows 
some modest departure from linearity. However, the coefficient of correlation between the 
ordered residuals and their expected values under normality is .958, for which Table B.6 
does not suggest any significant departure from normality. The analyst also conducted a test 
to confirm the equality of the slopes of the three treatment regression lines. This test will 
be described shortly. On the basis of these analyses, the analyst concluded that regression 
model (22.13) is appropriate here. 


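TABLE 22.3 Computer Output for Covariance Model (22.13): Cracker Promotion Example.

(a) Regression Coefficients

    $\hat{\mu}_{\cdot} = 33.800$      $\hat{\tau}_2 = .942$
    $\hat{\tau}_1 = 6.017$       $\hat{\gamma} = .899$

(b) Analysis of Variance

    Source of Variation    SS                 df    MS
    Regression             SSR  = 607.829      3    MSR = 202.610
    Error                  SSE  =  38.571     11    MSE =   3.506
    Total                  SSTO = 646.400     14

(c) Estimated Variance-Covariance Matrix of Regression Coefficients

                          $\hat{\mu}_{\cdot}$    $\hat{\tau}_1$    $\hat{\tau}_2$    $\hat{\gamma}$
    $\hat{\mu}_{\cdot}$     .2338
    $\hat{\tau}_1$          0        .5016
    $\hat{\tau}_2$          0       -.2603    .4882
    $\hat{\gamma}$          0        .0189   -.0147    .0105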


[FIGURE 22.6 Diagnostic Residual Plots: Cracker Promotion Example. (a) Residual Dot Plots, one per treatment (axis: Residual); (b) Normal Probability Plot (axes: Residual, Expected Value).]


Test for Treatment Effects


To test whether or not the three cracker promotions differ in effectiveness, we can either
follow the general linear test approach of fitting full and reduced models and using test
statistic (2.70) or use extra sums of squares and test statistic (7.27). In either case, the
alternatives are:

$$H_0: \tau_1 = \tau_2 = 0 \qquad\qquad H_a: \text{not both } \tau_1 \text{ and } \tau_2 \text{ equal zero} \tag{22.14}$$

Note that $\tau_3 = -\tau_1 - \tau_2$ must equal zero when $\tau_1 = \tau_2 = 0$.

We shall conduct the test by means of the general linear test approach. First, we develop
the reduced model under $H_0$:

$$Y_{ij} = \mu_{\cdot} + \gamma x_{ij} + \varepsilon_{ij} \qquad \text{Reduced model} \tag{22.15}$$

Model (22.15) is just a simple linear regression model where none of the parameters vary
for the different treatments. When regressing Y in column 1 of Table 22.2 on x in column 3,
we obtain the analysis of variance results in Table 22.4.

TABLE 22.4 Regression ANOVA Results for Reduced Model (22.15): Cracker Promotion Example.

    Source of Variation    SS                 df
    Regression             SSR  = 190.678      1
    Error                  SSE  = 455.722     13
    Total                  SSTO = 646.400     14

We see from Table 22.4 that SSE(R) = 455.722 and from Table 22.3b that SSE(F) =
38.571. Hence, test statistic (2.70) here is:

$$F^* = \frac{SSE(R) - SSE(F)}{(n_T - 2) - [n_T - (r + 1)]} \div \frac{SSE(F)}{n_T - (r + 1)} = \frac{455.722 - 38.571}{13 - 11} \div \frac{38.571}{11} = 59.5$$


The level of significance is to be controlled at $\alpha = .05$; hence, we need to obtain
F(.95; 2, 11) = 3.98. The decision rule therefore is:

    If $F^* \le 3.98$, conclude $H_0$
    If $F^* > 3.98$, conclude $H_a$

Since $F^* = 59.5 > 3.98$, we conclude $H_a$, that the three cracker promotions differ in sales
effectiveness. The P-value of the test is 0+.
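
The arithmetic of the general linear test is easy to script. The sketch below is illustrative only: the function name `general_linear_test` is an assumption, the error sums of squares are copied from Tables 22.4 and 22.3b, and the critical value 3.98 is the F(.95; 2, 11) percentile quoted above rather than something the code computes.

```python
def general_linear_test(sse_reduced, df_reduced, sse_full, df_full):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F], the form of statistic (2.70)."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Error sums of squares and degrees of freedom from Tables 22.4 (reduced) and 22.3b (full).
f_star = general_linear_test(sse_reduced=455.722, df_reduced=13, sse_full=38.571, df_full=11)

critical_value = 3.98  # F(.95; 2, 11), as quoted in the text
decision = "conclude Ha" if f_star > critical_value else "conclude H0"
print(round(f_star, 1), decision)
```

The same function applies to any pair of nested regression models, provided the reduced model's error degrees of freedom exceed those of the full model.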


Comment 


Occasionally, a test whether or not y = 0 is of interest. This is simply the ordinary test whether or 
not a single regression coefficient equals zero. It can be conducted by means of the t* test statistic 
(7.25) or by means of the F* test statistic (7.24).


Estimation of Treatment Effects


Since treatment effects were found to be present in the cracker promotion study, the analyst
next wished to investigate the nature of these effects. We noted earlier that a comparison of
two treatments involves $\tau_i - \tau_{i'}$, the vertical distance between the two treatment regression
lines. Using the fact that $\tau_3 = -\tau_1 - \tau_2$ and (A.30b) for the variance of a linear combination
of two random variables, we see that the estimators of all pairwise comparisons and their
variances are as follows:

    Comparison                              Estimator                        Variance
    $\tau_1 - \tau_2$                       $\hat{\tau}_1 - \hat{\tau}_2$    $\sigma^2\{\hat{\tau}_1\} + \sigma^2\{\hat{\tau}_2\} - 2\sigma\{\hat{\tau}_1, \hat{\tau}_2\}$
    $\tau_1 - \tau_3 = 2\tau_1 + \tau_2$    $2\hat{\tau}_1 + \hat{\tau}_2$   $4\sigma^2\{\hat{\tau}_1\} + \sigma^2\{\hat{\tau}_2\} + 4\sigma\{\hat{\tau}_1, \hat{\tau}_2\}$      (22.16)
    $\tau_2 - \tau_3 = \tau_1 + 2\tau_2$    $\hat{\tau}_1 + 2\hat{\tau}_2$   $\sigma^2\{\hat{\tau}_1\} + 4\sigma^2\{\hat{\tau}_2\} + 4\sigma\{\hat{\tau}_1, \hat{\tau}_2\}$


Table 22.3a furnishes the needed estimated regression coefficients, and Table 22.3c
provides their estimated variances and covariances. We obtain from there:

    Comparison           Estimate                      Estimated Variance
    $\tau_1 - \tau_2$    6.017 - .942 = 5.075          .5016 + .4882 - 2(-.2603) = 1.5104
    $\tau_1 - \tau_3$    2(6.017) + .942 = 12.976      4(.5016) + .4882 + 4(-.2603) = 1.4534      (22.16a)
    $\tau_2 - \tau_3$    6.017 + 2(.942) = 7.901       .5016 + 4(.4882) + 4(-.2603) = 1.4132

When a single interval estimate is to be constructed, the t distribution with $n_T - r - 1$
degrees of freedom is used. (The degrees of freedom are those associated with MSE in the
full covariance model.) Usually, however, a family of interval estimates is desired. In that
case, the Scheffé multiple comparison procedure may be employed with the S multiple
defined by:

$$S^2 = (r - 1)F(1 - \alpha;\, r - 1,\, n_T - r - 1) \tag{22.17}$$

or the Bonferroni method may be employed with the B multiple:

$$B = t(1 - \alpha/2g;\, n_T - r - 1) \tag{22.18}$$

where g is the number of statements in the family. The Tukey method is not appropriate for
covariance analysis.

In the case at hand, the analyst wished to obtain all pairwise comparisons with a 95 percent
family confidence coefficient. The analyst used the Scheffé procedure in anticipation that
some additional estimates of contrasts might be desired. We require therefore:

$$S^2 = (3 - 1)F(.95;\, 2,\, 11) = 2(3.98) = 7.96 \qquad S = 2.82$$


Using the results in (22.16a), the confidence intervals for all pairwise treatment comparisons
with a 95 percent family confidence coefficient then are:

$$1.61 = 5.075 - 2.82\sqrt{1.5104} \le \tau_1 - \tau_2 \le 5.075 + 2.82\sqrt{1.5104} = 8.54$$

$$9.58 = 12.976 - 2.82\sqrt{1.4534} \le \tau_1 - \tau_3 \le 12.976 + 2.82\sqrt{1.4534} = 16.38$$

$$4.55 = 7.901 - 2.82\sqrt{1.4132} \le \tau_2 - \tau_3 \le 7.901 + 2.82\sqrt{1.4132} = 11.25$$



These results indicate clearly that sampling in the store (treatment 1) is significantly better 
for stimulating cracker sales than either of the two shelf promotions, and that increasing 
the regular shelf space (treatment 2) is superior to additional displays at the end of the aisle 
(treatment 3). 
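
The three Scheffé intervals can be recomputed from the estimates and the variance-covariance entries for the two treatment coefficients. This is a sketch under stated assumptions: each comparison is expressed as a linear combination $c_1\tau_1 + c_2\tau_2$ using $\tau_3 = -\tau_1 - \tau_2$, the percentile F(.95; 2, 11) = 3.98 is taken from the text, S is rounded to two decimals as in the text, and the names `comparisons` and `estimate_and_variance` are illustrative.

```python
import math

# Estimated treatment coefficients (Table 22.3a) and their
# estimated variance-covariance matrix (Table 22.3c).
tau_hat = (6.017, 0.942)
cov = ((0.5016, -0.2603),
       (-0.2603, 0.4882))

# Coefficients (c1, c2) of each comparison written as c1*tau1 + c2*tau2,
# using tau3 = -tau1 - tau2 as in (22.16).
comparisons = {
    "tau1 - tau2": (1, -1),
    "tau1 - tau3": (2, 1),
    "tau2 - tau3": (1, 2),
}


def estimate_and_variance(c):
    """Point estimate and estimated variance of c1*tau1_hat + c2*tau2_hat."""
    estimate = c[0] * tau_hat[0] + c[1] * tau_hat[1]
    variance = (c[0] ** 2 * cov[0][0] + c[1] ** 2 * cov[1][1]
                + 2 * c[0] * c[1] * cov[0][1])
    return estimate, variance


# Scheffe multiple from (22.17) with r = 3 and F(.95; 2, 11) = 3.98.
s_multiple = round(math.sqrt((3 - 1) * 3.98), 2)

intervals = {}
for name, c in comparisons.items():
    estimate, variance = estimate_and_variance(c)
    half_width = s_multiple * math.sqrt(variance)
    intervals[name] = (estimate - half_width, estimate + half_width)

for name, (lower, upper) in intervals.items():
    print(f"{lower:.2f} <= {name} <= {upper:.2f}")
```

Replacing `s_multiple` by a Bonferroni multiple from (22.18) would give the Bonferroni version of the same family of intervals.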


Comments 


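1. Occasionally, more general contrasts among treatment effects than pairwise comparisons are
desired. No new problems arise either in the use of the t distribution for a single contrast or in the
use of the Scheffé or Bonferroni procedures for multiple comparisons. For instance, if the analyst
desired in the cracker promotion example to compare the treatment effect for sampling in the store
(treatment 1) with the two treatments involving shelf displays (treatments 2 and 3), the following
contrast would be of interest:

$$L = \tau_1 - \frac{\tau_2 + \tau_3}{2} \tag{22.19}$$

The appropriate estimator is:

$$\hat{L} = \hat{\tau}_1 - \frac{\hat{\tau}_2 - \hat{\tau}_1 - \hat{\tau}_2}{2} = \frac{3}{2}\hat{\tau}_1 \tag{22.20}$$

The variance of this estimator is by (A.16b):

$$\sigma^2\{\hat{L}\} = \frac{9}{4}\sigma^2\{\hat{\tau}_1\} \tag{22.21}$$

2. Sometimes there is interest in estimating the mean response with the ith treatment for a "typical"
value of X. Frequently $X = \bar{X}_{\cdot\cdot}$ is considered to be a "typical" value. We know from Figure 22.3 that
at $X = \bar{X}_{\cdot\cdot}$, the mean response for the ith treatment is the intercept of the treatment regression line,
$\mu_{\cdot} + \tau_i$. An estimator of $\mu_{\cdot} + \tau_i$ can be readily developed. For the cracker promotion example, we
obtain the following estimators and their variances:

    Mean Response
    at $X = \bar{X}_{\cdot\cdot}$    Estimator                                              Variance
    $\mu_{\cdot} + \tau_1$    $\hat{\mu}_{\cdot} + \hat{\tau}_1$                     $\sigma^2\{\hat{\mu}_{\cdot}\} + \sigma^2\{\hat{\tau}_1\} + 2\sigma\{\hat{\mu}_{\cdot}, \hat{\tau}_1\}$
    $\mu_{\cdot} + \tau_2$    $\hat{\mu}_{\cdot} + \hat{\tau}_2$                     $\sigma^2\{\hat{\mu}_{\cdot}\} + \sigma^2\{\hat{\tau}_2\} + 2\sigma\{\hat{\mu}_{\cdot}, \hat{\tau}_2\}$      (22.22)
    $\mu_{\cdot} + \tau_3$    $\hat{\mu}_{\cdot} - \hat{\tau}_1 - \hat{\tau}_2$      $\sigma^2\{\hat{\mu}_{\cdot}\} + \sigma^2\{\hat{\tau}_1\} + \sigma^2\{\hat{\tau}_2\} - 2\sigma\{\hat{\mu}_{\cdot}, \hat{\tau}_1\} - 2\sigma\{\hat{\mu}_{\cdot}, \hat{\tau}_2\} + 2\sigma\{\hat{\tau}_1, \hat{\tau}_2\}$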


Use of the results in Table 22.3 leads to the following estimates:

                 Estimated Mean Response
    Treatment    at $\bar{X}_{\cdot\cdot}$              Estimated Variance
    1            33.800 + 6.017 = 39.817                .2338 + .5016 + 2(0) = .7354
    2            33.800 + .942 = 34.742                 .2338 + .4882 + 2(0) = .7220
    3            33.800 - 6.017 - .942 = 26.841         .2338 + .5016 + .4882 - 2(0) - 2(0) + 2(-.2603) = .7030


The estimated mean response for treatment i at $X = \bar{X}_{\cdot\cdot}$ is often called the adjusted estimated
treatment mean. It is said to be "adjusted" because it takes into account the effect of the concomitant
variable. A comparison of the adjusted treatment means leads, of course, to the same pairwise
comparisons of treatment effects as before; for instance, $39.817 - 34.742 = 5.075 = \hat{\tau}_1 - \hat{\tau}_2$.
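
The adjusted estimated treatment means can also be obtained without fitting the indicator-variable regression. The sketch below relies on a standard identity that is not derived in this section: under the parallel-slopes model, the least squares slope equals the pooled within-treatment ratio $S_{xy}/S_{xx}$, and each adjusted mean is $\bar{Y}_{i\cdot} - \hat{\gamma}(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})$. The function name `adjusted_means` is illustrative.

```python
# Cracker promotion data from Table 22.1: (Y, X) pairs for five stores per treatment.
data = {
    1: [(38, 21), (39, 26), (36, 22), (45, 28), (33, 19)],
    2: [(43, 34), (38, 26), (38, 29), (27, 18), (34, 25)],
    3: [(24, 23), (32, 29), (31, 30), (21, 16), (28, 29)],
}


def adjusted_means(data):
    """Return (common slope estimate, {treatment: adjusted mean at the overall mean of X})."""
    all_x = [x for obs in data.values() for _, x in obs]
    x_bar = sum(all_x) / len(all_x)

    sxy = sxx = 0.0
    group_means = {}
    for treatment, obs in data.items():
        y_bar_i = sum(y for y, _ in obs) / len(obs)
        x_bar_i = sum(x for _, x in obs) / len(obs)
        group_means[treatment] = (y_bar_i, x_bar_i)
        # Pool the within-treatment cross products and squares.
        sxy += sum((x - x_bar_i) * (y - y_bar_i) for y, x in obs)
        sxx += sum((x - x_bar_i) ** 2 for _, x in obs)

    gamma_hat = sxy / sxx
    adjusted = {t: y_bar_i - gamma_hat * (x_bar_i - x_bar)
                for t, (y_bar_i, x_bar_i) in group_means.items()}
    return gamma_hat, adjusted


gamma_hat, adjusted = adjusted_means(data)
print(round(gamma_hat, 3), {t: round(m, 3) for t, m in adjusted.items()})
```

Each adjusted mean shifts the raw treatment mean along the common slope to the overall mean of X, which is why stores that started with higher prior sales are adjusted downward.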


Test for Parallel Slopes 


An important assumption in covariance analysis is that all treatment regression lines have
the same slope $\gamma$. The analyst who conducted the cracker promotion study, indeed, tested
this assumption before proceeding with the analysis discussed earlier. We know from Chapter 8
that regression model (22.13) can be generalized to allow for different slopes for the
treatments by introducing cross-product interaction terms. Specifically, interaction variables
$I_1 x$ and $I_2 x$ will be required here. We shall denote the corresponding regression coefficients
by $\beta_1$ and $\beta_2$. Thus, the generalized model is:

$$Y_{ij} = \mu_{\cdot} + \tau_1 I_{ij1} + \tau_2 I_{ij2} + \gamma x_{ij} + \beta_1 I_{ij1} x_{ij} + \beta_2 I_{ij2} x_{ij} + \varepsilon_{ij} \qquad \text{Generalized model} \tag{22.23}$$

Table 22.2 contains in columns 6 and 7 the interaction variables for this model for the
cracker promotion example. Regressing the response variable Y in column 1 of Table 22.2
on x, $I_1$, $I_2$, $I_1 x$, $I_2 x$ in columns 3-7 by means of a computer multiple regression package
yielded the ANOVA results in Table 22.5. The error sum of squares SSE obtained by fitting
generalized model (22.23) is the equivalent of fitting separate regression lines for each
treatment and summing these error sums of squares.


TABLE 22.5 Regression ANOVA Results for Generalized Model (22.23): Cracker Promotion Example.

    Source of Variation    SS                 df
    Regression             SSR  = 614.879      5
    Error                  SSE  =  31.521      9
    Total                  SSTO = 646.400     14


The test for parallel slopes is equivalent to testing for no interactions in generalized
model (22.23):

$$H_0: \beta_1 = \beta_2 = 0 \qquad\qquad H_a: \text{not both } \beta_1 \text{ and } \beta_2 \text{ equal zero} \tag{22.24}$$

We need to recognize that generalized model (22.23) now is the “full” model and covariance 
model (22.13) is the “reduced” model. Hence, we have from Tables 22.3b and 22.5: 


$$SSE(F) = 31.521 \qquad SSE(R) = 38.571$$

Thus, test statistic (2.70) becomes here:

$$F^* = \frac{38.571 - 31.521}{11 - 9} \div \frac{31.521}{9} = 1.01$$

For level of significance $\alpha = .05$, we require F(.95; 2, 9) = 4.26. Since $F^* = 1.01 < 4.26$,
we conclude $H_0$, that the three treatment regression lines have the same slope. The P-value
of the test is .40. Hence, the requirement of equal treatment slopes in analysis of covariance
model (22.13) is met in the cracker promotion example.
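
The remark that SSE for generalized model (22.23) equals the sum of the error sums of squares from separate treatment regressions gives a quick way to check the parallel-slopes test. The following sketch fits one simple linear regression per treatment to the Table 22.1 data, sums the three error sums of squares, and forms F*. The value SSE(R) = 38.571 is copied from Table 22.3b, and the helper name `sse_separate_line` is an assumption for illustration.

```python
# Cracker promotion data from Table 22.1: (Y, X) pairs for five stores per treatment.
data = {
    1: [(38, 21), (39, 26), (36, 22), (45, 28), (33, 19)],
    2: [(43, 34), (38, 26), (38, 29), (27, 18), (34, 25)],
    3: [(24, 23), (32, 29), (31, 30), (21, 16), (28, 29)],
}


def sse_separate_line(obs):
    """Error sum of squares of a simple linear regression of Y on X for one treatment."""
    n = len(obs)
    y_bar = sum(y for y, _ in obs) / n
    x_bar = sum(x for _, x in obs) / n
    syy = sum((y - y_bar) ** 2 for y, _ in obs)
    sxx = sum((x - x_bar) ** 2 for _, x in obs)
    sxy = sum((x - x_bar) * (y - y_bar) for y, x in obs)
    return syy - sxy ** 2 / sxx


# Full model: separate slopes and intercepts, 15 - 6 = 9 error degrees of freedom.
sse_full = sum(sse_separate_line(obs) for obs in data.values())
df_full = 9

# Reduced model: covariance model (22.13), values from Table 22.3b.
sse_reduced, df_reduced = 38.571, 11

f_star = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
print(round(sse_full, 3), round(f_star, 2))
```

A small F* here says that letting each treatment have its own slope removes very little additional error variation.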


Comments 


1. An indication of the effectiveness of the analysis of covariance in reducing error variability can 
be obtained by comparing MSE for covariance analysis with MSE for regular analysis of variance. 
For the cracker promotion example, we know from Table 22.3 that MSE for the covariance analysis 
is 3.51. It can be shown that the error mean square for regular analysis of variance would have been
25.63. Hence, in this case, covariance analysis was able to reduce the residual variance by about
86 percent, a substantial reduction.

2. Covariance analysis and analysis of variance need not lead to the same conclusions about the 
treatment effects. For instance, analysis of variance might not indicate any treatment effects, whereas 
covariance analysis with a smaller error variance could show significant treatment effects. Ordinarily, 
of course, one should decide in advance which of the two analyses is to be used.


22.4 Two-Factor Covariance Analysis


We have until now considered covariance analysis for single-factor studies with r treatments. 
Covariance analysis can also be employed with two-factor and multifactor studies. We 
illustrate now the use of covariance analysis for two-factor studies with one concomitant 
variable. For notational simplicity, we consider the case where the treatment sample size 
is the same for all treatments. However, the regression approach to covariance analysis is 
general and applies directly when the study is unbalanced, with unequal treatment sample 
sizes. 


Covariance Model for Two-Factor Studies 


The fixed effects ANOVA model for a two-factor balanced study was given in (19.23):

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \qquad i = 1, \ldots, a;\; j = 1, \ldots, b;\; k = 1, \ldots, n \tag{22.25}$$

where $\alpha_i$ is the main effect of factor A at the ith level, $\beta_j$ is the main effect of factor B at the
jth level, and $(\alpha\beta)_{ij}$ is the interaction effect when factor A is at the ith level and factor B
is at the jth level. The covariance model for a two-factor study with a single concomitant
variable, assuming the relation between Y and the concomitant variable X is linear, is:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma(X_{ijk} - \bar{X}_{\cdot\cdot\cdot}) + \varepsilon_{ijk} \qquad i = 1, \ldots, a;\; j = 1, \ldots, b;\; k = 1, \ldots, n \tag{22.26}$$


Regression Approach 


Example 


We illustrate the regression approach to covariance analysis for a balanced two-factor study
with one concomitant variable when both factors A and B are at two levels, i.e., when
a = b = 2. The regression model counterpart to covariance model (22.26) then is:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_1 I_{ijk1} + \beta_1 I_{ijk2} + (\alpha\beta)_{11} I_{ijk1} I_{ijk2} + \gamma x_{ijk} + \varepsilon_{ijk} \tag{22.27}$$

where:

$$I_1 = \begin{cases} 1 & \text{if case from level 1 for factor A} \\ -1 & \text{if case from level 2 for factor A} \end{cases}$$

$$I_2 = \begin{cases} 1 & \text{if case from level 1 for factor B} \\ -1 & \text{if case from level 2 for factor B} \end{cases}$$

$$x_{ijk} = X_{ijk} - \bar{X}_{\cdot\cdot\cdot}$$

Note that the regression coefficients in (22.27) are the analysis of variance factor effects $\alpha_1$,
$\beta_1$, and $(\alpha\beta)_{11}$ and the concomitant variable coefficient $\gamma$.

Testing for factor A main effects requires that $\alpha_1 = 0$ in the reduced model. Correspondingly,
$\beta_1 = 0$ is required in the reduced model when testing for factor B main effects, and
$(\alpha\beta)_{11} = 0$ is required in the reduced model when testing for AB interactions.

Estimation of factor A and factor B main effects can easily be done in terms of comparisons
among the regression coefficients. The use of the Scheffé and Bonferroni multiple
comparison procedures presents no new issues. For instance, the S multiple for multiple
comparisons among the factor A level means is defined as follows:

$$S^2 = (a - 1)F(1 - \alpha;\, a - 1,\, nab - ab - 1) \tag{22.28}$$

and the B multiple is the same as in (22.18), with $n_T = nab$ and $r = ab$.


A horticulturist conducted an experiment to study the effects of flower variety (factor A:
varieties LP, WB) and moisture level (factor B: low, high) on yield of salable flowers (Y).
Because the plots were not of the same size, the horticulturist wished to use plot size (X) as
the concomitant variable. Six replications were made for each treatment. A portion of the
data are presented in Table 22.6. Figure 22.7 contains a symbolic scatter plot of the data.
The model assumptions of linear relations between Y and the concomitant variable X, as
well as of parallel slopes for the four treatments, appear to be reasonable here.

TABLE 22.6 Data: Salable Flowers Example.

                                 Factor B (moisture level)
    Factor A                    B1 (low)         B2 (high)
    (flower variety)           Yi1k   Xi1k      Yi2k   Xi2k
    A1 (variety LP)             98     15        71     10
                                60      4        80     12
                                64      5        55      3
    A2 (variety WB)             55      4        76     11
                                60      5        68     10
                                78     11        70      9

[FIGURE 22.7 Symbolic scatter plot of the salable flowers data. Vertical axis: Number of Flowers; horizontal axis: Size of Plot.]

A fit of regression model (22.27) to the data by a computer regression package yielded
the fitted regression function in Table 22.7a. The analyst plotted the data together with the
fitted regression lines and made a variety of residual plots and tests (not shown). On the
basis of these diagnostics, the analyst was satisfied that regression model (22.27), which
assumes parallel linear regression functions and constant error variances, is suitable here.

To examine the nature of the factor effects, we show in Figure 22.8 the estimated treatment
means plot for the two moisture levels $B_1$ and $B_2$. These estimated means all correspond
to plot size $X = \bar{X}_{\cdot\cdot\cdot} = 8.25$ or x = 0. Any other plot size would yield exactly the same
relationships as those in Figure 22.8. It appears from Figure 22.8 that there are no important
interactions between flower variety and moisture level, and that there may be main effects
for both factors, particularly for moisture level.

To study formally the factor effects, reduced models were formed by deleting from 
regression model (22.27) one predictor variable at a time (recall that both factors have only 
two levels), and the reduced models were then fitted. The extra sums of squares so obtained, 
as well as the error sum of squares for the full model, are presented in Table 22.7b, together 
with the degrees of freedom and mean squares. No total sum of squares is shown because 
the factor effect components are not orthogonal. 


TABLE 22.7 Computer Output for Fit of Regression Model (22.27): Salable Flowers Example.

(a) Fitted Regression Function

$$\hat{Y} = 70.0 + 2.04234 I_1 + 3.68078 I_2 + .81922 I_1 I_2 + 3.27688 x$$

    Regression               Estimated                 Estimated
    Coefficient              Regression Coefficient    Standard Deviation
    $\alpha_1$               2.04234                   .52108
    $\beta_1$                3.68078                   .51291
    $(\alpha\beta)_{11}$      .81922                   .51291
    $\gamma$                 3.27688                   .13002

(b) Extra Sums of Squares

    Effect                  Source of Variation                SS          df    MS
    Concomitant variable    $x \mid I_1, I_2, I_1 I_2$         3,994.52     1    3,994.52
    A                       $I_1 \mid x, I_2, I_1 I_2$            96.60     1       96.60
    B                       $I_2 \mid x, I_1, I_1 I_2$           323.85     1      323.85
    AB                      $I_1 I_2 \mid x, I_1, I_2$            16.04     1       16.04
    Error                                                        119.48    19        6.2884

[FIGURE 22.8 Estimated Treatment Means Plot: Salable Flowers Example. Estimated treatment means at x = 0; vertical axis: Number of Flowers; horizontal axis: Variety; one line for each moisture level.]

We test first for the presence of interactions by means of the usual general linear test
statistic F*, using the results in Table 22.7b:

$$F^* = \frac{SSR(I_1 I_2 \mid x, I_1, I_2)}{1} \div MSE = \frac{16.04}{1} \div 6.2884 = 2.55$$

For $\alpha = .01$, we require F(.99; 1, 19) = 8.18. Since $F^* = 2.55 < 8.18$, we conclude that
no interactions are present. The P-value of the test is .13.
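
Because each effect in Table 22.7b carries a single degree of freedom, its F* statistic should agree, up to rounding of the tabled values, with the square of the t* statistic formed from the coefficient and standard deviation in Table 22.7a. The sketch below checks this for the A, B, and AB effects. The dictionary names are illustrative, and all numbers are copied from Table 22.7.

```python
# Error mean square from Table 22.7b (SSE = 119.48 on 19 degrees of freedom).
mse = 119.48 / 19

# Extra sums of squares, each with 1 degree of freedom (Table 22.7b).
extra_ss = {"A": 96.60, "B": 323.85, "AB": 16.04}

# Estimated coefficient and estimated standard deviation (Table 22.7a).
coefficients = {
    "A": (2.04234, 0.52108),
    "B": (3.68078, 0.51291),
    "AB": (0.81922, 0.51291),
}

# F* from the extra sum of squares, and t*^2 from the coefficient table.
f_stats = {effect: (ss / 1) / mse for effect, ss in extra_ss.items()}
t_squared = {effect: (b / s) ** 2 for effect, (b, s) in coefficients.items()}

for effect in extra_ss:
    print(effect, round(f_stats[effect], 2), round(t_squared[effect], 2))
```

The agreement between the two columns is a useful check that the extra sums of squares were obtained by deleting one predictor at a time from the full model.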


We now wish to compare both the factor A main effects and the factor B main effects
by means of confidence intervals, with a 95 percent family confidence coefficient. Since
$\alpha_2 = -\alpha_1$, we have for our example:

$$L_1 = \alpha_1 - \alpha_2 = \alpha_1 - (-\alpha_1) = 2\alpha_1$$

Similarly, we obtain for the comparison of factor B main effects:

$$L_2 = 2\beta_1$$

Point estimates are readily obtained from the results in Table 22.7a:

$$\hat{L}_1 = 2\hat{\alpha}_1 = 2(2.04234) = 4.08$$

$$\hat{L}_2 = 2\hat{\beta}_1 = 2(3.68078) = 7.36$$

The estimated standard deviations also follow easily, using (A.16b):

$$s\{\hat{L}_1\} = 2s\{\hat{\alpha}_1\} = 2(.52108) = 1.042$$

$$s\{\hat{L}_2\} = 2s\{\hat{\beta}_1\} = 2(.51291) = 1.026$$

We utilize the Bonferroni simultaneous estimation procedure for g = 2 comparisons.
For a 95 percent family confidence coefficient, we require t[1 - .05/2(2); 19] =
t(.9875; 19) = 2.433. The two desired confidence intervals therefore are:

$$1.5 = 4.08 - 2.433(1.042) \le \alpha_1 - \alpha_2 \le 4.08 + 2.433(1.042) = 6.6$$

$$4.9 = 7.36 - 2.433(1.026) \le \beta_1 - \beta_2 \le 7.36 + 2.433(1.026) = 9.9$$


With family confidence coefficient .95, we conclude that variety LP yields, on the average, 
between 1.5 and 6.6 more salable flowers for any given plot size than variety WB. Also, 
for any given plot size, the mean number of salable flowers is between 4.9 and 9.9 flowers 
greater for the low moisture level than for the high one, thus indicating a substantial effect 
of moisture level on yield. 
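
The two Bonferroni intervals follow mechanically from Table 22.7a. As a brief sketch, assuming the multiple t(.9875; 19) = 2.433 quoted above (the code does not compute the percentile) and using illustrative variable names:

```python
# Bonferroni multiple for g = 2 statements with 19 error degrees of freedom,
# as quoted in the text: t(.9875; 19) = 2.433.
b_multiple = 2.433

# Estimated coefficient and estimated standard deviation from Table 22.7a.
effects = {
    "alpha1 - alpha2": (2.04234, 0.52108),  # comparison equals 2 * alpha1
    "beta1 - beta2": (3.68078, 0.51291),    # comparison equals 2 * beta1
}

intervals = {}
for name, (coefficient, std_dev) in effects.items():
    estimate = 2 * coefficient      # point estimate of the comparison
    estimate_sd = 2 * std_dev       # doubling the estimator doubles its standard deviation
    half_width = b_multiple * estimate_sd
    intervals[name] = (estimate - half_width, estimate + half_width)

for name, (lower, upper) in intervals.items():
    print(f"{lower:.1f} <= {name} <= {upper:.1f}")
```

Because the comparison is just twice a single regression coefficient, no covariance term is needed in the standard deviation.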

If interactions had been present, we could have studied the nature of the interaction 
effects by, for instance, comparing the effect of the moisture level for each of the two flower 
varieties. It can be shown that this comparison is given by $(\alpha\beta)_{12} = -(\alpha\beta)_{11}$. Hence, we
could estimate the desired interaction effect by using the estimated regression coefficient
$(\widehat{\alpha\beta})_{11}$ and its estimated standard deviation in Table 22.7a.


Covariance Analysis for Randomized Complete Block Designs 
Covariance analysis can be employed to further reduce the experimental error variability in a
randomized complete block design. The extension is a straightforward one from covariance 
analysis for a completely randomized design. 


Covariance Model. The usual randomized block design model was given in (21.1). The 
covariance model for a randomized block design with one concomitant variable is obtained 
by simply adding a term (or several terms) for the relation between the response variable 
Y and the concomitant variable X. Assuming this relation can be described by a linear 
function, we obtain: 


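$$Y_{ij} = \mu_{\cdot\cdot} + \rho_i + \tau_j + \gamma(X_{ij} - \bar{X}_{\cdot\cdot}) + \varepsilon_{ij} \qquad i = 1, \ldots, n_b;\; j = 1, \ldots, r \tag{22.29}$$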


Regression Approach. The regression approach to covariance model (22.29) involves no
new principles. We shall denote the centered variable $X_{ij} - \bar{X}_{\cdot\cdot}$ in covariance model (22.29)
by $x_{ij}$. Further, we shall again use 1, -1, 0 indicator variables for the block and treatment
effects. To illustrate an equivalent regression model, consider a randomized complete block
design study with $n_b = 4$ blocks and r = 3 treatments. The regression model counterpart
to covariance model (22.29) then is:

$$Y_{ij} = \mu_{\cdot\cdot} + \rho_1 I_{ij1} + \rho_2 I_{ij2} + \rho_3 I_{ij3} + \tau_1 I_{ij4} + \tau_2 I_{ij5} + \gamma x_{ij} + \varepsilon_{ij} \qquad \text{Full model} \tag{22.30}$$

where:

$$I_1 = \begin{cases} 1 & \text{if experimental unit from block 1} \\ -1 & \text{if experimental unit from block 4} \\ 0 & \text{otherwise} \end{cases}$$

$I_2$, $I_3$ are defined similarly

$$I_4 = \begin{cases} 1 & \text{if experimental unit received treatment 1} \\ -1 & \text{if experimental unit received treatment 3} \\ 0 & \text{otherwise} \end{cases}$$

$$I_5 = \begin{cases} 1 & \text{if experimental unit received treatment 2} \\ -1 & \text{if experimental unit received treatment 3} \\ 0 & \text{otherwise} \end{cases}$$

$$x_{ij} = X_{ij} - \bar{X}_{\cdot\cdot}$$

To test for treatment effects:

$$H_0: \tau_1 = \tau_2 = 0 \qquad\qquad H_a: \text{not all } \tau_j \text{ equal zero} \tag{22.31}$$

we would either need to fit the reduced model under $H_0$:

$$Y_{ij} = \mu_{\cdot\cdot} + \rho_1 I_{ij1} + \rho_2 I_{ij2} + \rho_3 I_{ij3} + \gamma x_{ij} + \varepsilon_{ij} \qquad \text{Reduced model} \tag{22.32}$$

or else use the appropriate extra sum of squares. The test for treatment effects is then
conducted in the usual way.

Comparisons of two treatment effects by the regression approach are straightforward.
For estimating $\tau_1 - \tau_2$, for instance, we use the unbiased estimator $\hat{\tau}_1 - \hat{\tau}_2$, based on the
estimated regression coefficients obtained when fitting the full model (22.30). The estimated
variance of this estimator is:

$$s^2\{\hat{\tau}_1 - \hat{\tau}_2\} = s^2\{\hat{\tau}_1\} + s^2\{\hat{\tau}_2\} - 2s\{\hat{\tau}_1, \hat{\tau}_2\} \tag{22.33}$$


The estimated variance-covariance matrix of the regression coefficients, available in many 
regression package printouts, can then be used to obtain the required estimated variances 
and covariances. 


Comment
Some computer packages for covariance analysis produce analyses that are only valid when all
treatment sample sizes are equal. Computer packages should therefore be used with great care when
the treatment sample sizes are unequal, to make sure that the package conducts the tests of interest.


22.5 Additional Considerations for the Use of Covariance Analysis


Covariance Analysis as Alternative to Blocking

At times, a choice exists between: (1) a completely randomized design, with covariance 
analysis used to reduce the experimental errors and (2) a randomized block design, with 
the blocks formed by means of the concomitant variable. Generally, the latter alternative is 
preferred. There are several reasons for this: 


1. If the regression between the response variable and the concomitant (blocking) variable
is linear, a randomized block design and covariance analysis are about equally efficient.
If the regression is not linear but covariance analysis with a linear relationship is utilized,
covariance analysis with a completely randomized design will tend to be not as effective as
a randomized block design.

2. Randomized block designs are essentially free of assumptions about the nature of
the relationship between the blocking variable and the response variable, while covariance
analysis assumes a definite form of relationship.

3. Randomized block designs have somewhat fewer degrees of freedom available for
experimental error than with covariance analysis for a completely randomized design. However,
in all but small-scale experiments, this difference in degrees of freedom has little effect
on the precision of the estimates.


Use of Differences

In a variety of studies, a prestudy observation X and a poststudy observation Y on the same 
variable are available for each unit. For instance, X may be the score for a subject’s attitude 
toward a company prior to reading its annual report, and Y may be the score after reading 
the report. In this situation, an obvious alternative to covariance analysis is to do an analysis 
of variance on the differences Y - X. Sometimes, Y - X is called an index of response
because it makes one observation out of two.

If the slope of the treatment regression lines is $\gamma = 1$, analysis of covariance and analysis
of variance on Y - X are essentially equivalent. When $\gamma = 1$, covariance model (22.2)
becomes:

$$Y_{ij} = \mu_{\cdot} + \tau_i + X_{ij} + \varepsilon_{ij} \tag{22.34}$$

which can be written as a regular analysis of variance model:

$$Y_{ij} - X_{ij} = \mu_{\cdot} + \tau_i + \varepsilon_{ij} \tag{22.34a}$$


Thus, if a unit change in X leads to about the same change in Y, it makes sense to
perform an analysis of variance on Y - X rather than to use covariance analysis, because
the analysis of variance model is a simpler model. If the regression slope is not near 1,
however, covariance analysis may be substantially more effective than use of the differences
Y - X.

In the earlier cracker promotion example, use of Y - X would have been effective. It
would have yielded the error mean square MSE = 3.500, which is practically the same
as the error mean square for covariance analysis, MSE = 3.506 (see Table 22.3b). Recall
that the regression slope in our example is close to 1 ($\hat{\gamma} = .899$); hence, the approximate
equivalence of the two procedures.
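
The error mean square for the analysis of variance on the differences can be checked directly from the Table 22.1 data. The following sketch computes the pooled within-treatment mean square of Y - X. The function name `anova_mse` is illustrative.

```python
# Cracker promotion data from Table 22.1: (Y, X) pairs for five stores per treatment.
data = {
    1: [(38, 21), (39, 26), (36, 22), (45, 28), (33, 19)],
    2: [(43, 34), (38, 26), (38, 29), (27, 18), (34, 25)],
    3: [(24, 23), (32, 29), (31, 30), (21, 16), (28, 29)],
}


def anova_mse(groups):
    """Single-factor ANOVA error mean square: pooled within-group SS divided by (n_T - r)."""
    sse = 0.0
    n_total = 0
    for values in groups:
        mean = sum(values) / len(values)
        sse += sum((v - mean) ** 2 for v in values)
        n_total += len(values)
    return sse / (n_total - len(groups))


# Index of response: the difference Y - X for each store, grouped by treatment.
differences = [[y - x for y, x in obs] for obs in data.values()]
mse_differences = anova_mse(differences)
print(round(mse_differences, 3))
```

Note that the analysis on differences has 12 error degrees of freedom, one more than the covariance analysis, because no slope is estimated.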


Correction for Bias 


The suggestion is sometimes made that analysis of covariance can be helpful in correcting for 
bias with observational data. With such data, the groups under study may differ substantially 
with respect to a concomitant variable, and this may bias the comparisons of the groups. 
Consider, for instance, a study in which attitudes toward no-fault automobile insurance 
were compared for persons who are risk averse and persons who are risk seeking. It was
found that many persons in the risk-averse group tended to be older (50 to 70 years old), 
while many persons in the risk-seeking group tended to be younger (20 to 40 years old). 
In this type of situation, some researchers would advise that covariance analysis, with age 
as the concomitant variable, be employed to help remove any bias in the analysis of the 
observational data on attitudes toward no-fault insurance because the two age groups differ 
so much. 

Even though there is great appeal in the idea of removing bias in observational data, 
covariance analysis should be used with caution for this purpose. In the first place, com- 
parisons of means at a common value of X may require substantial extrapolation of the 
regression lines to a region where there are no or only few data points (in our example, 
to near 45 years). It may well be that the regression relationship used in the covariance 
analysis is not appropriate for substantial extrapolation. In the second place, the treatment
variable may depend on the concomitant variable (or vice versa), which could affect the 
proper conclusions to be drawn. 


Interest in Nature of Treatment Effects 


Covariance analysis is sometimes employed for the principal purpose of shedding more 
light on the nature of the treatment effects, rather than merely for increasing the precision 
of the analysis. For instance, a market researcher in a study of the effects of three different
advertisements on the maximum price consumers are willing to pay for a new type of home
siding may use covariance analysis, with value of the consumer's home as the concomitant
variable. The reason is that the researcher is truly interested in the relation for each
advertisement between home value and maximum price. Reduction of error variance in this
instance may be a secondary consideration.

As in all regression analyses, care must be used in drawing inferences about the causal
nature of the relation between the concomitant variable and the response. In the advertising
example, it might well be that value of a consumer's home is largely influenced by income.
If this were so, the relation between value of the consumer’s home and maximum price the 
consumer is willing to pay may actually be largely a reflection of the underlying relation 
between income and maximum price. 


Problems

22.1. A student's reaction to the instructor's statement that covariance analysis is inappropriate
when the treatment regression lines do not have the same slope was as follows: “It seems to
me that this is ducking a real-world problem. If the treatment slopes are different, just use a
covariance model that allows for different treatment slopes.” Evaluate this reaction.

22.2. A survey analyst remarked: “When covariance analysis is used with survey data, there is a
danger that the treatments may be related to the concomitant variable.” What is the nature of
the problem? Does this same problem exist when the treatments are randomly assigned to the
experimental units?

22.3. Portray, analogously to the format of Figure 1.6 on page 11 for a regression model, the nature
of covariance model (22.3) when there are three treatments and the parameter values are:
$\mu_{\cdot} = 150$, $\tau_1 = 15$, $\tau_2 = -5$, $\tau_3 = -10$, $\gamma = 6$, $\bar{X}_{\cdot\cdot} = 70$, $\sigma = 5$. Show several distributions
of Y for each treatment.

22.4. Refer to the cracker promotion example on page 926. A student stated, in discussing this case:
“Strictly speaking, you cannot conclude anything about whether the three promotions differ
in effectiveness because there was no control. The preceding period does not qualify as a
control because it might have differed from the promotion period due to seasonal factors or
other unique circumstances.” Comment.

22.5. Refer to the cracker promotion example on pages 930 and 931, where three pairwise comparisons
of treatment effects were made by the Scheffé procedure.
a. What would be the value of the Bonferroni multiple here for estimating the three
comparisons?
b. Did the analyst obtain substantially less precise interval estimates using the Scheffé procedure,
which permits making additional estimates without modifying the present ones?

22.6. State the analysis of covariance model for a single-factor study with four treatments when
there are two concomitant variables, each with linear and quadratic terms in the model.

*22.7. Refer to Productivity improvement Problem 16.7. The economist also has information on
annual productivity improvement in the prior year and wishes to use this information as a
concomitant variable. The data on the prior year's productivity improvement ($X_{ij}$) follow.


                                           j
 i      1     2     3     4     5     6     7     8     9    10    11    12
 1     8.2   7.9   7.0   5.7   7.2   7.0   6.5   7.9   6.3
 2     8.8  10.0  10.7  10.0   9.7   9.4  10.6   9.8  10.0  10.3   8.9  10.0
 3    11.5  12.2  12.8  11.0  12.3  12.1

a. Obtain the residuals for covariance model (22.3).

b. For each treatment, plot the residuals against the fitted values. Also prepare a normal
probability plot of the residuals and calculate the coefficient of correlation between the
ordered residuals and their expected values under normality. What do you conclude from
your analysis?

c. State the generalized regression model to be employed for testing whether or not the
treatment regression lines have the same slope. Conduct this test using $\alpha = .01$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Could you conduct a formal test here as to whether the regression functions are linear? If
so, how many degrees of freedom are there for the denominator mean square in the test
statistic?




*22.8. Refer to Productivity improvement Problems 16.7 and 22.7. Assume that covariance model
(22.3) is appropriate.

a. Prepare a symbolic scatter plot of the data. Does it appear that there are effects of the level
of research and development expenditures on mean productivity improvement? Discuss.

b. State the regression model equivalent to covariance model (22.3) for this case; use 1, −1,
0 indicator variables. Also state the reduced regression model for testing for treatment
effects.

c. Fit the full and reduced regression models and test for treatment effects; use $\alpha = .05$. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Is MSE(F) for the covariance model substantially smaller than MSE for the analysis of
variance model in Problem 16.7d? Does this affect the conclusion reached about treatment
effects? Does it affect the P-value?

e. Estimate the mean productivity improvement for firms with moderate research and
development expenditures that had a prior productivity improvement of X = 9.0; use a
95 percent confidence interval.

f. Make all pairwise comparisons between the treatment effects; use either the Bonferroni or
the Scheffé procedure with a 90 percent family confidence coefficient, whichever is more
efficient. State your findings.


22.9. Refer to Questionnaire color Problem 16.8. It has been suggested to the investigator that size
of parking lot might be a useful concomitant variable. The number of spaces ($X_{ij}$) in each
parking lot utilized in the study follows.

               j
 i     1     2     3     4     5
 1   300   381   226   350   100
 2   153   334   473   264   325
 3   144   359   296   243   252

a. Obtain the residuals for covariance model (22.3).

b. For each treatment, plot the residuals against the fitted values. Also prepare a normal
probability plot of the residuals and calculate the coefficient of correlation between the
ordered residuals and their expected values under normality. What do you conclude from
your analysis?

c. State the generalized regression model to be employed for testing whether or not the
treatment regression lines have the same slope. Conduct this test using $\alpha = .005$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Could you conduct a formal test here as to whether the regression functions are linear?
Explain.


22.10. Refer to Questionnaire color Problems 16.8 and 22.9. Assume that covariance model (22.3)
is applicable.

a. Prepare a symbolic scatter plot of the data. Does it appear that there are color effects on
the mean response rate? Discuss.

b. State the regression model equivalent to covariance model (22.3) for this case; use 1, −1,
0 indicator variables. Also state the reduced regression model for testing for treatment
effects.

c. Fit the full and reduced regression models and test for treatment effects; use $\alpha = .10$. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?


d. Is MSE(F) for the covariance model substantially smaller than MSE for the analysis of
variance model in Problem 16.8d? How does this affect the conclusion reached about
treatment effects?

e. Estimate the mean response rate for blue questionnaires in parking lots of size X = 280;
use a 90 percent confidence interval.

f. Make all pairwise comparisons between the treatment effects; use either the Bonferroni or
the Scheffé procedure with a 90 percent family confidence coefficient, whichever is more
efficient. State your findings.


22.11. Refer to Rehabilitation therapy Problem 16.9. The rehabilitation researcher wishes to use
age of patient as a concomitant variable. The ages ($X_{ij}$) of patients in the study follow.

                                  j
 i     1     2     3     4     5     6     7     8     9    10
 1  18.3  30.0  26.5  28.1  29.7  27.8  19.8  29.3
 2  20.8  25.2  29.2  20.0  21.5  22.1  19.7  24.7  20.2  22.9
 3  22.7  28.7  18.9  18.0  21.7  20.0

a. Obtain the residuals for covariance model (22.3).

b. For each treatment, plot the residuals against the fitted values. Also prepare a normal
probability plot of the residuals and calculate the coefficient of correlation between the
ordered residuals and their expected values under normality. What do you conclude from
your analysis?

c. State the generalized regression model to be employed for testing whether or not the
treatment regression lines have the same slope. Conduct this test using $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Could you conduct a formal test here as to whether the regression functions are linear?
Explain.


22.12. Refer to Rehabilitation therapy Problems 16.9 and 22.11. Assume that covariance model
(22.3) is applicable.

a. Prepare a symbolic scatter plot of the data. Does it appear that there are effects of physical
fitness status on the mean number of days required for therapy? Discuss.

b. State the regression model equivalent to covariance model (22.3) for this case; use 1, −1,
0 indicator variables. Also state the reduced regression model for testing for treatment
effects.

c. Fit the full and reduced regression models and test for treatment effects; use $\alpha = .01$. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Is MSE(F) for the covariance model substantially smaller than MSE for the analysis of
variance model in Problem 16.9d? Does this affect the conclusion reached about treatment
effects? Does it affect the P-value?

e. Estimate the mean number of days required for therapy for patients of average physical
fitness and age 24 years; use a 99 percent confidence interval.

f. Make all pairwise comparisons between the treatment effects; use either the Bonferroni or
the Scheffé procedure with a 95 percent family confidence coefficient, whichever is more
efficient. State your findings.


22.13. Product display. A manufacturer of felt-tip markers investigated by an experiment whether
a proposed new display, featuring a picture of a physician, is more effective in drugstores

than the present counter display, featuring a picture of an athlete and designed to be located
in the stationery area. Fifteen drugstores of similar characteristics were chosen for the study.
They were assigned at random in equal numbers to one of the following three treatments:
(1) present counter display in stationery area, (2) new display in stationery area, (3) new display
in checkout area. Sales with the present display ($X_{ij}$) were recorded in all 15 stores for a three-
week period. Then the new display was set up in the 10 stores receiving it, and sales for the next
three-week period ($Y_{ij}$) were recorded in all 15 stores. The data on sales (in dollars) follow.

                              j
                     1     2     3     4     5
Treatment 1
  First 3 weeks     92    68    74    52    65
  Second 3 weeks    69    44    58    38    54
Treatment 2
  First 3 weeks     77    80    70    73    79
  Second 3 weeks    74    75    73    78    82
Treatment 3
  First 3 weeks     64    43    81    68    71
  Second 3 weeks    66    49    84    75    77


The analyst wishes to analyze the effects of the three different display treatments by means
of covariance analysis.

a. Obtain the residuals for covariance model (22.3).

b. For each treatment, plot the residuals against the fitted values. Also prepare a normal prob-
ability plot of the residuals and calculate the coefficient of correlation between the ordered
residuals and their expected values under normality. What do you conclude from your
analysis?

c. State the generalized regression model to be employed for testing whether or not the
treatment regression lines have the same slope. Conduct this test using $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Could you conduct a formal test here as to whether the regression functions are linear?
Explain.


22.14. Refer to Product display Problem 22.13. Assume that covariance model (22.3) is applicable.

a. Prepare a symbolic scatter plot of the data. Does it appear that there are display effects on 
mean sales? Discuss. 

b. State the regression model equivalent to covariance model (22.3) for this case; use 1, −1, 0
indicator variables. Also state the reduced regression model for testing for treatment effects.

c. Fit the full and reduced regression models and test for treatment effects; use $\alpha = .05$. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Is MSE(F) for the covariance model substantially smaller than the mean square error if
analysis of variance model (16.2) had been employed?

e. Estimate the mean sales with display treatment 2 for stores whose sales in the preceding
three-week period were $75; use a 95 percent confidence interval.

f. Make all pairwise comparisons between the treatment effects; use either the Bonferroni or
the Scheffé procedure with a 90 percent family confidence coefficient, whichever is more
efficient. State your findings.
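
The regression approach asked for in Problems 22.13 and 22.14 can be sketched in code. The following is a minimal illustration rather than a worked solution. It assumes NumPy is available, codes the three display treatments with 1, −1, 0 indicator variables, centers the first-period sales around their overall mean, and compares the full model with the reduced model that omits the treatment indicators. The variable names and the helper `sse` are hypothetical and were introduced only for this illustration.

```python
import numpy as np

# Sales data from Problem 22.13: x = first three weeks, y = second three weeks.
x = np.array([92, 68, 74, 52, 65,    # treatment 1
              77, 80, 70, 73, 79,    # treatment 2
              64, 43, 81, 68, 71],   # treatment 3
             dtype=float)
y = np.array([69, 44, 58, 38, 54,
              74, 75, 73, 78, 82,
              66, 49, 84, 75, 77], dtype=float)
treatment = np.repeat([1, 2, 3], 5)

# 1, -1, 0 indicator variables for r = 3 treatments (treatment 3 coded -1).
i1 = np.where(treatment == 1, 1.0, np.where(treatment == 3, -1.0, 0.0))
i2 = np.where(treatment == 2, 1.0, np.where(treatment == 3, -1.0, 0.0))
xc = x - x.mean()  # concomitant variable centered at its overall mean (assumption)


def sse(design, response):
    """Error sum of squares from an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    return float(resid @ resid)


ones = np.ones_like(y)
full = np.column_stack([ones, i1, i2, xc])   # treatment effects plus covariate
reduced = np.column_stack([ones, xc])        # covariate only, no treatment effects

sse_f, sse_r = sse(full, y), sse(reduced, y)
df_f = len(y) - full.shape[1]
df_r = len(y) - reduced.shape[1]

# General linear test statistic for treatment effects.
f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
print(f"SSE(F) = {sse_f:.2f}, SSE(R) = {sse_r:.2f}, F* = {f_star:.2f}")
```

The statistic `f_star` would then be compared with the appropriate percentile of the F distribution with `df_r - df_f` and `df_f` degrees of freedom.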




22.15. Refer to Cash offers Problem 19.10. An analyst wishes to use each dealer's sales volume as
a concomitant variable. The sales data ($X_{ijk}$, in hundred thousand dollars) follow.


i = 1 i = 2 i = 3
j = 1 j = 2 j = 1 j = 2 j = 1 j = 2
А 3.0 3.5 6.5 2.2 5.0 4.0 
М 5.1 42 4.1 5.4 3.1 8 
4.9 6.6 3.0 5.0 2.9 1.9 


a. Obtain the residuals for covariance model (22.26).

b. For each treatment, plot the residuals against the fitted values. Also prepare a normal prob-
ability plot of the residuals and calculate the coefficient of correlation between the ordered
residuals and their expected values under normality. What do you conclude from your
analysis?

c. State the generalized regression model to be employed for testing whether or not the
treatment regression lines have the same slope. Conduct this test using $\alpha = .01$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

22.16. Refer to Cash offers Problems 19.10 and 22.15. Assume that covariance model (22.26) is 
applicable. 

a. State the regression model equivalent to covariance model (22.26) for this case; use 1, −1,
0 indicator variables. Fit this full model.

b. State the reduced regression models for testing for interaction and factor A and factor B
main effects, respectively. Fit these reduced regression models.

c. Test for interaction effects; use $\alpha = .05$. State the alternatives, decision rule, and conclu-
sion. What is the P-value of the test?

d. Test for factor A main effects; use $\alpha = .05$. State the alternatives, decision rule, and con-
clusion. What is the P-value of the test?

e. Test for factor B main effects; use $\alpha = .05$. State the alternatives, decision rule, and con-
clusion. What is the P-value of the test?

f. For each factor, make all pairwise comparisons between the factor level main effects. 
Use the Bonferroni procedure with a 90 percent family confidence coefficient. State your 
findings. 

22.17. Refer to Eye contact effect Problem 19.12. Age of personnel officer is to be used as a con-
comitant variable. The ages ($X_{ijk}$) of the personnel officers follow.


i = 1 i = 2
j = 1 j = 2 j = 1 j = 2
42 51 43 42 
30 35 53 47 
35 49 49 56 


a. Obtain the residuals for covariance model (22.26).

b. For each treatment, plot the residuals against the fitted values. Also prepare a normal prob-
ability plot of the residuals and calculate the coefficient of correlation between the ordered
residuals and their expected values under normality. What do you conclude from your
analysis?

c. State the generalized regression model to be employed for testing whether or not the treat-
ment regression lines have the same slope. Conduct this test using $\alpha = .005$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?


22.18. Refer to Eye contact effect Problems 19.12 and 22.17. Assume that covariance model (22.26)
is applicable.

a. State the regression model equivalent to covariance model (22.26) for this case; use 1, −1,
0 indicator variables. Fit this full model.

b. State the reduced regression models for testing for interaction and factor A and factor B
main effects, respectively. Fit these reduced regression models.

c. Test for interaction effects; use $\alpha = .01$. State the alternatives, decision rule, and conclu-
sion. What is the P-value of the test?

d. Test for factor A main effects; use $\alpha = .01$. State the alternatives, decision rule, and con-
clusion. What is the P-value of the test?

e. Test for factor B main effects; use $\alpha = .01$. State the alternatives, decision rule, and con-
clusion. What is the P-value of the test?

f. Compare the gender main effects by means of a 99 percent confidence interval. Interpret
your interval estimate.

g. Estimate the mean success rating by female personnel officers aged 40 when eye contact
is present; use a 99 percent confidence interval.


*22.19. Refer to Auditor training Problem 21.5. The analyst wishes to examine whether use of
pretraining statistical proficiency scores as a concomitant variable would help to reduce the
experimental error variability significantly. The pretraining statistical proficiency scores for
the auditors are as follows:


           Training Method (j)                Training Method (j)
Block i     1     2     3       Block i        1     2     3
   1       93    98    91          6          75    74    78
   2       94    93    94          7          79    76    72
   3       89    91    92          8          71    69    64
   4       86    84    90          9          74    71    70
   5       78    76    84         10          63    68    64


a. Would you expect the auditor's age to have been a better concomitant variable here than
the pretraining statistical proficiency score? Discuss.

b. State the regression model equivalent to covariance model (22.29); use 1, −1, 0 indicator
variables.

c. Fit the full regression model.

d. State the reduced regression model for testing treatment effects. Fit the reduced model.

e. Test whether or not the training methods differ in mean effectiveness; use $\alpha = .05$. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?

f. Obtain a 95 percent confidence interval for $L = \tau_1 - \tau_2$. Interpret your interval estimate.

g. Has the error variance been reduced substantially by adding the concomitant variable?
Explain.


22.20. Refer to Fat in diets Problem 21.7. The researcher wishes to examine whether each subject's
body weight expressed as a percent of the ideal weight for that person would be a useful
concomitant variable. The body weights as percents of the ideal weights for the 15 subjects
are as follows:

            Fat Content of Diet
Block i   j = 1   j = 2   j = 3
   1        94      96     101
   2        97     102      99
   3       105     100     106
   4       108     107     112
   5       118     115     107

a. State the regression model equivalent to covariance model (22.29); use 1, −1, 0 indicator
variables.

b. Fit the full regression model.

c. State the reduced regression model for testing treatment effects. Fit the reduced model.

d. Test whether or not the mean reductions in lipid level differ for the three diets; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

e. Obtain confidence intervals for $L_1 = \tau_1 - \tau_2$ and $L_2 = \tau_2 - \tau_3$, using the Bonferroni pro-
cedure with a 95 percent family confidence coefficient. Interpret your interval estimates.

f. Has the error variance been reduced substantially by adding the concomitant variable?
Explain.

22.21. Refer to Productivity improvement Problems 22.7 and 22.8. The analyst is considering the
use of the difference between the productivity improvements in the two years ($Y_{ij} - X_{ij}$) as
the response variable with the regular analysis of variance model (22.292).

a. Obtain the analysis of variance table.

b. How effective here is the use of differences with the regular ANOVA model compared to
the use of covariance model (22.3)? Discuss.

22.22. Refer to Product display Problems 22.13 and 22.14. The analyst is considering the use of
the difference in sales between the two periods ($Y_{ij} - X_{ij}$) as the response variable with the
regular analysis of variance model (22.292).

a. Obtain the analysis of variance table. 

b. How effective here is the use of differences with the regular ANOVA model compared to 

the use of covariance model (22.3)? Discuss. 


Exercise


22.23. (Calculus needed.) Denote $\mu_{\cdot} + \tau_i$ in covariance model (22.3) by $A_i$. Derive the least squares
estimators for $A_i$ and $\gamma$ in covariance model (22.3).


Projects


22.24. Refer to the SENIC data set in Appendix C.1. The following hospitals are to be considered
in a study of the effects of region (variable 9) on the mean length of hospital stay of patients
(variable 2), with available facilities and services (variable 12) as a concomitant variable:


1-52 54 55 57 58 63 76 83 84 94 101 103 111


a. Obtain the residuals for covariance model (22.3).

b. For each region, plot the residuals against the fitted values. Also prepare a normal probabil-
ity plot of the residuals and calculate the coefficient of correlation between the ordered resid-
uals and their expected values under normality. What do you conclude from your analysis?

c. State the generalized regression model to be employed for testing whether or not the treat-
ment regression lines have the same slope. Conduct this test using $\alpha = .005$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?


22.25. Refer to the SENIC data set in Appendix C.1 and Project 22.24. Assume that covariance
model (22.3) is applicable.

a. Prepare a symbolic scatter plot of the data. Does it appear that there are region effects on
the mean length of hospital stay? Discuss.

b. State the regression model equivalent to covariance model (22.3) for this case; use 1, −1,
0 indicator variables. Also state the reduced regression model for testing for treatment
effects.

c. Fit the full and reduced regression models and test for treatment effects; use $\alpha = .05$. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Make all pairwise comparisons between the region effects; use either the Bonferroni or
the Scheffé procedure with a 90 percent family confidence coefficient, whichever is more
efficient. State your findings.

22.26. Refer to the Market share data set in Appendix C.3 and Project 16.45. Use price (variable 3) 
as a concomitant variable. 

a. Obtain the residuals for covariance model (22.3). 

b. For each treatment, plot the residuals against the fitted values. Also prepare a normal
probability plot of the residuals and calculate the coefficient of correlation between the
ordered residuals and their expected values under normality. What do you conclude from
your analysis?

c. State the generalized regression model to be employed for testing whether or not the
treatment regression lines have the same slope. Conduct this test using $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Could you conduct a formal test here as to whether the regression functions are linear? 
Explain. 


22.27. Refer to the Market share data set in Appendix C.3 and Project 22.26. 


a. Prepare a symbolic scatter plot of the data. Does it appear that mean monthly market
share changes with the discount price and package promotion factor-level combinations?
Discuss.

b. State the regression model equivalent to covariance model (22.3) for this case; use 1, −1,
0 indicator variables. Also state the reduced regression model for testing for treatment
effects.

c. Fit the full and reduced regression models and test for treatment effects; use $\alpha = .01$. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Is MSE(F) for the covariance model substantially smaller than MSE for the analysis of
variance model in Project 16.45? Does this affect the conclusion reached about treatment
effects? Does it affect the P-value?

e. Estimate the average monthly market share for product with discount price present, package
promotion absent, and average monthly price of product 2.5; use a 99 percent confidence
interval.


f. Make all pairwise comparisons between the treatment effects; use either the Bonferroni or 
the Scheffé procedure with a 95 percent family confidence coefficient, whichever is more 
efficient. State your findings. 


22.28. Refer to the CDI data set in Appendix C.2 and Project 19.53. The metropolitan areas identified
in Project 19.53 are to be considered in a study of the effects of region (factor A: variable 17)
and percent below poverty level (factor B: variable 13) on crime rate (variable 10 ÷ variable 5),
with percent of population 65 or older (variable 7) as a concomitant variable. For purposes
of this analysis of covariance study, percent below poverty level is to be classified into two
categories: less than 8.0 percent, and 8.0 percent or more.


a. Obtain the residuals for covariance model (22.26). 

b. For each treatment, plot the residuals against the fitted values. Also prepare a normal 
probability plot of the residuals and calculate the coefficient of correlation between the 
ordered residuals and their expected values under normality. What do you conclude from 
your analysis? 

c. State the generalized regression model to be employed for testing whether or not the
treatment regression lines have the same slope. Conduct this test using $\alpha = .001$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

22.29. Refer to the CDI data set in Appendix C.2 and Project 22.28. Assume that covariance model
(22.26) is applicable.

a. State the regression model equivalent to covariance model (22.26) for this case; use 1, −1,
0 indicator variables. Fit this full model.

b. State the reduced regression models for testing for interaction and factor A and factor B
main effects, respectively. Fit these reduced regression models.

c. Test for interaction effects; use $\alpha = .01$. State the alternatives, decision rule, and conclu-
sion. What is the P-value of the test?

d. Test for factor A main effects; use $\alpha = .01$. State the alternatives, decision rule, and con-
clusion. What is the P-value of the test?

e. Test for factor B main effects; use $\alpha = .01$. State the alternatives, decision rule, and con-
clusion. What is the P-value of the test?

22.30. Refer to the Market share data set in Appendix C.3 and Project 19.55. Use price (variable 3) 
as a concomitant variable. 

a. Obtain the residuals for covariance model (22.26). 


b. For each treatment, plot the residuals against the fitted values. Also prepare a normal 
probability plot of the residuals and calculate the coefficient of correlation between the 
ordered residuals and their expected values under normality. What do you conclude from 
your analysis? 

c. State the generalized regression model to be employed for testing whether or not the
treatment regression lines have the same slope. Conduct this test using $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?


22.31. Refer to the Market share data set in Appendix C.3 and Project 22.30. 
a. State the regression model equivalent to covariance model (22.26) for this case; use 1, −1,
0 indicator variables. Fit this full model.

b. State the reduced regression models for testing for interaction and factor A and factor B
main effects, respectively. Fit these reduced regression models.

c. Test for interaction effects; use $\alpha = .01$. State the alternatives, decision rule, and conclu-
sion. What is the P-value of the test?

d. Test for factor A main effects; use $\alpha = .01$. State the alternatives, decision rule, and con-
clusion. What is the P-value of the test?

e. Test for factor B main effects; use $\alpha = .01$. State the alternatives, decision rule, and con-
clusion. What is the P-value of the test?


Case Studies


22.32. Refer to the Prostate cancer data set in Appendix C.5 and Case Study 16.49. Carry out a
one-way analysis of covariance of this data set, where the response of interest is PSA level
(variable 2), the single factor is Gleason score (variable 9), and the possible covariates are
cancer volume (variable 3) and weight (variable 4). The analysis should consider transforma-
tions of the response variable and the covariates. Document steps taken in your analysis, and
justify your conclusions.

22.33. Refer to the Real estate sales data set in Appendix C.7 and Case Study 16.50. Carry out a
one-way analysis of covariance of this data set, where the response of interest is sales price
(variable 2), the single factor is number of bedrooms (variable 4), and the possible covariates
are finished square feet (variable 3) and lot size (variable 12). Recode the number of bedrooms
into four categories: 0-2, 3, 4, and greater than or equal to 5. The analysis should consider
transformations of the response variable and the covariates. Document steps taken in your
analysis, and justify your conclusions.

22.34. Refer to the Ischemic heart disease data set in Appendix C.9 and Case Study 16.51. Carry out
a one-way analysis of covariance of this data set, where the response of interest is total cost
(variable 2), the single factor is total number of interventions (variable 5), and the possible
covariates are duration (variable 10) and age (variable 3). Recode the number of interventions
into six categories: 0, 1, 2, 3-4, 5-7, and greater than or equal to 8. The analysis should
consider transformations of the response variable and the covariates. Document steps taken
in your analysis, and justify your conclusions.

22.35. Refer to the Real estate sales data set in Appendix C.7 and Case Study 19.59. Carry out a
balanced two-way analysis of covariance of this data set where the response of interest is sales
price (variable 2), the two crossed factors are quality (variable 10) and style (variable 11), and
the possible covariates are finished square feet (variable 3) and lot size (variable 12). Style
is recoded as either 1 or not 1. Order the observations in the six factor-level-combination
cells from smallest to largest observation number and retain the first 25 observations in each
cell for a total of 150 observations. The analysis should consider transformations of the
response variable and the covariates. Document the steps taken in your analysis and justify
your conclusions.

22.36. Refer to the Ischemic heart disease data set in Appendix C.9 and Case Study 16.60. Carry
out a balanced two-way analysis of covariance of this data set where the response of interest
is total cost (variable 2), the two crossed factors are number of interventions (variable 5)
and number of comorbidities (variable 9), and the possible covariates are duration (vari-
able 10) and age (variable 3). Recode the number of interventions into six categories: 0, 1, 2,
3-4, 5-7, and greater than or equal to 8. Recode the number of comorbidities into two
categories: 0-1, and greater than or equal to 2. Order the observations in the twelve factor-
level-combination cells from smallest to largest observation number and retain the first 43
observations in each cell for a total of 516 observations. The analysis should consider trans-
formations of the response variable and the covariates. Document the steps taken in your
analysis and justify your conclusions.


Chapter 23


Two-Factor Studies with
Unequal Sample Sizes


Up to this point in our discussion of two-factor studies we have restricted ourselves to equal
treatment sample sizes for the two-factor ANOVA model (19.23). Often, however, two-
factor studies involve unequal treatment sample sizes. The resulting imbalance destroys the
orthogonality of the ANOVA decomposition. Consequently, the general linear test approach
is utilized for ANOVA tests. In Sections 23.1 through 23.4 we shall take up procedures for
handling two-factor studies with unequal treatment sample sizes. We continue to assume
that all treatment means are of equal importance in these sections.

In occasional ANOVA studies, the treatment means are not of equal importance. This
also makes the standard ANOVA decomposition inappropriate, and the general linear test
approach consequently is employed. We consider in Section 23.5 procedures for conducting
the analysis of variance when the treatment means have unequal importance. We conclude
this chapter by discussing briefly in Section 23.6 the use of statistical computing packages
in the presence of unequal treatment sample sizes.


23.1 Unequal Sample Sizes


Two-factor studies frequently involve unequal treatment or cell sample sizes for a variety
of reasons. In observational studies, the investigator often has little or no control over the
cell sample sizes. For example, in a comparative study of U.S. manufacturing practices,
researchers examined the performance of manufacturing plants as a function of size of plant
(factor A: small, medium, large) and ownership (factor B: Japan, United States). In this
two-factor study, cell sample sizes for the six treatments were not under the complete control
of the researchers. First, the number of plants available for study in each size-ownership
category varied. Second, many plants were unable or unwilling to participate in the study.

Unequal treatment sample sizes are also encountered in experimental studies. For in-
stance, an experimenter may seek to have the same number of cases for each treatment,
but for a variety of reasons (e.g., illness of subject, incomplete records, technical problems)
ends up with unequal cell sample sizes.

Another reason for unequal treatment sample sizes is that investigators in both observa-
tional and experimental studies may use larger sample sizes for treatments for which the cost
is lower. In still other instances, unequal treatment sample sizes may be desired to enable
certain treatment means or certain linear combinations of treatment means to be estimated
with greater precision. For example, a packaged foods manufacturer wished to measure the
impact on consumer product ratings of a change from corn syrup to a low-calorie sweetener
(factor A) in one of its breakfast cereals. Three categories of consumers (factor B: children,
female adults, and male adults) were considered to be important. It was known that about
60 percent of the consumers are children, 20 percent are adult males, and 20 percent are
adult females. It was therefore considered to be reasonable to require that 60 percent of the
subjects be children, 20 percent be adult males, and 20 percent be adult females to provide
greater precision for the most important consumer group.

The fact that treatment sample sizes are unequal often does not affect the importance
of the treatment means. As we just noted, sample sizes frequently are unequal for reasons
that have nothing to do with the importance of treatment means. In our discussion of
unequal treatment sample sizes in Sections 23.2-23.4, we shall continue to assume that all
treatment means have the same importance. Procedures for handling ANOVA inferences
when treatments have unequal importance are considered in Section 23.5.

Throughout Sections 23.1-23.3, we assume that there is at least one case for each treat-
ment. Techniques for the analysis of studies with one or more cells empty are discussed in
Section 23.4.


Notation


Our notation remains the same as before, except that the sample size for the treatment
consisting of the ith level of factor A and the jth level of factor B will now be denoted by
$n_{ij}$. The total number of cases for the ith level of factor A will be denoted by:

$$n_{i\cdot} = \sum_j n_{ij} \tag{23.1a}$$

the total number of cases for the jth level of factor B by:

$$n_{\cdot j} = \sum_i n_{ij} \tag{23.1b}$$

and the total number of cases for the entire study by:

$$n_T = \sum_i \sum_j n_{ij} \tag{23.1c}$$

The estimated treatment mean when factor A is at the ith level and factor B is at the jth
level is defined as usual:

$$\bar{Y}_{ij\cdot} = \frac{Y_{ij\cdot}}{n_{ij}} \tag{23.2}$$

where:

$$Y_{ij\cdot} = \sum_{k=1}^{n_{ij}} Y_{ijk} \tag{23.2a}$$
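
The notation in (23.1a)-(23.2a) can be illustrated with a short sketch. The cell counts used here are those of the growth hormone example introduced in Section 23.2; the variable names are illustrative only.

```python
# Cell sample sizes n_ij for a 2 x 3 layout (rows: factor A, columns: factor B).
# These counts are taken from the growth hormone example (Table 23.1).
n = [[3, 2, 2],
     [1, 3, 3]]

n_i_dot = [sum(row) for row in n]          # (23.1a): totals for each level of A
n_dot_j = [sum(col) for col in zip(*n)]    # (23.1b): totals for each level of B
n_T = sum(n_i_dot)                         # (23.1c): total number of cases

# Estimated treatment mean (23.2) for cell (1, 1), using its three observations.
y_11 = [1.4, 2.4, 2.2]
ybar_11 = sum(y_11) / len(y_11)

print(n_i_dot, n_dot_j, n_T, round(ybar_11, 4))
```

Because the cell counts differ, the marginal totals are no longer simple multiples of a common cell size n.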


23.2 Use of Regression Approach for Testing Factor Effects when Sample Sizes Are Unequal


When the treatment sample sizes are unequal, the analysis of variance for two-factor studies 
becomes more complex. The least squares equations are no longer of a simple structure that 
yields direct and easy solutions, and the regular analysis of variance formulas in (19.37) and 
(19.39) are now inappropriate. Furthermore, the factor effect component sums of squares 
are no longer orthogonal; that is, they do not sum to SSTR. 

Hence, we will utilize the general linear test approach described in Section 2.8 when 
the treatment sample sizes are unequal. An easy way to obtain the proper error sums of 
squares for testing factor interactions and main effects by the general linear test approach 
is through the regression formulation of the ANOVA model described below. The only 
difference when cell sample sizes are unequal is that a reduced regression model needs to 
be fitted explicitly for each test of factor interactions and main effects because of the lack 
of orthogonality. Since no new principles are involved, we turn directly to an example to 
illustrate how ANOVA tests are conducted by means of the regression approach when the 
treatment sample sizes are unequal. 


Regression Approach to Two-Factor Analysis of Variance 


We shall explain the regression approach to two-factor analysis of variance in terms of the
factor effects model (19.23):

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \tag{23.3}$$

As we know from (19.24), the mean responses for this model are given by:

$$E\{Y_{ijk}\} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} \tag{23.4}$$

To represent this model in matrix terms, we proceed in the same fashion as in the
regression approach to single-factor ANOVA. Since $\sum_i \alpha_i = 0$, we need only $a - 1$ parameters $\alpha_i$
in the regression model, and we represent the parameter $\alpha_a$ as follows:

$$\alpha_a = -\alpha_1 - \alpha_2 - \cdots - \alpha_{a-1} \tag{23.5}$$

Hence, we utilize $a - 1$ indicator variables that can take on values 1, -1, or 0 for the $\alpha_i$
parameters, as in the single-factor ANOVA representation. Similarly, we need only $b - 1$
parameters $\beta_j$ in the regression model, and we represent the parameter $\beta_b$ as follows:

$$\beta_b = -\beta_1 - \beta_2 - \cdots - \beta_{b-1} \tag{23.6}$$

Hence, we utilize $b - 1$ indicator variables that can take on values 1, -1, or 0 for the $\beta_j$
parameters.
For the interaction parameters, we need to recognize that:

$$\begin{aligned}
\sum_i (\alpha\beta)_{ij} &= 0 \qquad j = 1, \ldots, b \\
\sum_j (\alpha\beta)_{ij} &= 0 \qquad i = 1, \ldots, a
\end{aligned} \tag{23.7}$$

Therefore, we represent the parameters $(\alpha\beta)_{ib}$ and $(\alpha\beta)_{aj}$ as follows:

$$(\alpha\beta)_{ib} = -(\alpha\beta)_{i1} - (\alpha\beta)_{i2} - \cdots - (\alpha\beta)_{i,b-1} \tag{23.8}$$

$$(\alpha\beta)_{aj} = -(\alpha\beta)_{1j} - (\alpha\beta)_{2j} - \cdots - (\alpha\beta)_{a-1,j} \tag{23.9}$$

Indeed, because of the interrelations in the constraints in (23.7), only $(a - 1)(b - 1)$ terms
$(\alpha\beta)_{ij}$ are needed in the regression model. As we shall demonstrate below, these are precisely
the terms associated with the cross products between the indicator variables for the factor A
and factor B main effects. We turn now to an example to illustrate how ANOVA tests are
conducted by means of the regression approach when the treatment sample sizes are unequal.


Example

Synthetic growth hormone was administered at a clinical research center to growth hormone
deficient, short children who had not yet reached puberty. The investigator was interested
in the effects of a child's gender (factor A) and bone development (factor B) on the rate of
growth induced by hormone administration. A child's bone development was classified into
one of three categories: severely depressed, moderately depressed, mildly depressed. Three
children were randomly selected for each gender-bone development group. The response
variable (Y) of interest was the difference between the growth rate during growth hormone
treatment and the normal growth rate prior to the treatment, expressed in centimeters per
month. Four of the 18 children were unable to complete the year-long study, thus creating
unequal treatment sample sizes. Note that this is an observational study. All children received
the same hormone therapy, and, subsequently, changes in growth rates were observed for
children in each bone development-by-gender category. No randomization of treatments to
subjects was employed.

Table 23.1 presents the study data. A plot of the estimated treatment means is shown
in Figure 23.1. It is clearly suggested there that a child's bone development has a major
impact on the change in growth rate. The plot also raises the questions as to whether some
interaction effects are present and whether the gender of a child affects the growth rate.

To test formally whether or not these factor effects are present, we utilize the general
linear test approach and the equivalent regression model formulation because of the unequal
sample sizes.


TABLE 23.1 Sample Data and Notation - Growth Hormone Example (growth rate difference in centimeters per month).

                                Bone Development (factor B), j
Gender (factor A)    Severely                  Moderately                Mildly
i                    Depressed (B1)            Depressed (B2)            Depressed (B3)
Male (A1)            1.4 ($Y_{111}$)           2.1 ($Y_{121}$)            .7 ($Y_{131}$)
                     2.4 ($Y_{112}$)           1.7 ($Y_{122}$)           1.1 ($Y_{132}$)
                     2.2 ($Y_{113}$)
  Mean               2.0 ($\bar{Y}_{11\cdot}$) 1.9 ($\bar{Y}_{12\cdot}$)  .9 ($\bar{Y}_{13\cdot}$)
Female (A2)          2.4 ($Y_{211}$)           2.5 ($Y_{221}$)            .5 ($Y_{231}$)
                                               1.8 ($Y_{222}$)            .9 ($Y_{232}$)
                                               2.0 ($Y_{223}$)           1.3 ($Y_{233}$)
  Mean               2.4 ($\bar{Y}_{21\cdot}$) 2.1 ($\bar{Y}_{22\cdot}$)  .9 ($\bar{Y}_{23\cdot}$)


FIGURE 23.1 Estimated Treatment Means Plot - Growth Hormone Example.
[Plot of change in growth rate (vertical axis) against bone development (horizontal axis: severely depressed, moderately depressed, mildly depressed), with a separate curve for each gender.]


Development of Regression Model. The two-factor ANOVA model (19.23) here is:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \qquad i = 1, 2;\; j = 1, 2, 3 \tag{23.10}$$

To express this model in regression terms, we utilize indicator variables that take on the
values 1, -1, or 0, as explained below. Specifically, we need $a - 1 = 2 - 1 = 1$ indicator
variable for the factor A main effects and $b - 1 = 3 - 1 = 2$ indicator variables for the
factor B main effects. The interaction terms correspond to the cross products of the indicator
variables for factor A and factor B main effects. Specifically, the regression model equivalent
to ANOVA model (23.10) is:

$$Y_{ijk} = \mu_{\cdot\cdot} + \underbrace{\alpha_1 X_{ijk1}}_{\text{A main effect}} + \underbrace{\beta_1 X_{ijk2} + \beta_2 X_{ijk3}}_{\text{B main effect}} + \underbrace{(\alpha\beta)_{11} X_{ijk1} X_{ijk2} + (\alpha\beta)_{12} X_{ijk1} X_{ijk3}}_{\text{AB interaction effect}} + \varepsilon_{ijk} \qquad \text{Full model} \tag{23.11}$$

where:

$$X_1 = \begin{cases} 1 & \text{if case from level 1 for factor A} \\ -1 & \text{if case from level 2 for factor A} \end{cases}$$

$$X_2 = \begin{cases} 1 & \text{if case from level 1 for factor B} \\ -1 & \text{if case from level 3 for factor B} \\ 0 & \text{otherwise} \end{cases}$$

$$X_3 = \begin{cases} 1 & \text{if case from level 2 for factor B} \\ -1 & \text{if case from level 3 for factor B} \\ 0 & \text{otherwise} \end{cases}$$
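
The coding just defined can be sketched as a small function. This is a minimal illustration for the 2 x 3 growth hormone layout only; the function name `code_case` is hypothetical and not part of the text.

```python
def code_case(i, j):
    """Return (X1, X2, X3, X1*X2, X1*X3) for a case in cell (i, j).

    i is the factor A level (1 or 2); j is the factor B level (1, 2, or 3).
    """
    x1 = 1 if i == 1 else -1          # factor A: level 1 -> 1, level 2 -> -1
    x2 = {1: 1, 2: 0, 3: -1}[j]       # factor B: level 1 -> 1, level 3 -> -1
    x3 = {1: 0, 2: 1, 3: -1}[j]       # factor B: level 2 -> 1, level 3 -> -1
    return (x1, x2, x3, x1 * x2, x1 * x3)

# Print the coding for each of the six treatments.
for i in (1, 2):
    for j in (1, 2, 3):
        print((i, j), code_case(i, j))
```

The last two entries of each tuple are the cross products that carry the interaction parameters in model (23.11).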


The regression coefficients in (23.11) are the ANOVA model parameters:

$$\begin{aligned}
\alpha_1 &= \mu_{1\cdot} - \mu_{\cdot\cdot} \\
\beta_1 &= \mu_{\cdot 1} - \mu_{\cdot\cdot} \\
\beta_2 &= \mu_{\cdot 2} - \mu_{\cdot\cdot} \\
(\alpha\beta)_{11} &= \mu_{11} - \mu_{1\cdot} - \mu_{\cdot 1} + \mu_{\cdot\cdot} \\
(\alpha\beta)_{12} &= \mu_{12} - \mu_{1\cdot} - \mu_{\cdot 2} + \mu_{\cdot\cdot}
\end{aligned} \tag{23.12}$$

The remaining ANOVA model parameters are not required in the regression model because
of the constraints in (19.23). Thus, for instance:

$$\begin{aligned}
\alpha_2 &= -\alpha_1 \\
\beta_3 &= -\beta_1 - \beta_2 \\
(\alpha\beta)_{13} &= -(\alpha\beta)_{11} - (\alpha\beta)_{12} \\
(\alpha\beta)_{21} &= -(\alpha\beta)_{11}
\end{aligned} \tag{23.13}$$

Table 23.2 repeats in column 1 a portion of the response data from Table 23.1. The
codings of the indicator variables and the interaction terms are shown in columns 2-6.
Note, for instance, that the codings for the first male child whose bone development is
severely depressed ($i = 1$, $j = 1$, $k = 1$) are $X_1 = 1$, $X_2 = 1$, and $X_3 = 0$, so that
$X_1 X_2 = 1$ and $X_1 X_3 = 0$. Table 23.3a presents the fitted regression function and regression
ANOVA table when the full regression model (23.11) is fitted to the data, i.e., when Y in
column 1 of Table 23.2 is regressed on the X variables in columns 2-6. Note that the fitted
values for the full model are the estimated treatment means $\bar{Y}_{ij\cdot}$, just as when all treatment
sample sizes are equal. For instance, we have for the first case ($k = 1$) from treatment
$i = 1$, $j = 1$:

$$\hat{Y}_{111} = 1.7 - .1(1) + .5(1) + .3(0) - .1(1) - 0(0) = 2.0 = \bar{Y}_{11\cdot}$$

and for the last case ($k = 3$) from treatment $i = 2$, $j = 3$:

$$\hat{Y}_{233} = 1.7 - .1(-1) + .5(-1) + .3(-1) - .1(1) - 0(1) = .9 = \bar{Y}_{23\cdot}$$


TABLE 23.2 Data for Regression Fits - Growth Hormone Example.

                (1)    (2)    (3)    (4)    (5)       (6)
 i   j   k       Y     X1     X2     X3    X1X2      X1X3
 1   1   1      1.4     1      1      0      1         0
 1   1   2      2.4     1      1      0      1         0
 1   2   2      1.7     1      0      1      0         1
 1   3   1       .7     1     -1     -1     -1        -1
 2   3   2       .9    -1     -1     -1      1         1
 2   3   3      1.3    -1     -1     -1      1         1


TABLE 23.3 Fits of Full and Reduced Regression Models - Growth Hormone Example.

(a) Full Model (23.11)

$\hat{Y} = 1.7 - .1X_1 + .5X_2 + .3X_3 - .1X_1X_2 - 0.0X_1X_3$

Source          SS       df
Regression    4.4743      5
Error         1.3000      8
Total         5.7743     13

(b) Reduced Model (23.15)

$\hat{Y} = 1.68 - .0857X_1 + .467X_2 + .327X_3$

Source          SS       df
Regression    4.3989      3
Error         1.3754     10
Total         5.7743     13

SSE(R) - SSE(F) = 1.3754 - 1.3000 = .0754

(c) Reduced Model (23.17)

$\hat{Y} = 1.69 + .444X_2 + .328X_3 - .0667X_1X_2 - .0167X_1X_3$

Source          SS       df
Regression    4.3543      4
Error         1.4200      9
Total         5.7743     13

SSE(R) - SSE(F) = 1.4200 - 1.3000 = .1200

(d) Reduced Model (23.18)

$\hat{Y} = 1.63 + .0190X_1 + .0667X_1X_2 - .193X_1X_3$

Source          SS       df
Regression     .2846      3
Error         5.4897     10
Total         5.7743     13

SSE(R) - SSE(F) = 5.4897 - 1.3000 = 4.1897
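
The error sums of squares in Table 23.3 can be reproduced with a short least squares sketch. It uses only the Python standard library; the helper names (`least_squares`, `coded`, `fit`) are illustrative, and solving the normal equations by Gauss-Jordan elimination is a convenience for this small example rather than the method of the text.

```python
def least_squares(X, y):
    """Return (coefficients, SSE) from regressing y on the columns of X."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * v for r, v in zip(X, y))] for a in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(p):
            if r != c:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    coef = [row[p] for row in M]
    sse = sum((v - sum(b * x for b, x in zip(coef, r))) ** 2
              for r, v in zip(X, y))
    return coef, sse

# Growth hormone data of Table 23.1, keyed by (gender i, bone development j).
cells = {(1, 1): [1.4, 2.4, 2.2], (1, 2): [2.1, 1.7], (1, 3): [0.7, 1.1],
         (2, 1): [2.4], (2, 2): [2.5, 1.8, 2.0], (2, 3): [0.5, 0.9, 1.3]}

def coded(i, j):
    """Indicator variables of model (23.11) for a case in cell (i, j)."""
    x1 = 1 if i == 1 else -1
    x2, x3 = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}[j]
    return {"X1": x1, "X2": x2, "X3": x3, "X1X2": x1 * x2, "X1X3": x1 * x3}

def fit(terms):
    """Regress Y on an intercept plus the named X variables."""
    X = [[1] + [coded(i, j)[t] for t in terms]
         for (i, j), ys in cells.items() for _ in ys]
    y = [v for ys in cells.values() for v in ys]
    return least_squares(X, y)

coef_full, sse_full = fit(["X1", "X2", "X3", "X1X2", "X1X3"])  # full model (23.11)
_, sse_no_ab = fit(["X1", "X2", "X3"])                         # reduced model (23.15)
_, sse_no_a = fit(["X2", "X3", "X1X2", "X1X3"])                # reduced model (23.17)
_, sse_no_b = fit(["X1", "X1X2", "X1X3"])                      # reduced model (23.18)

print([round(c, 3) for c in coef_full])
print(round(sse_full, 4), round(sse_no_ab, 4), round(sse_no_a, 4), round(sse_no_b, 4))
```

Each reduced model is fitted explicitly because, with unequal cell sizes, the extra sums of squares cannot be read off from a single orthogonal decomposition.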


Test for Interaction Effects. To test whether or not interaction effects are present, the
ANOVA model alternatives:

$$\begin{aligned}
H_0&: \text{all } (\alpha\beta)_{ij} = 0 \\
H_a&: \text{not all } (\alpha\beta)_{ij} \text{ equal zero}
\end{aligned} \tag{23.14}$$

become for regression model (23.11):

$$\begin{aligned}
H_0&: (\alpha\beta)_{11} = (\alpha\beta)_{12} = 0 \\
H_a&: \text{not both } (\alpha\beta)_{11} \text{ and } (\alpha\beta)_{12} \text{ equal zero}
\end{aligned} \tag{23.14a}$$

Thus, we are simply testing whether or not two regression coefficients equal zero. The
reduced regression model therefore is:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_1 X_{ijk1} + \beta_1 X_{ijk2} + \beta_2 X_{ijk3} + \varepsilon_{ijk} \qquad \text{Reduced model} \tag{23.15}$$

When this reduced model is fitted by regressing Y in column 1 of Table 23.2 on $X_1$, $X_2$, and
$X_3$ in columns 2-4, the results presented in Table 23.3b are obtained. The general linear
test statistic (2.70) therefore is:

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F} = \frac{1.3754 - 1.3000}{10 - 8} \div \frac{1.3000}{8} = \frac{.0377}{.1625} = .23$$

To control the risk of making a Type I error at $\alpha = .05$, we require $F(.95; 2, 8) = 4.46$. Since
$F^* = .23 \le 4.46$, we conclude $H_0$, that no interaction effects are present. The P-value for
this test statistic is .80.
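
The arithmetic of the general linear test statistic (2.70) for the interaction test can be sketched as follows. The function name is illustrative; the error sums of squares and the critical value 4.46 are those quoted in the text.

```python
def general_linear_test(sse_r, df_r, sse_f, df_f):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# Interaction test: reduced model (23.15) against full model (23.11).
f_star = general_linear_test(sse_r=1.3754, df_r=10, sse_f=1.3000, df_f=8)
f_crit = 4.46                      # F(.95; 2, 8), as quoted in the text
conclude_h0 = f_star <= f_crit

print(round(f_star, 2), conclude_h0)
```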


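Tests for Factor Main Effects. We now proceed to test whether or not factor A and
factor B main effects are present. The ANOVA model alternatives:

$$\begin{aligned}
H_0&: \alpha_1 = \alpha_2 = 0 & H_0&: \beta_1 = \beta_2 = \beta_3 = 0 \\
H_a&: \text{not both } \alpha_i \text{ equal zero} & H_a&: \text{not all } \beta_j \text{ equal zero}
\end{aligned} \tag{23.16}$$

become for regression model (23.11):

$$\begin{aligned}
H_0&: \alpha_1 = 0 & H_0&: \beta_1 = \beta_2 = 0 \\
H_a&: \alpha_1 \ne 0 & H_a&: \text{not both } \beta_1 \text{ and } \beta_2 \text{ equal zero}
\end{aligned} \tag{23.16a}$$

The reduced regression models for testing for factor A main effects and factor B main
effects therefore are:

$$Y_{ijk} = \mu_{\cdot\cdot} + \beta_1 X_{ijk2} + \beta_2 X_{ijk3} + (\alpha\beta)_{11} X_{ijk1} X_{ijk2} + (\alpha\beta)_{12} X_{ijk1} X_{ijk3} + \varepsilon_{ijk} \qquad \text{Reduced model} \tag{23.17}$$

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_1 X_{ijk1} + (\alpha\beta)_{11} X_{ijk1} X_{ijk2} + (\alpha\beta)_{12} X_{ijk1} X_{ijk3} + \varepsilon_{ijk} \qquad \text{Reduced model} \tag{23.18}$$

Table 23.3c presents the results of fitting reduced model (23.17), where Y in column 1
of Table 23.2 is regressed on $X_2$, $X_3$, $X_1X_2$, and $X_1X_3$ in columns 3-6. Finally, Table 23.3d
contains the results of fitting reduced model (23.18), where Y in column 1 of Table 23.2 is
regressed on $X_1$, $X_1X_2$, and $X_1X_3$ in columns 2, 5, and 6. The two test statistics therefore
are:

$$F^* = \frac{1.4200 - 1.3000}{9 - 8} \div \frac{1.3000}{8} = \frac{.1200}{.1625} = .74$$

$$F^* = \frac{5.4897 - 1.3000}{10 - 8} \div \frac{1.3000}{8} = \frac{2.0949}{.1625} = 12.89$$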

TABLE 23.4 ANOVA Table - Growth Hormone Example.

Source of Variation         SS      df      MS
Gender (A)                 .1200     1     .1200
Bone development (B)      4.1897     2    2.0949
AB interactions            .0754     2     .0377
Error                     1.3000     8     .1625

For $\alpha = .05$, we require $F(.95; 1, 8) = 5.32$ and $F(.95; 2, 8) = 4.46$ for the two tests.
Since $F^* = .74 \le 5.32$ and $F^* = 12.89 > 4.46$, we conclude that there are no factor A
main effects but that factor B main effects are present. The respective P-values for these
two test statistics are .41 and .003.

Thus, these tests support the indications obtained previously from the estimated treatment
means plot in Figure 23.1, that a child's bone development affects the change in growth
rate during growth hormone treatment and that there are no gender and interaction effects.
The family level of significance for the set of three tests just conducted, according to the
Bonferroni inequality (4.4), is at most .15.

At this point, the next step in the analysis of the study results is to examine the nature of
the bone development effects. We shall discuss this analysis in the next section.

Table 23.4 contains a consolidated ANOVA table presenting the results from fitting the 
four regression models in Table 23.3. The sum of squares for a factor effect in each instance 
is the difference between the error sums of squares for the reduced and full models shown in 
Table 23.3, and the associated degrees of freedom are the difference between the respective 
degrees of freedom for these error sums of squares. Note that a total sum of squares is not 
shown in Table 23.4 because the sums of squares for the three factor effects and for error 
do not add to SSTO when the treatment sample sizes are unequal. 
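
The construction of the consolidated ANOVA table, and the failure of its sums of squares to add to SSTO, can be checked with a few lines of arithmetic. This sketch takes the error sums of squares and SSTO from Table 23.3 as given; the dictionary layout is illustrative.

```python
sse_full, df_full = 1.3000, 8           # full model (23.11), Table 23.3a
ssto = 5.7743                           # total sum of squares, Table 23.3
reduced = {                             # effect tested: (SSE(R), df_R)
    "A (gender)": (1.4200, 9),              # Table 23.3c
    "B (bone development)": (5.4897, 10),   # Table 23.3d
    "AB interactions": (1.3754, 10),        # Table 23.3b
}

mse = sse_full / df_full
table = {}
for effect, (sse_r, df_r) in reduced.items():
    ss, df = sse_r - sse_full, df_r - df_full
    table[effect] = (ss, df, ss / df, (ss / df) / mse)
    print(f"{effect:22s} SS={ss:.4f} df={df} MS={ss / df:.4f} F*={(ss / df) / mse:.2f}")

# With unequal cell sizes the component sums of squares are not orthogonal.
total = sum(row[0] for row in table.values()) + sse_full
print(f"sum of SS = {total:.4f}, SSTO = {ssto:.4f}")
```

The three factor effect sums of squares plus the error sum of squares come to 5.6851, which falls short of SSTO = 5.7743.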


Comment 

In the event that pooling of sums of squares is desired for testing factor main effects when the test 
for interactions leads to the conclusion that there are none, as discussed in Section 19.10, the full 
regression model for testing factor A and factor B main effects needs to be revised. Specifically with 
reference to the growth hormone example, the full regression model in (23.11) would need to be 
revised by excluding the interaction effects and would be as follows:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_1 X_{ijk1} + \beta_1 X_{ijk2} + \beta_2 X_{ijk3} + \varepsilon_{ijk} \qquad \text{Revised full model} \tag{23.19}$$


23.3 Inferences about Factor Effects when Sample Sizes Are Unequal


The estimation of factor effects when the treatment sample sizes are unequal is completely
analogous to when the sample sizes are equal. The nature of the analysis depends on whether
or not important interactions are present. When no important interactions are present, the
analysis generally is concerned with the factor level means $\mu_{i\cdot}$ and $\mu_{\cdot j}$. On the other
hand, when important interactions are present, the analysis usually focuses on the treatment
means $\mu_{ij}$.

The estimators and estimated variances presented in Chapter 19 for equal sample sizes
must, of course, be modified to recognize the unequal treatment sample sizes. For instance,
if interest is in estimating the factor level means $\mu_{i\cdot}$ as defined in (19.2) when all treatment
means are of equal importance:

$$\mu_{i\cdot} = \frac{\sum_j \mu_{ij}}{b}$$

the appropriate estimator is simply the unweighted average of the estimated treatment
means $\bar{Y}_{ij\cdot}$:

$$\hat{\mu}_{i\cdot} = \frac{\sum_j \bar{Y}_{ij\cdot}}{b}$$

These estimated factor level means are referred to as least squares means. Since the $\bar{Y}_{ij\cdot}$ are
independent, the variance of this estimator is:

$$\sigma^2\{\hat{\mu}_{i\cdot}\} = \frac{1}{b^2} \sum_j \sigma^2\{\bar{Y}_{ij\cdot}\} = \frac{1}{b^2} \sum_j \frac{\sigma^2}{n_{ij}} = \frac{\sigma^2}{b^2} \sum_j \frac{1}{n_{ij}}$$

and the estimated variance is:

$$s^2\{\hat{\mu}_{i\cdot}\} = \frac{MSE}{b^2} \sum_j \frac{1}{n_{ij}}$$


Table 23.5 presents the formulas for the point estimator and estimated variance when
estimating factor level means, pairwise comparisons of factor level means, and contrasts
or linear combinations of factor level means, when the sample sizes are unequal. The
corresponding formulas for treatment means, pairwise comparisons of treatment means,
and contrasts or linear combinations of treatment means are also presented in this
table.

All multiple comparison procedures applicable for equal sample sizes are appropriate
when the treatment sample sizes are unequal. The Tukey pairwise comparison procedure
then is conservative. The degrees of freedom associated with MSE are $n_T - ab$, as before.
[Recall for equal sample sizes that $n_T = nab$; hence, $n_T - ab = (n - 1)ab$.] Table 23.5 also
presents the appropriate simultaneous comparison multiples for making inferences about
factor level means or treatment means.

Test statistics and decision rules for simultaneous tests based on the Bonferroni, Tukey,
and Scheffé procedures are easily adapted from the formulas in Chapter 19. The form of
a test statistic does not change, but the degrees of freedom associated with MSE in each
decision rule must now be expressed as $n_T - ab$.

Since no new issues are involved in estimating factor effects when the sample sizes are
unequal, we proceed directly to two examples.
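
Before turning to the examples, the least squares means and their estimated variances can be sketched for the gender factor in the growth hormone data. The cell means, cell sizes, and MSE are taken from Tables 23.1 and 23.3a; the weighted average at the end is added only for contrast and is not an estimator used in the text.

```python
n = [[3, 2, 2], [1, 3, 3]]                   # cell sample sizes n_ij (Table 23.1)
ybar = [[2.0, 1.9, 0.9], [2.4, 2.1, 0.9]]    # estimated treatment means
mse = 0.1625                                 # MSE from the full model (df = 8)
b = 3                                        # number of factor B levels

# Least squares means: unweighted averages of the cell means in each row.
ls_means = [sum(row) / b for row in ybar]

# Estimated variances: (MSE / b^2) * sum_j (1 / n_ij).
est_var = [mse / b**2 * sum(1 / nij for nij in row) for row in n]

# For contrast only: average for males weighted by the cell sample sizes.
weighted_male = sum(nij * m for nij, m in zip(n[0], ybar[0])) / sum(n[0])

print([round(m, 3) for m in ls_means], [round(v, 5) for v in est_var])
print(round(weighted_male, 4))
```

The weighted average for males (about 1.657) differs from the least squares mean (1.6) because it gives more weight to the larger cells.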


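TABLE 23.5 Point Estimators and Estimated Variances for Two-Factor Analyses when Sample Sizes Are Unequal.

(a) Factor Level Mean

Factor A:
$$\mu_{i\cdot} = \frac{\sum_j \mu_{ij}}{b} \qquad \hat{\mu}_{i\cdot} = \frac{\sum_j \bar{Y}_{ij\cdot}}{b} \qquad s^2\{\hat{\mu}_{i\cdot}\} = \frac{MSE}{b^2} \sum_j \frac{1}{n_{ij}} \tag{23.20}$$

Factor B:
$$\mu_{\cdot j} = \frac{\sum_i \mu_{ij}}{a} \qquad \hat{\mu}_{\cdot j} = \frac{\sum_i \bar{Y}_{ij\cdot}}{a} \qquad s^2\{\hat{\mu}_{\cdot j}\} = \frac{MSE}{a^2} \sum_i \frac{1}{n_{ij}}$$

(b) Pairwise Comparison of Factor Level Means

Factor A:
$$D = \mu_{i\cdot} - \mu_{i'\cdot} \qquad \hat{D} = \hat{\mu}_{i\cdot} - \hat{\mu}_{i'\cdot} \qquad s^2\{\hat{D}\} = \frac{MSE}{b^2} \sum_j \left( \frac{1}{n_{ij}} + \frac{1}{n_{i'j}} \right) \tag{23.21}$$

Factor B:
$$D = \mu_{\cdot j} - \mu_{\cdot j'} \qquad \hat{D} = \hat{\mu}_{\cdot j} - \hat{\mu}_{\cdot j'} \qquad s^2\{\hat{D}\} = \frac{MSE}{a^2} \sum_i \left( \frac{1}{n_{ij}} + \frac{1}{n_{ij'}} \right)$$

(c) Contrast or Linear Combination of Factor Level Means

Factor A:
$$L = \sum_i c_i \mu_{i\cdot} \qquad \hat{L} = \sum_i c_i \hat{\mu}_{i\cdot} \qquad s^2\{\hat{L}\} = \frac{MSE}{b^2} \sum_i c_i^2 \sum_j \frac{1}{n_{ij}} \tag{23.22}$$

Factor B:
$$L = \sum_j c_j \mu_{\cdot j} \qquad \hat{L} = \sum_j c_j \hat{\mu}_{\cdot j} \qquad s^2\{\hat{L}\} = \frac{MSE}{a^2} \sum_j c_j^2 \sum_i \frac{1}{n_{ij}}$$

(d) Confidence Interval Multiple

Single Estimate (factor A or factor B):
$$t(1 - \alpha/2;\; n_T - ab)$$

Multiple Comparisons:
$$B = t(1 - \alpha/2g;\; n_T - ab)$$

$$T = \frac{1}{\sqrt{2}}\, q(1 - \alpha;\; a,\; n_T - ab) \text{ for factor A} \qquad T = \frac{1}{\sqrt{2}}\, q(1 - \alpha;\; b,\; n_T - ab) \text{ for factor B} \tag{23.23}$$

$$S^2 = (a - 1) F(1 - \alpha;\; a - 1,\; n_T - ab) \text{ for factor A} \qquad S^2 = (b - 1) F(1 - \alpha;\; b - 1,\; n_T - ab) \text{ for factor B}$$

(e) Treatment Mean

$$\mu_{ij} \qquad \hat{\mu}_{ij} = \bar{Y}_{ij\cdot} \qquad s^2\{\hat{\mu}_{ij}\} = \frac{MSE}{n_{ij}} \tag{23.24}$$

(f) Pairwise Comparison of Treatment Means

$$D = \mu_{ij} - \mu_{i'j'} \qquad \hat{D} = \bar{Y}_{ij\cdot} - \bar{Y}_{i'j'\cdot} \qquad s^2\{\hat{D}\} = MSE \left( \frac{1}{n_{ij}} + \frac{1}{n_{i'j'}} \right) \tag{23.25}$$

(g) Contrast or Linear Combination of Treatment Means

$$L = \sum_i \sum_j c_{ij} \mu_{ij} \qquad \hat{L} = \sum_i \sum_j c_{ij} \bar{Y}_{ij\cdot} \qquad s^2\{\hat{L}\} = MSE \sum_i \sum_j \frac{c_{ij}^2}{n_{ij}} \tag{23.26}$$

(h) Confidence Interval Multiple

Single Estimate:
$$t(1 - \alpha/2;\; n_T - ab)$$

Multiple Comparisons:
$$B = t(1 - \alpha/2g;\; n_T - ab)$$

$$T = \frac{1}{\sqrt{2}}\, q(1 - \alpha;\; ab,\; n_T - ab) \tag{23.27}$$

$$S^2 = (ab - 1) F(1 - \alpha;\; ab - 1,\; n_T - ab)$$

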

Example 1—Pairwise Comparisons of Factor Level Means 


We continue with the growth hormone example. We found earlier that a child's gender and 
bone development do not interact in their effects on the change in the growth rate when 
growth hormone is administered. We further found no main gender (factor A) effects, but 
concluded that a child's bone development (factor B) does affect the change in growth rate. 
We shall now analyze the nature of the bone development effects by means of pairwise 
comparisons among the three bone development groups. The Tukey multiple comparison 
procedure will be used. This procedure is conservative when sample sizes are unequal, and
use of the Bonferroni procedure would lead to wider confidence intervals here. The family
confidence coefficient has been specified to be .90.

We use formulas (23.21) in Table 23.5 for the point estimates and estimated variances.
The estimated treatment means are given in Table 23.1, and MSE is found in Table 23.4.
For the pairwise comparisons of the bone development factor level means ($j = 1$: severely
depressed; $j = 2$: moderately depressed; $j = 3$: mildly depressed), we obtain:


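$$\hat{\mu}_{\cdot 1} = \frac{\bar{Y}_{11\cdot} + \bar{Y}_{21\cdot}}{2} = \frac{2.0 + 2.4}{2} = 2.2$$

$$\hat{\mu}_{\cdot 2} = \frac{\bar{Y}_{12\cdot} + \bar{Y}_{22\cdot}}{2} = \frac{1.9 + 2.1}{2} = 2.0$$

$$\hat{\mu}_{\cdot 3} = \frac{\bar{Y}_{13\cdot} + \bar{Y}_{23\cdot}}{2} = \frac{.9 + .9}{2} = .9$$

$$\hat{D}_1 = \hat{\mu}_{\cdot 1} - \hat{\mu}_{\cdot 2} = 2.2 - 2.0 = .2$$

$$\hat{D}_2 = \hat{\mu}_{\cdot 1} - \hat{\mu}_{\cdot 3} = 2.2 - .9 = 1.3$$

$$\hat{D}_3 = \hat{\mu}_{\cdot 2} - \hat{\mu}_{\cdot 3} = 2.0 - .9 = 1.1$$

$$s^2\{\hat{D}_1\} = \frac{.1625}{(2)^2} \left( \frac{1}{3} + \frac{1}{1} + \frac{1}{2} + \frac{1}{3} \right) = .0880 \qquad s\{\hat{D}_1\} = .297$$

$$s^2\{\hat{D}_2\} = \frac{.1625}{(2)^2} \left( \frac{1}{3} + \frac{1}{1} + \frac{1}{2} + \frac{1}{3} \right) = .0880 \qquad s\{\hat{D}_2\} = .297$$

$$s^2\{\hat{D}_3\} = \frac{.1625}{(2)^2} \left( \frac{1}{2} + \frac{1}{3} + \frac{1}{2} + \frac{1}{3} \right) = .0677 \qquad s\{\hat{D}_3\} = .260$$

For a 90 percent family confidence coefficient, we require:

$$T = \frac{1}{\sqrt{2}}\, q(.90; 3, 8) = \frac{1}{\sqrt{2}} (3.37) = 2.38$$

Hence, we obtain the following confidence intervals:

$$-.51 = .2 - 2.38(.297) \le \mu_{\cdot 1} - \mu_{\cdot 2} \le .2 + 2.38(.297) = .91$$

$$.59 = 1.3 - 2.38(.297) \le \mu_{\cdot 1} - \mu_{\cdot 3} \le 1.3 + 2.38(.297) = 2.01$$

$$.48 = 1.1 - 2.38(.260) \le \mu_{\cdot 2} - \mu_{\cdot 3} \le 1.1 + 2.38(.260) = 1.72$$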

We conclude from these confidence intervals with 90 percent family confidence 
coefficient that growth hormone deficient, short children with mildly depressed bone de- 
velopment on the average have a substantially smaller increase in the growth rate than 
children with either moderately depressed or severely depressed bone development. Fur- 
ther, the latter two groups of children do not show significantly different mean changes in 
the growth rate. We summarize these findings in the following line plot of the estimated
factor level means:

[Line plot on a Change in Growth Rate axis running from 0.5 to 2.5: Mild at .9, Moderate at 2.0, Severe at 2.2.]


Example 2—Single-Degree-of-Freedom Test 
In the growth hormone example, a researcher wanted to know whether children with only
mildly depressed bone development obtain, on the average, any increase in the growth rate
with administration of growth hormone. Thus, the alternatives to be considered are those
for a one-sided test:

$$H_0: \mu_{\cdot 3} \le 0 \qquad H_a: \mu_{\cdot 3} > 0$$

The level of significance is to be controlled at $\alpha = .05$.
The test statistic to be employed is:

$$t^* = \frac{\hat{\mu}_{\cdot 3}}{s\{\hat{\mu}_{\cdot 3}\}}$$

We found earlier that $\hat{\mu}_{\cdot 3} = .9$ and $MSE = .1625$. Using (23.20), we obtain:

$$s^2\{\hat{\mu}_{\cdot 3}\} = \frac{.1625}{(2)^2} \left( \frac{1}{2} + \frac{1}{3} \right) = .0339 \qquad s\{\hat{\mu}_{\cdot 3}\} = .184$$

Hence, the test statistic is:

$$t^* = \frac{.9}{.184} = 4.89$$

For $\alpha = .05$, we require $t(.95; 8) = 1.860$. Therefore the one-sided decision rule is:

If $t^* \le 1.860$, conclude $H_0$
If $t^* > 1.860$, conclude $H_a$

Since $t^* = 4.89 > 1.860$, we conclude $H_a$, that the mean change in the growth rate for
children with mildly depressed bone development is greater than zero. The one-sided
P-value for this test statistic is .0006.
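
The arithmetic of this single-degree-of-freedom test can be sketched in a few lines. The critical value 1.860 is the tabled percentile quoted in the text; the variable names are illustrative.

```python
from math import sqrt

mse, a = 0.1625, 2
n_13, n_23 = 2, 3                  # sample sizes of the two mildly depressed cells
mu_hat_3 = (0.9 + 0.9) / a         # least squares mean for j = 3

# Estimated standard deviation from (23.20), applied to a factor B level mean.
s = sqrt(mse / a**2 * (1 / n_13 + 1 / n_23))
t_star = mu_hat_3 / s
t_crit = 1.860                     # t(.95; 8), as quoted in the text
conclude_ha = t_star > t_crit

print(round(s, 3), round(t_star, 2), conclude_ha)
```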


23.4 Empty Cells in Two-Factor Studies


Occasionally after a two-factor study has been completed, it turns out that there are no cases
in one or several treatment cells. Not only are the treatment sample sizes unequal then, but
there is no sample information about the treatment means for the empty cells. Consider
again Table 23.1 for the growth hormone study. Note that two female children with severely
depressed bone condition dropped out of the study before its completion so that only one
case ($n_{21} = 1$) is present for that treatment. We can imagine easily that all three of these
children could have dropped out of the study. Then we would have had $n_{21} = 0$, and no
sample information would have been available about the treatment mean $\mu_{21}$.


Partial Analysis of Factor Effects

When one or several treatment cells are empty, the analysis of variance for unequal sample
sizes by means of the equivalent regression model, as explained earlier, cannot be
conducted. This does not mean, however, that the entire two-factor study has become useless.
A variety of partial analyses usually can be conducted that will provide at least some
information about the nature of the factor effects. The analyses that can be undertaken
depend on the particular cells for which no sample information is available. We illustrate by
means of an example how partial information can be obtained from two-factor studies with
empty cells.


Example

In the growth hormone example, suppose that there are no observations for female children
with severely depressed bone development; i.e., $n_{21} = 0$. In that case no sample information
is available about the treatment mean $\mu_{21}$. We represent this situation in Figure 23.2a.
Partial information about interactions can still be obtained by restricting attention to
children with moderately depressed and mildly depressed bone development, as represented
in Figure 23.2b. For these children, interactions are present if the differences between the
treatment means for the two genders are not the same for the two bone development groups.
The two differences are:

$$\mu_{12} - \mu_{22} \qquad \mu_{13} - \mu_{23}$$

Thus, we consider the following contrast among the treatment means:

$$L = \mu_{12} - \mu_{22} - \mu_{13} + \mu_{23}$$

We can either estimate L by means of a confidence interval and note whether or not the
interval includes zero, or we can conduct a single degree of freedom test to establish whether
or not interactions are present. With either approach, we use MSE based on all sample
observations so that the associated degrees of freedom for MSE would be $n_T - (ab - 1) =
13 - 5 = 8$ (remember that $n_{21} = 0$ now).

If the partial analysis of interactions were to suggest that no interactions are present, the
effect of gender can be studied by comparing the factor level means excluding children with
severely depressed bone development, as represented in Figure 23.2c:

$$\frac{\mu_{12} + \mu_{13}}{2} \qquad \frac{\mu_{22} + \mu_{23}}{2}$$

In addition, the effect of bone development can be studied for male children by comparing
the treatment means $\mu_{11}$, $\mu_{12}$, and $\mu_{13}$, as represented in Figure 23.2d, or it can be studied
for children of both genders by excluding those with severely depressed bone development,
as represented in Figure 23.2c:

$$\frac{\mu_{12} + \mu_{22}}{2} \qquad \frac{\mu_{13} + \mu_{23}}{2}$$


FIGURE 23.2 Schematic Representation of Growth Hormone Study with Empty Cell - Growth Hormone Example ($n_{21} = 0$).

                               Bone Development
                   Severely        Moderately      Mildly
                   Depressed       Depressed       Depressed
Gender             B1              B2              B3

(a) Empty Cell
Male (A1)          $\mu_{11}$      $\mu_{12}$      $\mu_{13}$
Female (A2)        Empty cell      $\mu_{22}$      $\mu_{23}$

(b) Partial Study of Interactions
Male (A1)                          $\mu_{12}$      $\mu_{13}$
Female (A2)                        $\mu_{22}$      $\mu_{23}$

(c) Partial Study of Factor A and Factor B Main Effects
Male (A1)                          $\mu_{12}$      $\mu_{13}$
Female (A2)                        $\mu_{22}$      $\mu_{23}$

(d) Partial Study of Factor B Main Effects
Male (A1)          $\mu_{11}$      $\mu_{12}$      $\mu_{13}$
Female (A2)
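
The partial analysis of interactions described above can be sketched numerically. This illustration assumes the hypothetical scenario of the example, in which cell (2, 1) is empty, and uses the remaining 13 observations of Table 23.1; the percentile t(.975; 8) = 2.306 is a standard tabled value and the two-sided test at the .05 level is an assumption of this sketch, not a result reported in the text.

```python
from math import sqrt

# Cells for moderately (j = 2) and mildly (j = 3) depressed bone development.
cells = {(1, 2): [2.1, 1.7], (1, 3): [0.7, 1.1],
         (2, 2): [2.5, 1.8, 2.0], (2, 3): [0.5, 0.9, 1.3]}
severe_male = [1.4, 2.4, 2.2]      # cell (1, 1); cell (2, 1) is assumed empty

def mean(v):
    return sum(v) / len(v)

def ss_within(v):
    return sum((x - mean(v)) ** 2 for x in v)

# MSE pools all five nonempty cells: df = n_T - (ab - 1) = 13 - 5 = 8.
all_cells = list(cells.values()) + [severe_male]
df_error = sum(len(v) for v in all_cells) - len(all_cells)
mse = sum(ss_within(v) for v in all_cells) / df_error

# Contrast L = mu_12 - mu_22 - mu_13 + mu_23.
coef = {(1, 2): 1, (2, 2): -1, (1, 3): -1, (2, 3): 1}
L_hat = sum(c * mean(cells[k]) for k, c in coef.items())
s_L = sqrt(mse * sum(c**2 / len(cells[k]) for k, c in coef.items()))
t_star = L_hat / s_L
t_crit = 2.306                     # t(.975; 8), standard tabled value

print(df_error, round(mse, 4), round(L_hat, 2), round(t_star, 2))
```

Under these assumptions the estimated contrast is small relative to its standard deviation, so the partial analysis would not suggest interactions.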


Analysis if Model with No Interactions Can Be Employed 


Occasionally, information is available from previous studies that the two factors in a
two-factor study do not interact. In that case, a model with no interaction effects can be employed.
Such a model was introduced in (20.1) for the case $n = 1$. When there are $n_{ij}$ observations
for the treatment consisting of the ith level of factor A and the jth level of factor B, the
no-interaction model is:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + \varepsilon_{ijk} \qquad \text{No-interaction model} \tag{23.28}$$

When no-interaction model (23.28) is appropriate, the analysis of variance and the analysis
of factor main effects can be conducted by means of the equivalent regression model even
when one or several cells are empty, as long as relevant other cells are not empty. [The
relevant other cells are ones that satisfy the relations in (19.7b).]

The reason why the usual analysis of variance by means of the equivalent regression
model can be conducted for ANOVA model (23.28), even though one or more cells are
empty, is that the assumption of no interactions permits us in effect to estimate the empty
cell means. Conceptually, estimation of an empty cell mean, say $\mu_{21}$, requires two steps.
First, we need to estimate the treatment means for the nonempty cells. These estimates
are more complicated than simply using the estimated treatment means because the model
assumption of no interactions needs to be utilized. We encountered such estimates for a
no-interaction model in Chapter 20 when we considered studies where $n = 1$ for each cell.
Once we have estimates of the treatment means $\mu_{ij}$ for the nonempty cells, the second step
in estimating the empty cell mean $\mu_{21}$ is to utilize the relation in (19.7b) for the no-interaction
case, whereby we can express $\mu_{21}$ in terms of three other treatment means. For instance,
$\mu_{21}$ can be estimated from $\hat{\mu}_{21} = \hat{\mu}_{11} + \hat{\mu}_{22} - \hat{\mu}_{12}$.


In the growth hormone example, suppose that the cell for female children with severely
depressed bone development is empty. From past knowledge, the researcher is able to
assume that there are no interactions between gender and bone development. In that case,
regression model (23.3) reduces to:

$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_1 X_{ijk1} + \beta_1 X_{ijk2} + \beta_2 X_{ijk3} + \varepsilon_{ijk} \qquad \text{Full model} \tag{23.29}$$

To test for, say, gender main effects, we first fit this full model and obtain SSE(F). The
alternatives to be tested are:

$$H_0: \alpha_1 = 0 \qquad H_a: \alpha_1 \ne 0$$

Hence, the reduced model is:

$$Y_{ijk} = \mu_{\cdot\cdot} + \beta_1 X_{ijk2} + \beta_2 X_{ijk3} + \varepsilon_{ijk} \qquad \text{Reduced model} \tag{23.30}$$


We then fit this reduced model, obtain SSE(R), and calculate the general linear test statistic 
(2.70) in the usual fashion. A test for bone development effects is carried out similarly. 
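
The steps just described can be sketched for the growth hormone data with cell (2, 1) removed. This is an illustration under the assumption that no interactions are present; the helper names are hypothetical, the normal equations are solved by simple elimination for convenience, and no numerical results for this scenario are reported in the text.

```python
def least_squares(X, y):
    """Return (coefficients, SSE) from regressing y on the columns of X."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * v for r, v in zip(X, y))] for a in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(p):
            if r != c:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    coef = [row[p] for row in M]
    sse = sum((v - sum(b * x for b, x in zip(coef, r))) ** 2
              for r, v in zip(X, y))
    return coef, sse

# Table 23.1 data with the female, severely depressed cell (2, 1) treated as empty.
cells = {(1, 1): [1.4, 2.4, 2.2], (1, 2): [2.1, 1.7], (1, 3): [0.7, 1.1],
         (2, 2): [2.5, 1.8, 2.0], (2, 3): [0.5, 0.9, 1.3]}

def row(i, j, with_gender=True):
    """Design row for model (23.29), or for (23.30) when gender is dropped."""
    x1 = 1 if i == 1 else -1
    x2, x3 = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}[j]
    return [1, x1, x2, x3] if with_gender else [1, x2, x3]

keys = [(i, j) for (i, j), ys in cells.items() for _ in ys]
y = [v for ys in cells.values() for v in ys]

coef, sse_f = least_squares([row(i, j) for i, j in keys], y)        # full model (23.29)
_, sse_r = least_squares([row(i, j, False) for i, j in keys], y)    # reduced model (23.30)
df_f, df_r = len(y) - 4, len(y) - 3
f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

def fitted(i, j):
    """Estimated treatment mean under the no-interaction model."""
    return sum(b * x for b, x in zip(coef, row(i, j)))

mu21_hat = fitted(2, 1)            # estimate for the empty cell
print(round(sse_f, 4), round(sse_r, 4), round(f_star, 3), round(mu21_hat, 3))
```

The fitted value for the empty cell satisfies the additivity relation, since it equals the fitted mean for cell (1, 1) plus that for cell (2, 2) minus that for cell (1, 2).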


Comments 

1. We need to caution that it is not appropriate in the presence of empty cells to use a no-interaction
model as the full model when no prior information about the absence of interactions is available. Only
partial analyses of factor effects should then be undertaken, as explained earlier.

2. We have considered one cause of empty cells, when cases are missing or lost at random in
an experimental study or when the sample in an observational study fails to include any cases for
a particular cell. In these situations, the cell mean for the empty cell exists even though no cases
are available for that cell. In contrast, a structural empty cell occurs when it is known a priori that
it is impossible to obtain data for that cell. In this latter situation, the factorial structure is partially
destroyed since the cell mean for the empty cell does not exist, and it is therefore meaningless to
estimate the mean for such a structural empty cell on the basis of the other cases.


Missing Observations in Randomized Complete Block Designs 

There are occasions when one or several observations in a randomized complete block
design are "missing": a subject may have been sick, a record may have been mislaid, a
treatment may have been applied incorrectly in one instance. Such missing responses destroy
the balance (orthogonality) of the complete block design and make the usual ANOVA
calculations inappropriate. However, the regression approach discussed in Section 23.2 is
ordinarily still appropriate when there are missing responses.

Since no new principles are involved, we turn to an example to illustrate the use of 
the regression approach when observations are missing in a randomized block design 
experiment. 


968 PartFive Multi-Factor Studies 


Example 


TABLE 23.6 
Example of 
Missing 
Observation in 
Randomized 
Block Design 
(к = 3, m = 3). 


Table 23.6a contains the data for a simple randomized block design experiment with к — 3 
treatments and n; = 3 blocks, where observation Y;; is missing. We set up the regressio 
model equivalent to randomized block design model (21.1) as follows: a 


Yip = ш. + pi Х + ea Xij2 + Tı Xijs + TX i ja +её;; Full model (23.31) 
EUR Let EL ы НОЦ 


Block effect "Treatment effect 


where: 
if experimental unit from block 1 


if experiment unit from block 3 
otherwise 


| 


if experimental unit from block 2 
if experiment unit from block 3 
otherwise 


if experimental unit received treatment 1 
if experiment unit received treatment 3 
otherwise 


2 
[ 


онн OFM KF OR KF OK Be 


x 
| 

MM Н М 
| 


if experimental unit received treatment 2 
if experiment unit received treatment 3 
otherwise 


Table 23.6b repeats the Y observations in column 1 and presents the four indicator variables 
in columns 2-5. 


(a) Response Data 

              Treatment (j) 
Block i     1          2      3 
   1        Missing    10     9 
   2        11         10     7 
   3        6          4      3 

(b) Regression Variables 

             (1)    (2)    (3)    (4)    (5) 
 i     j      Y     X1     X2     X3     X4 
 1     2     10      1      0      0      1 
 1     3      9      1      0     -1     -1 
 2     1     11      0      1      1      0 
 2     2     10      0      1      0      1 
 2     3      7      0      1     -1     -1 
 3     1      6     -1     -1      1      0 
 3     2      4     -1     -1      0      1 
 3     3      3     -1     -1     -1     -1 


TABLE 23.7 
ANOVA Table 
and Other 
Regression 
Output— 
Missing Data 
Example of 
Table 23.6. 


Chapter 23 Two-Factor Studies with Unequal Sample Sizes 969 


The analysis of variance for testing treatment effects and block effects is carried out in 
the usual manner by first fitting the full model (23.31) and then fitting each of the following 
reduced models: 


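Test for Block Effects 

$$Y_{ij} = \mu_{\cdot\cdot} + \tau_1 X_{ij3} + \tau_2 X_{ij4} + \varepsilon_{ij} \qquad \text{Reduced model} \tag{23.32}$$

Test for Treatment Effects 

$$Y_{ij} = \mu_{\cdot\cdot} + \rho_1 X_{ij1} + \rho_2 X_{ij2} + \varepsilon_{ij} \qquad \text{Reduced model} \tag{23.33}$$


The extra sums of squares $SSR(X_1, X_2 \mid X_3, X_4)$ for blocks and $SSR(X_3, X_4 \mid X_1, X_2)$ for 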
treatments are then calculated in the usual manner. Table 23.7a presents these extra sums 
of squares for our example obtained from fitting the full and reduced models, as well as the 
error sum of squares for the full model. No total sum of squares is shown because of lack 
of orthogonality as a result of the missing observation. 
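
The full-versus-reduced fits described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not output from any statistical package: the helper names `code`, `solve`, and `sse` are hypothetical, the data are the eight observations of Table 23.6b, and each extra sum of squares is computed as SSE(R) - SSE(F).

```python
# Observations from Table 23.6b as (block i, treatment j, Y); Y_11 is missing.
data = [(1, 2, 10), (1, 3, 9), (2, 1, 11), (2, 2, 10), (2, 3, 7),
        (3, 1, 6), (3, 2, 4), (3, 3, 3)]

def code(level):
    """1, -1, 0 coding of a three-level factor as two indicator variables."""
    return {1: (1, 0), 2: (0, 1), 3: (-1, -1)}[level]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def sse(X, y):
    """Error sum of squares from the least squares fit of y on the columns of X."""
    p = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

y = [float(obs[2]) for obs in data]
full = [[1.0, *code(i), *code(j)] for i, j, _ in data]     # model (23.31)
no_blocks = [[1.0, *code(j)] for _, j, _ in data]          # model (23.32)
no_treatments = [[1.0, *code(i)] for i, _, _ in data]      # model (23.33)

sse_full = sse(full, y)
ss_blocks = sse(no_blocks, y) - sse_full          # SSR(X1, X2 | X3, X4)
ss_treatments = sse(no_treatments, y) - sse_full  # SSR(X3, X4 | X1, X2)
f_star = (ss_treatments / 2) / (sse_full / 3)
print(round(ss_blocks, 2), round(ss_treatments, 2), round(sse_full, 2))
```

With unrounded mean squares the treatment ratio comes out near 14.06; dividing by the rounded MSE of .44 gives the 14.2 used in the text.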


(a) ANOVA Table 

Source of 
Variation       SS      df     MS 
Blocks         53.83     2    26.92 
Treatments     12.50     2     6.25 
Error           1.33     3      .44 


(b) Estimated Regression Coefficients for Full Model (23.31) 

Regression                 Estimated Regression 
Coefficient                Coefficient 
$\mu_{\cdot\cdot}$         $\hat{\mu}_{\cdot\cdot} = 8.000$ 
$\rho_1$                   $\hat{\rho}_1 = 2.333$ 
$\rho_2$                   $\hat{\rho}_2 = 1.333$ 
$\tau_1$                   $\hat{\tau}_1 = 1.667$ 
$\tau_2$                   $\hat{\tau}_2 = 0.0$ 


(c) Estimated Variance-Covariance Matrix of Regression Coefficients 

                           $\hat{\mu}_{\cdot\cdot}$   $\hat{\rho}_1$   $\hat{\rho}_2$   $\hat{\tau}_1$   $\hat{\tau}_2$ 
$\hat{\mu}_{\cdot\cdot}$    .06173 
$\hat{\rho}_1$              .02469     .14815 
$\hat{\rho}_2$             -.01235    -.07407     .11111 
$\hat{\tau}_1$              .02469     .04938    -.02469     .14815 
$\hat{\tau}_2$             -.01235    -.02469     .01235    -.07407     .11111 




The test for treatment effects is conducted as usual. From Table 23.7a we find: 


$$F^* = \frac{MSR(X_3, X_4 \mid X_1, X_2)}{MSE} = \frac{6.25}{.44} = 14.2$$

For $\alpha = .05$, we need $F(.95; 2, 3) = 9.55$. Since $F^* = 14.2 > 9.55$, we conclude that 
differential treatment effects are present. The P-value of this test is .03. The test for block 
effects can be carried out along similar lines when it is of interest. 

No new problems are encountered with the regression approach in analyzing fixed treatment 
effects when there are missing observations. For instance, to estimate the pairwise 
comparison $L = \mu_{\cdot 1} - \mu_{\cdot 3} = \tau_1 - \tau_3$, we utilize the fact that $\tau_3 = -\tau_1 - \tau_2$ so that we 
have: 

$$L = \mu_{\cdot 1} - \mu_{\cdot 3} = \tau_1 - \tau_3 = \tau_1 - (-\tau_1 - \tau_2) = 2\tau_1 + \tau_2 \tag{23.34}$$

An unbiased estimator of (23.34) is: 

$$\hat{L} = 2\hat{\tau}_1 + \hat{\tau}_2 \tag{23.35}$$

whose estimated variance is, using (A.30b): 

$$s^2\{\hat{L}\} = 4s^2\{\hat{\tau}_1\} + s^2\{\hat{\tau}_2\} + 4s\{\hat{\tau}_1, \hat{\tau}_2\} \tag{23.36}$$


Table 23.7b contains the estimated regression coefficients for the full model, and Table 23.7c 
contains the estimated variance-covariance matrix of the regression coefficients. We there- 
fore obtain the following estimates: 


$$\hat{L} = 2(1.667) + 0.0 = 3.334$$

$$s^2\{\hat{L}\} = 4(.14815) + .11111 + 4(-.07407) = .4074$$

so that the estimated standard deviation is $s\{\hat{L}\} = .638$. A 95 percent confidence interval 
for $L$ requires $t(.975; 3) = 3.182$, yielding the confidence limits $3.334 \pm 3.182(.638)$ and 
the confidence interval: 

$$1.3 \le \mu_{\cdot 1} - \mu_{\cdot 3} \le 5.4$$
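
The arithmetic behind this interval can be checked with a short Python sketch. It only re-traces (23.35) and (23.36) with the entries of Tables 23.7b and 23.7c; the variable names are illustrative, and the critical value $t(.975; 3) = 3.182$ is taken from the text rather than computed.

```python
import math

tau1_hat, tau2_hat = 1.667, 0.0          # Table 23.7b
var_tau1, var_tau2 = 0.14815, 0.11111    # Table 23.7c, diagonal entries
cov_tau12 = -0.07407                     # Table 23.7c, covariance of the two estimates
t_crit = 3.182                           # t(.975; 3), as quoted in the text

c1, c2 = 2, 1                            # L = 2*tau1 + tau2, from (23.34)
L_hat = c1 * tau1_hat + c2 * tau2_hat
var_L = c1**2 * var_tau1 + c2**2 * var_tau2 + 2 * c1 * c2 * cov_tau12
se_L = math.sqrt(var_L)
lower, upper = L_hat - t_crit * se_L, L_hat + t_crit * se_L
print(round(L_hat, 3), round(var_L, 4), round(se_L, 3), round(lower, 1), round(upper, 1))
```

The covariance term enters with coefficient $2c_1c_2 = 4$, which is why a negative covariance between the two estimates shrinks the variance of the combination.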


23.5 ANOVA Inferences when Treatment Means Are of Unequal Importance 


Example 


On occasion, the treatment means $\mu_{ij}$ in a two-factor study are not of equal importance, so 
the unweighted factor level means $\mu_{\cdot j}$ and $\mu_{i\cdot}$ defined in (19.1) and (19.2) are not relevant. 


In a breakfast cereal study 60 percent of the consumers of this product were children, 
20 percent male adults, and 20 percent female adults. In this study, factor A was type of 
sweetener (i = 1: corn syrup, i = 2: low-calorie sweetener) and factor B was consumer 
category (j = 1: child, j = 2: male adult, j = 3: female adult). The company wishes to 
determine if a change to a low-calorie sweetener will change the mean rating of its product 
in the population of consumers. Here, the treatment means $\mu_{ij}$ have unequal importance 
and the company therefore wishes to compare the two weighted means: 

Corn syrup: $.6\mu_{11} + .2\mu_{12} + .2\mu_{13}$ 
Low-calorie sweetener: $.6\mu_{21} + .2\mu_{22} + .2\mu_{23}$ 

This can be done by estimating the contrast: 

$$L = (.6\mu_{11} + .2\mu_{12} + .2\mu_{13}) - (.6\mu_{21} + .2\mu_{22} + .2\mu_{23})$$

or by testing the alternatives: 

$$H_0\colon L = 0$$
$$H_a\colon L \ne 0$$

Note the use of the weights .6, .2, and .2 to reflect the unequal importance of the treatment 
means $\mu_{ij}$. 


Estimation of Treatment Means and Factor Effects 


Example 


Estimation of treatment means and factor effects when the treatment means have unequal 
importance does not lead to any additional complexities. The general formulas in Section 23.3 
for estimating treatment means $\mu_{ij}$ and for contrasts of treatment means still 
apply. We illustrate the analysis of factor effects when the treatment means are of unequal 
importance by returning to the mathematics learning example in Table 19.11. 


A school administrator in the mathematics learning example had requested information 
about which teaching method leads to better learning of college mathematics when 20 per- 
cent of the students in the class have excellent quantitative ability, 50 percent have good 
ability, and 30 percent have moderate ability. The mean learning scores for such a class 
mix with the two teaching methods are the following linear combinations of the treatment 
means: 


Abstract method: $L_1 = .2\mu_{11} + .5\mu_{12} + .3\mu_{13}$ 
Standard method: $L_2 = .2\mu_{21} + .5\mu_{22} + .3\mu_{23}$ 


We assume here that the mean learning scores for students with different quantitative abilities 
will not be affected by a class mix that is somewhat different from the one in the experimental 
study. 

Point estimates of the mean scores are (data in Table 19.11a): 


$$\hat{L}_1 = .2(92) + .5(81) + .3(73) = 80.8$$
$$\hat{L}_2 = .2(90) + .5(86) + .3(82) = 85.6$$

The difference between the two mean scores is a contrast: 

$$L = L_1 - L_2$$

This contrast is estimated to be: 

$$\hat{L} = \hat{L}_1 - \hat{L}_2 = 80.8 - 85.6 = -4.8$$


We can obtain the estimated variance of $\hat{L}$ by (19.93b) since there are equal sample sizes 
here: 

$$s^2\{\hat{L}\} = \frac{28}{21}\left[(.2)^2 + (.5)^2 + (.3)^2 + (-.2)^2 + (-.5)^2 + (-.3)^2\right] = 1.013$$

so that the estimated standard deviation is $s\{\hat{L}\} = 1.006$. For a 95 percent confidence 
coefficient, we require $t(.975; 120) = 1.980$. Hence, the confidence limits are $-4.8 \pm 
1.980(1.006)$ and the desired confidence interval is: 

$$-6.79 \le L \le -2.81$$


With 95 percent confidence we conclude that the standard teaching method is better for the 
specified class mix, leading to a mean learning score that is at least 2.81 points greater than 
that for the abstract teaching method and may be as much as 6.79 points greater. 
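
As a cross-check, the weighted comparison above can be reproduced with a few lines of Python. This is only a sketch: the cell means are those quoted from Table 19.11a, the values MSE = 28 and n = 21 cases per cell are read off the variance formula above (an assumption about how that formula is to be read), and $t(.975; 120) = 1.980$ is taken from the text.

```python
import math

weights = [0.2, 0.5, 0.3]       # class mix: excellent, good, moderate ability
abstract = [92, 81, 73]         # estimated treatment means, abstract method
standard = [90, 86, 82]         # estimated treatment means, standard method
mse, n_per_cell = 28.0, 21      # assumed from the variance formula in the text
t_crit = 1.980                  # t(.975; 120), as quoted in the text

L1_hat = sum(w * m for w, m in zip(weights, abstract))
L2_hat = sum(w * m for w, m in zip(weights, standard))
L_hat = L1_hat - L2_hat

# Contrast coefficients: +w for the abstract cells, -w for the standard cells.
coefs = weights + [-w for w in weights]
var_L = mse / n_per_cell * sum(c**2 for c in coefs)
se_L = math.sqrt(var_L)
lower, upper = L_hat - t_crit * se_L, L_hat + t_crit * se_L
print(round(L_hat, 1), round(var_L, 3), round(lower, 2), round(upper, 2))
```

Because the coefficients sum to zero, $L$ is a contrast, and changing the class mix only changes the `weights` list.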


Test for Interactions 


The test for interactions also is not affected by unequal importance of treatment means since 
this test is concerned with the parallelism, or lack of it, of the treatment mean curves. This 
was illustrated in Figures 19.3, 19.4, and 19.5. The treatment mean curves are based solely 
on the individual treatment means $\mu_{ij}$ and hence do not involve averages of the treatment 
means. Thus, the test for interactions is conducted as explained in Section 19.6 when the 
sample sizes are equal and as explained in Section 23.2 when the sample sizes are unequal, 
whether the treatment means are of equal or unequal importance. 


Tests for Factor Main Effects by Use of Equivalent Regression Models 


Example 


Tests for factor main effects when the treatment means are of unequal importance are carried 
out by the general linear test approach of Chapter 2. First, we shall explain how to implement 
factor tests with the general linear test approach by use of equivalent regression models; we 
then shall explain implementation by means of a matrix formulation. 

When the treatment means are of unequal importance, the use of equivalent regression 
models to carry out the general linear test approach is easiest when cell means model (19.15) 
is employed. Since no new principles with the regression approach are involved, we turn to 
an example to illustrate the tests for main effects. 


In the growth hormone example in Table 23.1, it is known that twice as many male as female 
children undergo growth hormone treatment therapy, and that this ratio is the same for 
children who have severe, moderate, and mild depression in bone development. Inferences 
are desired about the target population of children undergoing therapy. Specifically, we wish 
to test whether or not the state of bone development affects the change in growth rate in the 
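target population. The alternatives therefore are: 

$$H_0\colon \frac{2\mu_{11} + \mu_{21}}{3} = \frac{2\mu_{12} + \mu_{22}}{3} = \frac{2\mu_{13} + \mu_{23}}{3} \tag{23.37}$$
$$H_a\colon \text{not all equalities hold}$$

We restate the alternative $H_0$ in the following equivalent fashion: 

$$H_0\colon \begin{aligned} \frac{2\mu_{11} + \mu_{21}}{3} - \frac{2\mu_{12} + \mu_{22}}{3} &= 0 \\ \frac{2\mu_{11} + \mu_{21}}{3} - \frac{2\mu_{13} + \mu_{23}}{3} &= 0 \end{aligned} \tag{23.37a}$$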




Implementation of the general linear test (2.70) requires that we fit the full model and 
then fit the reduced model under $H_0$. The full ANOVA model is cell means model (19.15): 

$$Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$$

Following the example in (16.85), we obtain the equivalent full regression model: 

$$Y_{ijk} = \mu_{11}X_{ijk1} + \mu_{12}X_{ijk2} + \mu_{13}X_{ijk3} + \mu_{21}X_{ijk4} + \mu_{22}X_{ijk5} + \mu_{23}X_{ijk6} + \varepsilon_{ijk} \qquad \text{Full model} \tag{23.38}$$

where: 

$$X_{ijk1} = \begin{cases} 1 & \text{if case from level 1 of factor A and level 1 of factor B} \\ 0 & \text{otherwise} \end{cases}$$

$$X_{ijk2} = \begin{cases} 1 & \text{if case from level 1 of factor A and level 2 of factor B} \\ 0 & \text{otherwise} \end{cases}$$

$$\vdots$$

$$X_{ijk6} = \begin{cases} 1 & \text{if case from level 2 of factor A and level 3 of factor B} \\ 0 & \text{otherwise} \end{cases}$$

Table 23.8 repeats in column 1 a portion of the data on the Y observations from Table 23.1 
and presents in columns 2-7 the codings of the X variables for the full model. Note, for 
instance, that the codings of the X variables for observation $Y_{111}$ are $X_1 = 1$, $X_2 = X_3 = 
X_4 = X_5 = X_6 = 0$. 

When Y in column 1 of Table 23.8 is regressed on the X variables in columns 2-7 for 
a no-intercept regression model, we obtain $SSE(F) = 1.3000$, associated with $df_F = 
14 - 6 = 8$ degrees of freedom. These results, of course, are the same as in Table 23.3a 
when the equivalent regression model in the factor effects form was used. 

TABLE 23.8 Data for Regression Fits when Treatment Means of Unequal Importance: 
Growth Hormone Example. 

 (1)    (2)   (3)   (4)   (5)   (6)   (7)     (8)   (9)   (10)  (11) 
              Full Model                       Reduced Model 
  Y     X1    X2    X3    X4    X5    X6      Z1    Z2    Z3    Z4 
 1.4     1     0     0     0     0     0       1     0     0     0 
 2.4     1     0     0     0     0     0       1     0     0     0 
 1.7     0     1     0     0     0     0       0     1     0     0 
  .7     0     0     1     0     0     0       0     0     1     0 
  .9     0     0     0     0     0     1       0     2    -2     1 
 1.3     0     0     0     0     0     1       0     2    -2     1 

To obtain the reduced regression model under $H_0$, we need to incorporate the conditions 
in (23.37a) into the full model. We shall do this by solving the system of two equations in 
(23.37a) for any two of the parameters and replacing these two parameters in the full model 
by the resulting expressions. Arbitrarily choosing $\mu_{21}$ and $\mu_{23}$, we find in solving the two 
equations in (23.37a): 

$$\begin{aligned} \mu_{21} &= 2\mu_{12} + \mu_{22} - 2\mu_{11} \\ \mu_{23} &= 2\mu_{12} - 2\mu_{13} + \mu_{22} \end{aligned} \tag{23.39}$$

Replacing $\mu_{21}$ and $\mu_{23}$ in full model (23.38) by the expressions in (23.39), we obtain the 
reduced model: 

$$Y_{ijk} = \mu_{11}X_{ijk1} + \mu_{12}X_{ijk2} + \mu_{13}X_{ijk3} + (2\mu_{12} + \mu_{22} - 2\mu_{11})X_{ijk4} + \mu_{22}X_{ijk5} + (2\mu_{12} - 2\mu_{13} + \mu_{22})X_{ijk6} + \varepsilon_{ijk}$$

This model can be simplified algebraically, as follows: 

$$Y_{ijk} = \mu_{11}Z_{ijk1} + \mu_{12}Z_{ijk2} + \mu_{13}Z_{ijk3} + \mu_{22}Z_{ijk4} + \varepsilon_{ijk} \qquad \text{Reduced model} \tag{23.40}$$

where: 

$$\begin{aligned} Z_{ijk1} &= X_{ijk1} - 2X_{ijk4} \\ Z_{ijk2} &= X_{ijk2} + 2X_{ijk4} + 2X_{ijk6} \\ Z_{ijk3} &= X_{ijk3} - 2X_{ijk6} \\ Z_{ijk4} &= X_{ijk4} + X_{ijk5} + X_{ijk6} \end{aligned}$$

Table 23.8 shows the codings of the new Z variables in columns 8-11. For instance, the 
codings for the new Z variables associated with $Y_{111}$ are obtained as follows: 

$X_1 = 1 \quad X_2 = 0 \quad X_3 = 0 \quad X_4 = 0 \quad X_5 = 0 \quad X_6 = 0$ 

$$\begin{aligned} Z_1 &= 1 - 2(0) = 1 \\ Z_2 &= 0 + 2(0) + 2(0) = 0 \\ Z_3 &= 0 - 2(0) = 0 \\ Z_4 &= 0 + 0 + 0 = 0 \end{aligned}$$
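
The recoding from X to Z variables can be sketched in Python as follows. The function names `cell_indicators`, `z_coding`, and `implied_cell_mean` are hypothetical, and the parameter values used in the final check are arbitrary numbers chosen only to illustrate that any reduced-model fit satisfies the constraints in (23.37a).

```python
def cell_indicators(i, j):
    """Indicators X1..X6 for a case in cell (i, j); cells ordered 11, 12, 13, 21, 22, 23."""
    x = [0] * 6
    x[3 * (i - 1) + (j - 1)] = 1
    return tuple(x)

def z_coding(x):
    """Map the full-model indicators X1..X6 to the reduced-model variables Z1..Z4."""
    x1, x2, x3, x4, x5, x6 = x
    return (x1 - 2 * x4, x2 + 2 * x4 + 2 * x6, x3 - 2 * x6, x4 + x5 + x6)

def implied_cell_mean(params, i, j):
    """Cell mean implied by reduced-model parameters (mu11, mu12, mu13, mu22)."""
    return sum(p * z for p, z in zip(params, z_coding(cell_indicators(i, j))))

# Z codings for each of the six cells.
for i in (1, 2):
    for j in (1, 2, 3):
        print((i, j), z_coding(cell_indicators(i, j)))

# Arbitrary illustrative parameter values: the implied weighted means coincide.
params = (2.0, 1.5, 1.0, 3.0)
weighted = [(2 * implied_cell_mean(params, 1, j) + implied_cell_mean(params, 2, j)) / 3
            for j in (1, 2, 3)]
print(weighted)
```

The codings printed for cells (1, 1) and (2, 3) match the corresponding rows of Table 23.8.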
When Y in column 1 of Table 23.8 is regressed on the Z variables in columns 8-11 with a 
no-intercept regression model, we obtain $SSE(R) = 4.754$ and $df_R = 14 - 4 = 10$. Hence, 
the general linear test statistic (2.70) is: 

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F} = \frac{4.754 - 1.3000}{10 - 8} \div \frac{1.3000}{8} = 10.63$$

If $H_0$ holds, $F^*$ follows the F distribution with 2 and 8 degrees of freedom. To control the 
level of significance at $\alpha = .05$, we require $F(.95; 2, 8) = 4.46$. Since $F^* = 10.63 > 4.46$, 
we conclude $H_a$, that the weighted mean change in the growth rate is not the same for the 
three bone development groups. The P-value of this test is .006. 




Tests for Factor Main Effects by Use of Matrix Formulation 


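We saw in the growth hormone example when using the equivalent regression models 
to implement the general linear test approach that it was necessary to solve a system of 
two equations in six unknown parameters in terms of any two of the parameters. As the 
number of equations in $H_0$ increases, the algebra can become quite tedious. Under these 
circumstances, it may be easier to carry out the F test when the treatment means are of 
unequal importance by means of formulating the general linear test in matrix terms. 

The full model, as before in (23.38), is represented by: 

$$\underset{n \times 1}{\mathbf{Y}} = \underset{n \times p}{\mathbf{X}}\;\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}} \tag{23.41}$$

For the growth hormone example, the X matrix is a 14 x 6 matrix consisting of the columns 
for $X_1$ through $X_6$ in Table 23.8, and the $\boldsymbol{\beta}$ vector is: 

$$\boldsymbol{\beta} = \begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \end{bmatrix}$$

The least squares and maximum likelihood estimators of the parameters in the full normal 
error model (23.41) will now be denoted by $\mathbf{b}_F$ and are, as before, given by (6.25): 

$$\mathbf{b}_F = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \tag{23.42}$$

Also, the error sum of squares is given by (6.35): 

$$SSE(F) = (\mathbf{Y} - \mathbf{X}\mathbf{b}_F)'(\mathbf{Y} - \mathbf{X}\mathbf{b}_F) = \mathbf{Y}'\mathbf{Y} - \mathbf{b}_F'\mathbf{X}'\mathbf{Y} \tag{23.43}$$

A linear test hypothesis $H_0$ is represented in matrix form as follows: 

$$H_0\colon \underset{s \times p}{\mathbf{C}}\;\underset{p \times 1}{\boldsymbol{\beta}} = \underset{s \times 1}{\mathbf{h}} \tag{23.44}$$

where $\mathbf{C}$ is a specified s x p matrix of rank s and $\mathbf{h}$ is a specified s x 1 vector. For the 
growth hormone example, the hypothesis $H_0$ in (23.37a) can be stated in the form (23.44) 
with the following matrices: 

$$\underset{2 \times 6}{\mathbf{C}} = \begin{bmatrix} \frac{2}{3} & -\frac{2}{3} & 0 & \frac{1}{3} & -\frac{1}{3} & 0 \\ \frac{2}{3} & 0 & -\frac{2}{3} & \frac{1}{3} & 0 & -\frac{1}{3} \end{bmatrix} \qquad \underset{6 \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \end{bmatrix} \qquad \underset{2 \times 1}{\mathbf{h}} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$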



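Note that this formulation yields (23.37a): 

$$\underset{2 \times 1}{\mathbf{C}\boldsymbol{\beta}} = \begin{bmatrix} \frac{2}{3}\mu_{11} - \frac{2}{3}\mu_{12} + \frac{1}{3}\mu_{21} - \frac{1}{3}\mu_{22} \\ \frac{2}{3}\mu_{11} - \frac{2}{3}\mu_{13} + \frac{1}{3}\mu_{21} - \frac{1}{3}\mu_{23} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

The reduced model then is: 

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \qquad \text{where } \mathbf{C}\boldsymbol{\beta} = \mathbf{h} \tag{23.45}$$

It can be shown that the least squares and maximum likelihood estimators under the reduced 
model, to be denoted by $\mathbf{b}_R$, are: 

$$\mathbf{b}_R = \mathbf{b}_F - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\left(\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right)^{-1}(\mathbf{C}\mathbf{b}_F - \mathbf{h}) \tag{23.46}$$

and the error sum of squares is: 

$$SSE(R) = (\mathbf{Y} - \mathbf{X}\mathbf{b}_R)'(\mathbf{Y} - \mathbf{X}\mathbf{b}_R) \tag{23.47}$$

which has associated with it $df_R = n - (p - s)$ degrees of freedom. It can be shown also 
that the difference $SSE(R) - SSE(F)$ can be expressed as follows: 

$$SSE(R) - SSE(F) = (\mathbf{C}\mathbf{b}_F - \mathbf{h})'\left(\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right)^{-1}(\mathbf{C}\mathbf{b}_F - \mathbf{h}) \tag{23.48}$$

which has associated with it $df_R - df_F = (n - p + s) - (n - p) = s$ degrees of freedom. 
Hence, the general linear test statistic (2.70) here is: 

$$F^* = \frac{SSE(R) - SSE(F)}{s} \div \frac{SSE(F)}{n - p} \tag{23.49}$$

where $SSE(R) - SSE(F)$ is given by (23.48) and $SSE(F)$ is given by (23.43). Note for 
the growth hormone example that the numerator degrees of freedom are $s = 2$ and the 
denominator degrees of freedom are $n - p = 14 - 6 = 8$, which agree with the degrees of 
freedom obtained when using the equivalent regression models. 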
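
The matrix formulation can be traced numerically with a small Python sketch. Two assumptions are made here and should be checked against the source tables: the 14 observations are entered as they are believed to appear in Table 23.1 (which is not reproduced in this section), and the cell means model is used so that $\mathbf{X}'\mathbf{X}$ is diagonal with entries $n_{ij}$. The sketch evaluates (23.48) and (23.49) directly; it is not the output of any package.

```python
# Growth hormone data, keyed by (gender i, bone development j).
# ASSUMPTION: values transcribed from Table 23.1, not shown in this section.
data = {
    (1, 1): [1.4, 2.4, 2.2], (1, 2): [2.1, 1.7], (1, 3): [0.7, 1.1],
    (2, 1): [2.4], (2, 2): [2.5, 1.8, 2.0], (2, 3): [0.5, 0.9, 1.3],
}
cells = sorted(data)                      # order 11, 12, 13, 21, 22, 23
n = [len(data[c]) for c in cells]         # diagonal of X'X for the cell means model
b_full = [sum(data[c]) / len(data[c]) for c in cells]   # b_F = cell means, (23.42)
sse_full = sum((y - m) ** 2 for c, m in zip(cells, b_full) for y in data[c])
df_full = sum(n) - len(cells)

C = [[2/3, -2/3, 0, 1/3, -1/3, 0],        # rows of C as in (23.44)
     [2/3, 0, -2/3, 1/3, 0, -1/3]]
h = [0.0, 0.0]
s = len(C)

# d = C b_F - h, and M = C (X'X)^{-1} C' with (X'X)^{-1} = diag(1 / n_ij).
d = [sum(c * b for c, b in zip(row, b_full)) - hk for row, hk in zip(C, h)]
M = [[sum(u * v / nk for u, v, nk in zip(ru, rv, n)) for rv in C] for ru in C]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

ss_diff = sum(d[a] * M_inv[a][b] * d[b] for a in range(s) for b in range(s))  # (23.48)
f_star = (ss_diff / s) / (sse_full / df_full)                                 # (23.49)
print(round(sse_full, 4), round(sse_full + ss_diff, 3), round(f_star, 2))
```

Under these assumptions the sketch reproduces the error sums of squares and the test statistic obtained earlier with the equivalent regression models, without solving the constraints for any parameters.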


Comments 

1. Many of the major statistical packages require only that the user furnish $H_0$ in the matrix form 
(23.44) and will then conduct the general linear test. 

2. The least squares estimators $\mathbf{b}_R$ in (23.46) under the reduced model can be derived by minimizing 
the least squares criterion $Q = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ subject to the constraint $\mathbf{C}\boldsymbol{\beta} - \mathbf{h} = \mathbf{0}$, 
using Lagrange multipliers. 

3. The test for the alternatives (23.37a) in the growth hormone example can also be conducted by 
estimating the two contrasts: 

$$L_1 = \frac{2\mu_{11} + \mu_{21}}{3} - \frac{2\mu_{12} + \mu_{22}}{3} \qquad L_2 = \frac{2\mu_{11} + \mu_{21}}{3} - \frac{2\mu_{13} + \mu_{23}}{3}$$

with a multiple comparison procedure (e.g., the Bonferroni procedure) and noting whether or not both 
confidence intervals include zero. 




Tests for Factor Effects when Weights Are Proportional to Sample Sizes 
Simplifications in determining the term $SSE(R) - SSE(F)$ in the general linear test statistic 
for testing weighted factor A and factor B effects occur when the weights $n_{ij}$ for the means 
$\mu_{ij}$ are proportional to the total sample sizes $n_{i\cdot}$ and $n_{\cdot j}$ for factor A and factor B levels, 
respectively. Such weights are appropriate in some circumstances but not in many others. 

Consider a study of retail stores. The effects on shoplifting losses of size of store (factor A) 
and location of store within the city (factor B) are to be studied. Inferences about all retail 
stores in the population of interest are to be made. A random sample of $n_T$ retail stores 
is selected from the population of all stores, and the selected stores are then classified 
by size and location. We denote the resulting cell sample sizes as usual by $n_{ij}$. If the 
proportions of stores in the different size-location groups in the population were known, 
these known proportions would serve as the appropriate weights in making inferences about 
size and location main effects, and the general linear test procedures just discussed would 
be employed. Often, however, these proportions are not known. Under these conditions, the 
cell sample sizes $n_{ij}$ may be used to estimate the unknown proportions and therefore may 
serve as reasonable weights. 

To illustrate this, suppose that $a = 2$ store sizes and $b = 3$ locations are employed in the 
study of retail stores, and that a random sample of $n_T = 60$ stores resulted in the following 
cell sample sizes $n_{ij}$: 

                       Location (j) 
Store Size (i)    j=1    j=2    j=3    Total 
i=1                20      5      4      29 
i=2                10     15      6      31 
Total              30     20     10      60 


Thus $n_{11} = 20$, $n_{21} = 10$, and so on. Further, denoting by $n_{i\cdot}$ and $n_{\cdot j}$ the total factor A 
and factor B level sample sizes as defined in (23.1a) and (23.1b), respectively, we have 
$n_{1\cdot} = 29$, $n_{\cdot 1} = 30$, and so on. 

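The test for comparing factor A effects, when the weights $n_{ij}/n_{i\cdot}$ reflect the importance 
of the factor A means, would then involve a comparison of the weighted mean for factor A 
level i = 1: 

$$\frac{20\mu_{11} + 5\mu_{12} + 4\mu_{13}}{29}$$

and the weighted mean for factor A level i = 2: 

$$\frac{10\mu_{21} + 15\mu_{22} + 6\mu_{23}}{31}$$

Expressed in symbolic notation, the alternatives would be: 

$$H_0\colon \left(\frac{n_{11}}{n_{1\cdot}}\right)\mu_{11} + \left(\frac{n_{12}}{n_{1\cdot}}\right)\mu_{12} + \left(\frac{n_{13}}{n_{1\cdot}}\right)\mu_{13} = \left(\frac{n_{21}}{n_{2\cdot}}\right)\mu_{21} + \left(\frac{n_{22}}{n_{2\cdot}}\right)\mu_{22} + \left(\frac{n_{23}}{n_{2\cdot}}\right)\mu_{23}$$
$$H_a\colon \text{equality does not hold}$$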


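Similarly, the alternatives for testing weighted factor B effects would be as follows when 
weights $n_{ij}/n_{\cdot j}$ reflect the importance of the factor B means: 

$$H_0\colon \left(\frac{n_{11}}{n_{\cdot 1}}\right)\mu_{11} + \left(\frac{n_{21}}{n_{\cdot 1}}\right)\mu_{21} = \left(\frac{n_{12}}{n_{\cdot 2}}\right)\mu_{12} + \left(\frac{n_{22}}{n_{\cdot 2}}\right)\mu_{22} = \left(\frac{n_{13}}{n_{\cdot 3}}\right)\mu_{13} + \left(\frac{n_{23}}{n_{\cdot 3}}\right)\mu_{23}$$
$$H_a\colon \text{not all equalities hold}$$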
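
For the retail store illustration, the weights $n_{ij}/n_{i\cdot}$ and $n_{ij}/n_{\cdot j}$ can be tabulated exactly with Python's `fractions` module. The sketch below only restates the arithmetic of the sample-size table; the variable names are illustrative.

```python
from fractions import Fraction

# Cell sample sizes n_ij: rows are store sizes (i), columns are locations (j).
n = [[20, 5, 4],
     [10, 15, 6]]

row_totals = [sum(row) for row in n]          # n_i. = 29, 31
col_totals = [sum(col) for col in zip(*n)]    # n_.j = 30, 20, 10

# Weights for the factor A comparison: n_ij / n_i. (each row sums to 1).
weights_A = [[Fraction(nij, ni) for nij in row] for row, ni in zip(n, row_totals)]
# Weights for the factor B comparison: n_ij / n_.j (each column sums to 1).
weights_B = [[Fraction(n[i][j], col_totals[j]) for j in range(3)] for i in range(2)]

for row in weights_A:
    print([str(w) for w in row])
for row in weights_B:
    print([str(w) for w in row])
```

Exact fractions make it easy to confirm that each set of weights sums to one within the factor level being averaged.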


We must caution that sample sizes often do not reflect appropriate importance. Sample 
sizes may have been chosen arbitrarily or they may reflect unequal attrition losses in a 
study. Sample sizes may also reflect cost considerations; for instance, larger sample sizes 
may be used by a market researcher for children than for adults because selection costs are 
lower. In all of these instances, use of weights based on sample sizes may lead to misleading 
inferences. 

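When sample sizes do constitute appropriate weights, the alternatives for testing for 
weighted factor A effects can be stated in general as follows: 

$$H_0\colon \sum_j \left(\frac{n_{1j}}{n_{1\cdot}}\right)\mu_{1j} = \cdots = \sum_j \left(\frac{n_{aj}}{n_{a\cdot}}\right)\mu_{aj} \tag{23.50}$$
$$H_a\colon \text{not all equalities hold}$$

and the alternatives for testing for weighted factor B effects are: 

$$H_0\colon \sum_i \left(\frac{n_{i1}}{n_{\cdot 1}}\right)\mu_{i1} = \cdots = \sum_i \left(\frac{n_{ib}}{n_{\cdot b}}\right)\mu_{ib} \tag{23.51}$$
$$H_a\colon \text{not all equalities hold}$$

It can be shown that the term $SSE(R) - SSE(F)$ for testing weighted factor A effects 
involving the alternatives in (23.50) simplifies to the ordinary single-factor treatment sum 
of squares in (16.28), with the factor A levels considered to be the treatments: 

$$SSA = \sum_i n_{i\cdot}(\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{23.52}$$

where: 

$$\bar{Y}_{i\cdot\cdot} = \frac{Y_{i\cdot\cdot}}{n_{i\cdot}} \tag{23.52a}$$

$$\bar{Y}_{\cdot\cdot\cdot} = \frac{Y_{\cdot\cdot\cdot}}{n_T} \tag{23.52b}$$

$$Y_{i\cdot\cdot} = \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} Y_{ijk} \tag{23.52c}$$

$$Y_{\cdot\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} Y_{ijk} \tag{23.52d}$$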
Example 


Similarly, the term $SSE(R) - SSE(F)$ for testing weighted factor B effects involving the 
alternatives in (23.51) simplifies to the single-factor treatment sum of squares in (16.28), 
with the factor B levels considered to be the treatments: 

$$SSB = \sum_j n_{\cdot j}(\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{23.53}$$

where: 

$$\bar{Y}_{\cdot j\cdot} = \frac{Y_{\cdot j\cdot}}{n_{\cdot j}} \tag{23.53a}$$

$$Y_{\cdot j\cdot} = \sum_{i=1}^{a} \sum_{k=1}^{n_{ij}} Y_{ijk} \tag{23.53b}$$

In the growth hormone example of Table 23.1, suppose that the treatment sample sizes 
$n_{ij}$ reflect the relative importance of the factor means. We saw in Section 23.2 that gender 
(factor A) and bone development (factor B) do not interact. We now wish to test whether 
gender affects the weighted mean change in the growth rate. The alternatives (23.50) here 
are: 

$$H_0\colon \tfrac{3}{7}\mu_{11} + \tfrac{2}{7}\mu_{12} + \tfrac{2}{7}\mu_{13} = \tfrac{1}{7}\mu_{21} + \tfrac{3}{7}\mu_{22} + \tfrac{3}{7}\mu_{23}$$
$$H_a\colon \text{equality does not hold}$$

To calculate SSA in (23.52), we require from Table 23.1: 

$Y_{1\cdot\cdot} = 11.6 \qquad n_{1\cdot} = 7 \qquad \bar{Y}_{1\cdot\cdot} = 1.65714$ 
$Y_{2\cdot\cdot} = 11.4 \qquad n_{2\cdot} = 7 \qquad \bar{Y}_{2\cdot\cdot} = 1.62857$ 
$Y_{\cdot\cdot\cdot} = 23.0 \qquad n_T = 14 \qquad \bar{Y}_{\cdot\cdot\cdot} = 1.64286$ 

We then obtain: 

$$SSA = 7(1.65714 - 1.64286)^2 + 7(1.62857 - 1.64286)^2 = .002857$$

The number of degrees of freedom associated with SSA is $a - 1 = 2 - 1 = 1$. 

We found earlier in Table 23.3a that the error sum of squares for the full model is 
$SSE(F) = 1.3000$, with 8 degrees of freedom associated with it. Hence, the general linear 
test statistic here is: 

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F} = \frac{SSA}{1} \div MSE(F) = \frac{.002857}{1} \div \frac{1.3000}{8} = .018$$

For $\alpha = .05$, we require $F(.95; 1, 8) = 5.32$. Since $F^* = .018 < 5.32$, we conclude $H_0$, 
that the weighted mean change in the growth rate is the same for male and female children. 
The P-value of the test is .897. 

The test for factor B effects would be carried out in similar fashion. 
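
The SSA calculation in this example uses only the factor A totals and sample sizes, so it can be retraced with a few lines of Python. The sketch below takes the totals, sample sizes, and $SSE(F) = 1.3000$ with 8 degrees of freedom directly from the text; the variable names are illustrative.

```python
totals = {1: 11.6, 2: 11.4}     # Y_i.. for male (i = 1) and female (i = 2) children
sizes = {1: 7, 2: 7}            # n_i.
sse_full, df_full = 1.3000, 8   # from Table 23.3a, as quoted in the text

grand_mean = sum(totals.values()) / sum(sizes.values())          # (23.52b)
ssa = sum(sizes[i] * (totals[i] / sizes[i] - grand_mean) ** 2    # (23.52)
          for i in totals)
df_a = len(totals) - 1
f_star = (ssa / df_a) / (sse_full / df_full)
print(round(grand_mean, 5), round(ssa, 6), round(f_star, 3))
```

The same pattern, with factor B totals $Y_{\cdot j\cdot}$ and sizes $n_{\cdot j}$ substituted, would give SSB in (23.53).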




Comments 


1. A special case of weights proportional to the sample sizes occurs in designed experiments when 
the sample sizes themselves follow a proportional pattern. Suppose that a chain of diet establishments 
is experimenting with two diets that are of equal importance. The establishments cater to three times 
as many women as men. One hundred men and 300 women are selected, and half of each group is 
randomly assigned to each diet. Hence, the treatment sample sizes are as follows: 

Diet      Men    Women    Total 
1          50     150      200 
2          50     150      200 
Total     100     300      400 

Note that these treatment sample sizes follow the relation: 

$$n_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_T} \tag{23.54}$$

Condition (23.54) implies that the sample sizes in any two rows (or columns) are proportional. This 
is called a case of proportional frequencies. Here the test of diet effects reduces to the comparison 
of $(\mu_{11} + 3\mu_{12})/4$ versus $(\mu_{21} + 3\mu_{22})/4$ and the test of gender effects reduces to the comparison of 
$(\mu_{11} + \mu_{21})/2$ versus $(\mu_{12} + \mu_{22})/2$. It can be shown that the terms $SSE(R) - SSE(F)$ for testing these 
factor A effects (diet) and factor B effects (gender) are given by (23.52) and (23.53), respectively. It 
can also be shown that the interaction sum of squares here is given by a simple formula: 

$$SSAB = \sum_i \sum_j n_{ij}(\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 \tag{23.55}$$

Furthermore, the sums of squares in this special case are orthogonal so that SSA, SSB, SSAB, and SSE 
sum to SSTO. 

2. When proportional sample sizes are employed but the sample sizes do not reflect the importance 
of the factor level means (e.g., when the sample sizes are unequal but the factor level means are of 
equal importance), the regression approach or the general linear test approach explained earlier must 
be employed. 

3. The cell sample sizes in alternatives (23.50) and (23.51) are considered to be fixed, not random 
variables. Thus, the relevance of the alternatives depends on the reasonableness of the actual cell 
sample sizes as indicators of the importance of the treatment means. 
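
The proportional-frequencies condition (23.54) discussed in Comment 1 is easy to check mechanically. The following Python sketch uses a hypothetical helper, `is_proportional`, and applies it to the diet table of Comment 1 and, for contrast, to the retail store sample sizes given earlier in this section.

```python
def is_proportional(n):
    """Return True if n_ij = n_i. * n_.j / n_T holds for every cell, as in (23.54)."""
    row_totals = [sum(row) for row in n]
    col_totals = [sum(col) for col in zip(*n)]
    n_total = sum(row_totals)
    # Cross-multiplied form avoids any division and so any rounding error.
    return all(n[i][j] * n_total == row_totals[i] * col_totals[j]
               for i in range(len(row_totals))
               for j in range(len(col_totals)))

diet = [[50, 150], [50, 150]]           # diets by gender, Comment 1
stores = [[20, 5, 4], [10, 15, 6]]      # store size by location, earlier illustration

print(is_proportional(diet), is_proportional(stores))
```

Only in the proportional case do SSA, SSB, SSAB, and SSE add up to SSTO, so a check of this kind indicates whether the simple formulas of Comment 1 apply.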


23.6 Statistical Computing Packages 


Extreme care must be exercised when using packaged analysis of variance programs with 
unequal sample sizes because the default option of the package may not necessarily assign 
proper importance to each treatment mean. The user should read the package documentation 
carefully and make sure that the package generates the appropriate sums of squares for the 
tests of interest. 


Problems 




For the JMP, MINITAB, SAS, SPSS, and SYSTAT statistical packages, the outputs that 
are the equivalents of the regression results obtained in Sections 23.1 to 23.3 for the case of 
treatment means with equal importance and no empty cells are obtained as follows at the 
time of this writing: 


JMP: Fit Model 

MINITAB: GLM 

SAS PROC GLM: Type III or Type IV sums of squares 

SPSS GLM: UNIANOVA/SSTYPE(3) 

SYSTAT: Default option 


Extreme caution should also be used with ANOVA computer packages that provide 
results when some treatment cells are empty. The package may make assumptions about 
interactions that the researcher is unwilling to make. In the absence of a clear description of 
how the package handles empty cells, it is preferable that appropriate analyses be conducted 
by the user specifying the appropriate contrasts of interest. 

When weights assigned to the treatment means are proportional to the sample sizes, 
numerator sums of squares SSA and SSB given in (23.52) and (23.53) may be obtained 
using the JMP Sequential (Type 1) Tests option, the MINITAB Sequential SS option, SAS PROC 
GLM (Type I sum of squares), SPSS GLM (UNIANOVA/SSTYPE(1)), and the SYSTAT 
option Weighted Means Model. When a sequential Type I sum of squares is used to obtain 
SSA and SSB given in (23.52) and (23.53), two separate computing runs are needed, where 
in one run factor A is brought in first and in the second run factor B is brought in first. 

A simple option in using computer packages when the cell sample sizes are unequal, 
cell means have unequal importance, and/or some cells are empty is to use a single-factor 
ANOVA package that permits specification of contrasts to be estimated. The user can then 
specify the various contrasts of interest. 


23.1. A market research intern selected a random sample of 400 communities and classified them 
according to population size (four levels) and geographic region (five levels) to study the effects 
of these factors on sales of the company's products. When the intern found that the treatment 
sample sizes were unequal, the smallest cell frequency being four, the intern generated random 
numbers to reduce the number of communities in each cell to four and then proceeded to 
analyze the effects of population size and region on the basis of the 80 communities remaining. 
a. Does the method of randomly discarding cases lead to any biases? Explain. 

b. Was it wise for the intern to discard 320 cases randomly in order to obtain equal treatment 
sample sizes? 


23.2. A student asked: "If two-factor studies with unequal sample sizes must be analyzed by a 
regression approach, why bother with the two-factor analysis of variance model at all?” 
Comment. 


23.3. Refer to Eye contact effect Problems 19.12 and 19.13. 
a. Modify regression model (23.11) to apply to this two-factor study with a = 2 and b = 2. 
b. Set up the $\mathbf{Y}$, $\mathbf{X}$, and $\boldsymbol{\beta}$ matrices for the regression model in part (a). 
c. Obtain $\mathbf{X}\boldsymbol{\beta}$. Verify the correctness of the expected values. 
d. Obtain the fitted regression function. What is estimated by the intercept term? 
e. Obtain the regression analysis of variance table based on appropriate extra sums of squares. 
Do your results agree with those obtained using the ANOVA approach in 19.13b? 
f. Test separately for interaction effects, factor A main effects, and factor B main effects. 
Use $\alpha = .01$ for each test and state the alternatives, decision rule, and conclusion. 


*23.4. Refer to Hay fever relief Problems 19.14 and 19.15. 


23.5. 


*23.6. 


*23.7. 


а. 


оро т 


Modify regression model (23.11) to apply to this two-factor study уушта = 3 and р — 3 


. Set up the Y, X, and В matrices for the regression model in part (а). 


Obtain Xf. Verify the correctness of the expected values. 


. Obtain the titted regression function. What is estimated by à? 


Obtain the regression analysis of variance table based on appropriate extra sums of Squares, 
Do your results agree with those obtained using the ANOVA approach in Problem 191 5b? 
Test separately for interaction effects, factor A main effects, and factor B main effects, 
Use a = .05 for each test and state the alternatives, decision rule, and conclusion, 


Refer to Disk drive service Problems 19.16 and 19.17. 


Modify regression model (23.11) to apply to this two-factor study with a — 3 and b — 3, 
Obtain the fitted regression function. What is estimated by 21? 


c. Obtain the regression analysis of variance table based on appropriate extra sums of Squares, 


Do your results agree with those obtained using the ANOVA approach іп 19.17b? 


Test separately for interaction effects, factor A main effects, and factor B main effects. 
Use a = .01 for each test and state the alternatives. decision rule, and conclusion, 


Refer to Cash offers Problem 19.10. Suppose that observations Y», = 28 and Ууз = 20 are 
missing because the offer received in each of these cases was a trade-in offer. not a cash offer. 


а. 


State the ANOVA model for this case. Also state the equivalent regression model; use I, 
— |. 0 indicator variables. 


b. Present the X and В matrices for the regression model in part (a). 

c. Obtain Xf) and show that the proper treatment means are obtained by your model. 
d. 
e 


. Test whether or not interaction effects are present by fitting the full and reduced regression 


What is the reduced regression model for testing for interaction effects? 


models; use a = .05. State the alternatives. decision rule, and conclusion. What is the 
P-value of the test? 


. State the reduced regression models for testing for age and gender main effects, respectively, 


and conduct each of the tests. Use о = .05 each time and state the alternatives, decision 
rule, and conclusion. What is the P-value of each test? 
To study the nature of the age main effects, estimate the following pairwise comparisons: 
4 
Ру = ці. ~ Из. D> = Hi. — H3 Dı = pus. — их 


Use the most efficient multiple comparison procedure with a 90 percent family confidence 
coefficient. 
In the population of female owners. 30 percent are young, 60 percent are middle-aged, and 


10 percent are elderly. Estimate the mean cash offer for this population with a 95 percent 
confidence interval. 


Refer то Hay fever relief Problem 19.14 and 23.4. Suppose that observations Уніз = 2.3, 
Y», = 8.9, and Y», = 9.0 are missing because ће subjects did not immediately record the 
time when they began to suffer again from hay fever. 


23.8. 


23.9. 


Chapter 23  Two-Factor Studies with Unequal Sample Sizes 983 


a. State the ANOVA model for this case. Also state the equivalent regression model; use 1, 
—1, 0 indicator variables. 

. Present the X and В matrices for the regression model in part (a). 

. Obtain XB and show that the proper treatment means are obtained by your model. 

. What is the reduced regression model for testing for interaction effects? 


ona c 


. Test whether or not interaction effects are present by fitting the full and reduced regression 
models; use а = .05. State the alternatives, decision rule, and conclusion. What is the 
P-value of the test? How do your results compare with those obtained in 23.4f, where 
there is no missing data? 

f. The nature of the interaction effects is to be studied by means of the following contrasts:

$$L_1 = \frac{\mu_{12} + \mu_{13}}{2} - \mu_{11} \qquad L_4 = L_2 - L_1$$

$$L_2 = \frac{\mu_{22} + \mu_{23}}{2} - \mu_{21} \qquad L_5 = L_3 - L_1$$

$$L_3 = \frac{\mu_{32} + \mu_{33}}{2} - \mu_{31} \qquad L_6 = L_3 - L_2$$

Obtain confidence intervals for these contrasts; use the Scheffé multiple comparison
procedure with a 90 percent family confidence coefficient. Interpret your findings.

Refer to Kidney failure hospitalization Problem 19.18. Suppose that observations Үд = 12,
$Y_{216}$ = 2, and Узв = 9 are missing because the hospitalization records for these patients were
not complete. Continue to work with the transformed data $Y' = \log_{10}(Y + 1)$.

a. State the ANOVA model for this case. Also state the equivalent regression model; use 1,
-1, 0 indicator variables.

b. Present the X and $\beta$ matrices for the regression model in part (a).

c. Obtain X$\beta$ and show that the proper treatment means are obtained by your model.

d. What is the reduced regression model for testing for interaction effects?

e. Test whether or not interaction effects are present by fitting the full and reduced regression
models; use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the
P-value of the test?

f. State the reduced regression models for testing for treatment duration and weight gain
main effects, respectively. Conduct each of the tests. Use $\alpha = .05$ each time and state the
alternatives, decision rule, and conclusion. What is the P-value of each test?

g. Use the single degree of freedom $t^*$ statistic for testing whether or not the mean number
of days hospitalized (in transformed units) for persons with mild weight gains exceeds .5;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

h. To analyze the nature of the factor main effects, estimate the following pairwise
comparisons:

$$D_1 = \mu_{1\cdot} - \mu_{2\cdot} \qquad D_3 = \mu_{\cdot 1} - \mu_{\cdot 3}$$

$$D_2 = \mu_{\cdot 1} - \mu_{\cdot 2} \qquad D_4 = \mu_{\cdot 2} - \mu_{\cdot 3}$$

Use the Bonferroni procedure with a 90 percent family confidence coefficient. State your
findings.


Adjunct professors. A sociologist selected a random sample of 45 adjunct professors who 
teach in the evening division of a large metropolitan university for a study of special problems 
associated with teaching in the evening division. The data collected include the amount of 




23.10. 


payment received by the faculty member for teaching a course during the past semester. The
sociologist classified the faculty members by subject matter of course (factor A) and highest
degree earned (factor B). The earnings per course (in thousand dollars) follow.


Factor B (highest degree) 


Factor A j=1 ј= 2 j=3 
(subject matter) Bachelor’s Master's Doctorate 

i = 1 Humanities 1.7 1.8 2.5 
1.9 2.1 2.7 

2.9 

i = 2 Social sciences 2.5 2.7 3.5 
2.3 2.4 3.3 

2.4 2.5 3.4 

i = 3 Engineering 2.7 2.9 3.7 
2.8 3.0 3.6 

2.7 3.9 

i = 4 Management 2.5 2.3 3.3 
2.6 2.8 3.4 

3.6 


a. State the ANOVA model for this case. Also state the equivalent regression model; use 1,
-1, 0 indicator variables.

b. Present the X and $\beta$ matrices for the regression model in part (a).

c. Obtain X$\beta$ and show that the proper treatment means are obtained by your model.

d. Fit the equivalent regression model and obtain the residuals. Prepare aligned residual dot
plots for the treatments. What are your findings?

e. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?


Refer to Adjunct professors Problem 23.9. Assume that ANOVA model (19.23) is appropriate,
except that now $k = 1, \ldots, n_{ij}$.

a. Plot the estimated treatment means $\bar{Y}_{ij\cdot}$ in the format of Figure 23.1. Does it appear that
any factor effects are present? Explain.

b. What is the reduced regression model for testing for interaction effects?

c. Test whether or not interaction effects are present by fitting the full and reduced regression
models; use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the
P-value of the test?

d. State the reduced regression models for testing for subject matter and highest degree main
effects, respectively, and conduct each of the tests. Use $\alpha = .01$ each time and state the
alternatives, decision rule, and conclusion. What is the P-value of each test?

e. Make all pairwise comparisons between the subject matter means; use the Tukey procedure
with a 95 percent family confidence coefficient. State your findings and present a graphic
summary.


23.11. 


*23.12. 


23.13. 


*23.14. 




f. Make all pairwise comparisons between the highest degree means; use the Tukey procedure 
with a 95 percent family confidence coefficient. State your findings and present a graphic 
summary. 

Refer to Adjunct professors Problem 23.9. Suppose that the sociologist had prior information
indicating that the two factors do not interact and that no-interaction model (23.28) is therefore
appropriate.

a. State the equivalent full regression model for this case. Also state the reduced regression
models for testing for factor A and factor B main effects. Use 1, -1, 0 indicator variables.

b. Fit the full and reduced regression models and test for factor A and factor B main effects;
use $\alpha = .05$ for each test. State the alternatives, decision rule, and conclusion for each test.
What is the P-value of each test?

Refer to Hay fever relief Problem 19.14. Suppose that the data for the treatment when each
of the two active ingredients is at the medium level were lost and immediate analyses of the
available data are required; i.e., assume that $n_T = 32$ and $n_{22} = 0$.


a. To study whether or not interaction effects are present, estimate the following comparisons: 


D, = шз — Ил Lı = D,— D2 
D» = uz — un L = D; — D; 
D3 = u33 — Из 


Use the Bonferroni procedure with a 90 percent family confidence coefficient. State your 
findings. 

b. To further explore the nature of possible interaction effects, conduct separate single degree
of freedom tests of whether $\mu_{12} = \mu_{13}$ and whether $\mu_{32} = \mu_{33}$. Use $\alpha = .02$ for each
test and state the alternatives, decision rule, and conclusion. What is the family level of
significance, using the Bonferroni inequality?

Refer to Kidney failure hospitalization Problem 19.18. Suppose that there were no patients
who received the dialysis treatment for long duration and had mild weight gains; i.e., assume
that $n_T = 50$ and $n_{21} = 0$. Continue to work with the transformed data $Y' = \log_{10}(Y + 1)$.
On the basis of related research, the analyst believes it is reasonable to assume that the two
factors do not interact and that no-interaction model (23.28) is appropriate.

a. State the equivalent full regression model for this case. Also state the reduced regression
models for testing for factor A and factor B main effects. Use 1, -1, 0 indicator variables
in the regression model.

b. Fit the full and reduced regression models. Test for factor A and factor B main effects; use
$\alpha = .05$ for each test. State the alternatives, decision rule, and conclusion for each test.
What is the P-value of each test?

Refer to Programmer requirements Problem 19.20. Suppose that there were no programmers
with experience on both small and large systems who had less than five years' experience;
i.e., assume that $n_T = 20$ and $n_{21} = 0$.


a. To study whether or not interaction effects are present, estimate the following comparisons:

$$D_1 = \mu_{12} - \mu_{13} \qquad L_1 = D_1 - D_2$$

$$D_2 = \mu_{22} - \mu_{23}$$

Use the Bonferroni procedure with a 95 percent family confidence coefficient. State your
findings.




23.15. 


*23.16. 


23.17. 


*23.18. 


23.19. 


b. To study further the nature of possible interaction effects, test whether or not Ha exceeds
A23; use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?


Refer to Adjunct professors Problem 23.9. Suppose that there were no professors teaching 

humanities courses who had only a bachelor’s degree, so that the study consists of nz ae 
adjunct professors and $n_{11} = 0$. On the basis of previous research, the sociologist believes
it is reasonable to assume that the two factors do not interact and that no-interaction model
(23.28) is appropriate here.


a. State the equivalent full regression model for this case. Also state the reduced regression
models for testing for factor A and factor B main effects. Use 1, -1, 0 indicator variables
in the regression model.

b. Fit the full and reduced regression models and test for factor A and factor B main effects.
Use $\alpha = .01$ for each test and state the alternatives, decision rule, and conclusion. What is
the P-value of each test?

Refer to Auditor training Problem 21.5.

a. State the regression model equivalent to randomized block model (21.1); use 1, -1, 0
indicator variables.

b. Fit the regression model to the data.

c. Obtain the regression analysis of variance table based on appropriate extra sums of
squares.

d. Test for treatment main effects; use $\alpha = .05$. State the alternatives, decision rule, and
conclusion.

Refer to Fat in diets Problem 21.7.

a. State the regression model equivalent to randomized block model (21.1); use 1, -1, 0
indicator variables.

b. Fit the regression model to the data.

c. Obtain the regression analysis of variance table based on appropriate extra sums of squares.

d. Test for treatment main effects; use $\alpha = .05$. State the alternatives, decision rule, and
conclusion.

Refer to Auditor training Problems 21.5 and 23.16. Assume that observation Ya; = 89 is
missing because the auditor became ill and dropped out from the study.

a. State the ANOVA model for this case. Also state the equivalent regression model; use 1,
-1, 0 indicator variables.

b. State the reduced regression model for testing for differences in the mean proficiency scores
for the three training methods.

c. Test whether or not the mean proficiency scores for the three training methods differ by
fitting the full and reduced models; use $\alpha = .05$. State the alternatives, decision rule, and
conclusion. How do your results compare with those obtained in Problem 23.16d, where
there are no missing observations?

d. Compare the mean proficiency scores for training methods 2 and 3 by means of the
regression approach; use a 95 percent confidence interval.


Refer to Fat in diets Problems 21.7 and 23.17. Assume that observations F = .15 and 
Ys; = 1.62 are missing because the subjects did not stay on the prescribed diet. 


a. State the ANOVA model for this case. Also state the equivalent regression model; use 1,
-1, 0 indicator variables.


*23.20. 


23.21. 


23.22. 


*23.23. 




b. State the reduced regression model for testing for differences in the mean reductions in 
lipid level for the three diets. 

с. Test whether or not the mean reductions in lipid level differ for the three diets by fitting the 
full and reduced models; use $\alpha = .05$. State the alternatives, decision rule, and conclusion.
How do your results compare with those obtained in Problem 23.17d, where there are no
missing observations?

d. Compare the mean reductions in lipid level for diets 1 and 3 by means of the regression 
approach; use a 98 percent confidence interval. 


Refer to Cash offers Problem 19.10. It is known that in both populations of male and female
owners, 30 percent are young, 60 percent are middle-aged, and 10 percent are elderly. Test
by means of the single degree of freedom $t^*$ test statistic whether or not the mean cash offers
for male and female owners are equal; use $\alpha = .05$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

Refer to Kidney failure hospitalization Problem 19.18. Continue to work with the transformed
data $Y' = \log_{10}(Y + 1)$. It is known that 75 percent of patients in each weight gain
group receive the short duration treatment. Inferences are desired about the target population
of patients at the dialysis facility.

a. Use cell means model (19.15) to express the two alternatives for testing whether or not
factor B main effects are present in the form of (23.37a).

b. State the regression model equivalent to ANOVA model (19.15), using 1, 0 indicator
variables.

с. State the reduced regression model for testing for factor B main effects; express u, and 
[413 in terms of the other cell means. 

d. Fit the full and reduced regression models and test for factor B main effects; use $\alpha = .05$.
State the decision rule and conclusion. What is the P-value of the test?

e. Compare the mean number of days of hospitalization (in transformed units) for patients 
with severe and mild weight gains; use a 95 percent confidence interval. 


Refer to Adjunct professors Problem 23.9. It is known that 10 percent of professors in each 
subject matter area have a bachelor's degree, 20 percent have a master's degree, and 70 percent 
have a doctorate. Inferences are desired about the target population of adjunct professors. 


a. Use cell means model (19.15) to express the two alternatives for testing whether or not
factor A main effects are present in the form of (23.37a).

b. Define the X matrix and $\beta$ vector for expressing full model (19.15) in matrix form for this
case.

c. Express the two alternatives in part (a) in matrix form (23.44).

d. Use (23.48) to calculate SSE(R) - SSE(F).

e. Test whether or not factor A main effects are present; use $\alpha = .01$. State the decision rule
and conclusion. What is the P-value of the test?

f. Compare the mean amounts of payment received by faculty members teaching humani- 
ties and engineering courses; use a 99 percent confidence interval. Interpret your interval 
estimate. 


Refer to Programmer requirements Problem 19.20. Suppose that the observations Y;33 = 68, 
Yi34 = 58, and Узза = 45 did not exist and that the sample sizes reflect the importance of the 
treatment means. Test whether or not type of experience main effects are present; control the 
level of significance at $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is 
the P-value of the test? 




23.24. Refer to Adjunct professors Problem 23.9. Assume that the sample sizes reflect the
importance of the treatment means. Test whether or not subject matter main effects are present;
control the level of significance at $\alpha = .05$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?


Exercises 


23.25. Derive $\sigma^2\{\hat{L}\}$ for the estimated contrast involving $\hat{\mu}_{i\cdot}$ in (23.22).

23.26. Show that $s^2\{\hat{L}\}$ in (23.26) is an unbiased estimator of $\sigma^2\{\hat{L}\}$.


23.27. Refer to regression model (23.31), the equivalent to ANOVA model (21.1) when $n_b = 3$ and
$r = 3$. Suppose that the indicator variables in model (23.31) were coded as follows:

$$X_1 = \begin{cases} 1 & \text{if experimental unit from block 1} \\ 0 & \text{otherwise} \end{cases}$$

$$X_2 = \begin{cases} 1 & \text{if experimental unit from block 2} \\ 0 & \text{otherwise} \end{cases}$$

$$X_3 = \begin{cases} 1 & \text{if experimental unit from treatment 1} \\ 0 & \text{otherwise} \end{cases}$$

$$X_4 = \begin{cases} 1 & \text{if experimental unit from treatment 2} \\ 0 & \text{otherwise} \end{cases}$$

and that the regression coefficients are denoted by $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$.

a. Exhibit the X matrix for this regression model.

b. Find the correspondences between the regression coefficients $\beta_0, \beta_1, \ldots, \beta_4$ and the
parameters in ANOVA model (21.1).

c. Discuss the advantages and disadvantages of using 1, 0 indicator variables and 1, -1,
0 indicator variables here.


. Consider a two-factor study where a = 2, b —2. ny = ti; = п = 2. n; = |, and 


no-interaction model (23.28) applies. Use the matrix methods in Section 23.5 to obtain the 
estimator of uz. [Hint: Begin with interaction model (23.3) as the full model, express the 
assumption of no interactions in the form of (23.44), and use (23.46) to obtain the estimator 
of un for the no-interaction model.] 

Refer to Kidney failure hospitalization Problem 23.13. Suppose that you are going to use
the matrix approach in Section 23.5, rather than the regression approach, to test for factor A
main effects.

a. State the X and $\beta$ matrices to be used in the full model.

b. State the test hypothesis in matrix form (23.44).


Projects 


23.30. Refer to the SENIC data set in Appendix C.1. The effects of region (factor A: variable 9) and
average age of patients (factor B: variable 3) on mean length of hospital stay (variable 2) are
to be studied. For purposes of this ANOVA study, average age is to be classified into three
categories: under 52.0 years, 52.0 to under 55.0 years, 55.0 years or more.

a. State the ANOVA model for this case. Also state the equivalent regression model; use
1, -1, 0 indicator variables.

b. Fit the regression model, obtain the residuals, and prepare aligned residual dot plots for
the treatments. What are your findings?


23.31. 


23.32. 


23.33. 


c. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?


Refer to the SENIC data set in Appendix C.1 and Project 23.30. Assume that ANOVA model
(19.23), with $k = 1, \ldots, n_{ij}$, is appropriate.

a. Plot the estimated treatment means $\bar{Y}_{ij\cdot}$ in the format of Figure 23.1. Does it appear that
any factor effects are present? Explain.

b. State the reduced regression model for testing for interaction effects.

c. Fit the reduced regression model and test whether or not interaction effects are present;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

d. State the reduced regression model for testing for factor A main effects. Conduct this test
using $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?

e. State the reduced regression model for testing for factor B main effects. Conduct this test
using $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?

f. Make all pairwise comparisons between regions; use the Tukey procedure and a 95 percent
family confidence coefficient. State your findings and present a graphic summary.


Refer to the CDI data set in Appendix C.2. The effects of region (factor A: variable 17) and
percent below poverty level (factor B: variable 13) on the crime rate (variable 10 ÷ variable 5)
are to be studied. For purposes of this ANOVA study, percent below poverty level is to be
classified into three categories: under 6.0 percent, 6.0 to under 10.0 percent, 10.0 percent or
more.

a. State the ANOVA model for this case. Also state the equivalent regression model; use
1, -1, 0 indicator variables.

b. Fit the regression model, obtain the residuals, and prepare aligned residual dot plots for
the treatments. What are your findings?

c. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?


Refer to the CDI data set in Appendix C.2 and Project 23.32. Assume that ANOVA model
(19.23), with $k = 1, \ldots, n_{ij}$, is appropriate.

a. Plot the estimated treatment means $\bar{Y}_{ij\cdot}$ in the format of Figure 23.1. Does it appear that
any factor effects are present? Explain.

b. State the reduced regression model for testing for interaction effects.

c. Fit the reduced regression model and test whether or not interaction effects are present;
use $\alpha = .005$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

d. State the reduced regression model for testing for factor A main effects. Conduct this test
using $\alpha = .005$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?

e. State the reduced regression model for testing for factor B main effects. Conduct this test
using $\alpha = .005$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?


f. Make all pairwise comparisons between regions; use the Tukey procedure and a 95 percent
family confidence coefficient. State your findings and present a graphic summary.


23.34. Refer to the Market share data set in Appendix C.3. The effects of discount price (factor A:
variable 5) and package promotion (factor B: variable 6) on market share (variable 2) are to
be studied.

a. State the ANOVA model for this case. Also state the equivalent regression model; use
1, -1, 0 indicator variables.

b. Fit the regression model, obtain the residuals, and prepare aligned residual dot plots for
the treatments. What are your findings?

c. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?


23.35. Refer to the Market share data set in Appendix C.3 and Project 23.34. Assume that ANOVA
model (19.23) with $k = 1, \ldots, n_{ij}$ is appropriate.

a. Plot the estimated treatment means $\bar{Y}_{ij\cdot}$ in the format of Figure 23.1. Does it appear that
any factor effects are present? Explain.

b. State the reduced model for testing for interaction effects.

c. Fit the reduced regression model and test whether or not interaction effects are present;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

d. State the reduced regression model for testing for factor A main effects. Conduct this test
using $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?

e. State the reduced regression model for testing for factor B main effects. Conduct this test
using $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?


23.36. Refer to the SENIC data set in Appendix C.1 and Projects 23.30 and 23.31. Assume that the
sample sizes reflect the importance of the treatment means.

a. Test for region (factor A) main effects; use $\alpha = .01$. State the alternatives, decision rule,
and conclusion. What is the P-value of the test?

b. Test for average age of patients (factor B) main effects; use $\alpha = .01$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?


23.37. Refer to the CDI data set in Appendix C.2 and Projects 23.32 and 23.33. Assume that the
sample sizes reflect the importance of the treatment means.

a. Test for region (factor A) main effects; use $\alpha = .005$. State the alternatives, decision rule,
and conclusion. What is the P-value of the test?

b. Test for percent below poverty level (factor B) main effects; use $\alpha = .005$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?


Case Studies 


23.38. 


Refer to the Prostate cancer data set in Appendix C.5. Assume that the sample sizes do not 
reflect the importance of the treatment means. Carry out an unbalanced two-way analysis of 
variance of this data set, where the response of interest is PSA level (variable 2), the two 
crossed factors are Gleason score (variable 9) and seminal vesicle invasion (variable 7). The 
analysis should consider transformations of the response variable. Document the steps taken 
in your analysis and justify your conclusions. 


23.39. 


23.40. 


23.41. 


23.42. 


23.43. 




Refer to the Prostate cancer data set in Appendix C.5 and Case Study 23.38. Assume that the 
sample sizes reflect the importance of the treatment means. Carry out an unbalanced two-way 
analysis of variance of this data set, where the response of interest is PSA level (variable 2), the 
two crossed factors are Gleason score (variable 9) and seminal vesicle invasion (variable 7). 
The analysis should consider transformations of the response variable. Document the steps 
taken in your analysis and justify your conclusions. 

Refer to the Real estate sales data set in Appendix C.7. Assume that the sample sizes do not 
reflect the importance of the treatment means. Carry out an unbalanced two-way analysis of 
variance of this data set, where the response of interest is sales price (variable 2), the two 
crossed factors are quality (variable 10) and style (variable 11). Recode style as 1 or not 1. 
The analysis should consider transformations of the response variable. Document the steps 
taken in your analysis and justify your conclusions. 

Refer to the Real estate sales data set in Appendix C.7 and Case Study 23.40. Assume that the 
sample sizes reflect the importance of the treatment means. Carry out an unbalanced two-way 
analysis of variance of this data set, where the response of interest is sales price (variable 2), 
the two crossed factors are quality (variable 10) and style (variable 11). Recode style as 1 or 
not 1. The analysis should consider transformations of the response variable. Document the 
steps taken in your analysis and justify your conclusions. 

Refer to the Ischemic heart disease data set in Appendix C.9. Assume that the sample sizes 
do not reflect the importance of the treatment means. Carry out an unbalanced two-way 
analysis of variance of this data set, where the response of interest is total cost (variable 2), 
the two crossed factors are number of interventions (variable 5) and number of comorbidities 
(variable 9). Recode the number of interventions into six categories: 0, 1, 2, 3-4, 5-7, and
greater than or equal to 8. Recode the number of comorbidities into two categories: 0-1,
and greater than or equal to 2. The analysis should consider transformations of the response 
variable. Document the steps taken in your analysis and justify your conclusions. 

Refer to the Ischemic heart disease data set in Appendix C.9 and Case Study 23.42. Assume 
that the sample sizes reflect the importance of the treatment means. Carry out an unbalanced 
two-way analysis of variance of this data set, where the response of interest is total cost 
(variable 2), the two crossed factors are number of interventions (variable 5) and number of 
comorbidities (variable 9). Recode the number of interventions into six categories: 0, 1, 2, 3-4,
5-7, and greater than or equal to 8. Recode the number of comorbidities into two categories:
0-1, and greater than or equal to 2. The analysis should consider transformations of the
response variable. Document the steps taken in your analysis and justify your conclusions. 


Chapter 24 


Multi-Factor Studies 


When three or more factors are studied simultaneously, the model and analysis employed 
are straightforward extensions of the two-factor case. We shall illustrate the nature of the 
extensions with reference to three-factor studies. Ordinarily, computer ANOVA packages 
will be utilized for performing the needed calculations for multi-factor studies involving 
three or more factors. For completeness, however, we shall present the necessary defini- 
tional formulas for three-factor studies. The ANOVA model with fixed factor levels when 
all treatment sample sizes are equal and all treatment means are of equal importance is 
considered in Sections 24.1-24.5. Then the analysis of variance with unequal sample sizes 
is taken up in Section 24.6. The chapter concludes with the planning of sample sizes for 
multi-factor studies. 


24.1 ANOVA Model for Three-Factor Studies 


We now turn to the development of the ANOVA model with fixed factor levels for three-factor
studies. This ANOVA model will be applicable to observational studies and to experimental
studies based on a completely randomized design.


Notation


Three factors, A, B, and C, are investigated at a, b, and c levels, respectively. The mean
response for the treatment when factor A is at the ith level ($i = 1, \ldots, a$), factor B is at the
jth level ($j = 1, \ldots, b$), and factor C is at the kth level ($k = 1, \ldots, c$) is denoted by $\mu_{ijk}$.
The number of cases for each treatment is assumed to be constant, denoted by n. We assume
$n \geq 2$. The mean response when A is at the ith level and B is at the jth level is denoted by
$\mu_{ij\cdot}$, and similar notation is used for other pairs of factor levels. Since all treatment means
are assumed to have equal importance, we define:

$$\mu_{ij\cdot} = \frac{\sum_k \mu_{ijk}}{c} \qquad (24.1a)$$

$$\mu_{i\cdot k} = \frac{\sum_j \mu_{ijk}}{b} \qquad (24.1b)$$

$$\mu_{\cdot jk} = \frac{\sum_i \mu_{ijk}}{a} \qquad (24.1c)$$


The mean response when A is at the ith level is denoted by $\mu_{i\cdot\cdot}$, and similar notation is
used for the other factor level means. We define:

$$\mu_{i\cdot\cdot} = \frac{\sum_j \sum_k \mu_{ijk}}{bc} \qquad (24.2a)$$

$$\mu_{\cdot j\cdot} = \frac{\sum_i \sum_k \mu_{ijk}}{ac} \qquad (24.2b)$$

$$\mu_{\cdot\cdot k} = \frac{\sum_i \sum_j \mu_{ijk}}{ab} \qquad (24.2c)$$

Finally, the overall mean response is denoted by $\mu_{\cdot\cdot\cdot}$ and is defined:

$$\mu_{\cdot\cdot\cdot} = \frac{\sum_i \sum_j \sum_k \mu_{ijk}}{abc} \qquad (24.3)$$


Illustration


To illustrate the meaning of the model terms for a three-factor analysis of variance model,
we consider a study of the effects of gender, age, and intelligence level of college graduates
on learning time for a complex task. Gender is factor A and has a = 2 levels (male,
female). Age is factor B and is defined in terms of b = 3 levels (young, middle, old).
Finally, intelligence is factor C and is defined in terms of c = 2 levels (high IQ, normal IQ).
Table 24.1a shows the treatment means $\mu_{ijk}$ for all factor level combinations, as well as
the notational representation for each. Also shown in Table 24.1a are the various means of
the $\mu_{ijk}$. Shown in Table 24.1b are various ANOVA model parameters that were computed
from the treatment means in Table 24.1a. We shall refer repeatedly to this learning time
example as we explain the model terms for a three-factor study.


Main Effects


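The main effects in a three-factor study are defined analogously to those for a two-factor
study. Thus, the main effect of the ith level of factor A is defined:

$$\alpha_i = \mu_{i\cdot\cdot} - \mu_{\cdot\cdot\cdot} \qquad (24.4a)$$

Similarly, we define the main effect of the jth level of factor B:

$$\beta_j = \mu_{\cdot j\cdot} - \mu_{\cdot\cdot\cdot} \qquad (24.4b)$$

and the main effect of the kth level of factor C:

$$\gamma_k = \mu_{\cdot\cdot k} - \mu_{\cdot\cdot\cdot} \qquad (24.4c)$$

For learning time example 1 in Table 24.1, we have, for instance:

$$\alpha_1 = \mu_{1\cdot\cdot} - \mu_{\cdot\cdot\cdot} = 16.5 - 16 = .5$$

$$\beta_1 = \mu_{\cdot 1\cdot} - \mu_{\cdot\cdot\cdot} = 14 - 16 = -2$$

$$\beta_2 = \mu_{\cdot 2\cdot} - \mu_{\cdot\cdot\cdot} = 15.5 - 16 = -.5$$

$$\gamma_1 = \mu_{\cdot\cdot 1} - \mu_{\cdot\cdot\cdot} = 12 - 16 = -4$$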


TABLE 24.1 Mean Learning Times and ANOVA Model Parameters: Learning Time Example 1.

(a) Mean Learning Times (in minutes)

Intelligence (factor C) and Age (factor B)

                      k = 1 High IQ                          k = 2 Normal IQ                            Average
Factor A     j = 1    j = 2    j = 3               j = 1    j = 2    j = 3               j = 1    j = 2    j = 3
Gender       Young    Middle   Old      Average    Young    Middle   Old      Average    Young    Middle   Old      Average

i = 1        9        12       18       13         19       20       21       20         14       16       19.5     16.5
Male         (μ111)   (μ121)   (μ131)   (μ1.1)     (μ112)   (μ122)   (μ132)   (μ1.2)     (μ11.)   (μ12.)   (μ13.)   (μ1..)

i = 2        9        10       14       11         19       20       21       20         14       15       17.5     15.5
Female       (μ211)   (μ221)   (μ231)   (μ2.1)     (μ212)   (μ222)   (μ232)   (μ2.2)     (μ21.)   (μ22.)   (μ23.)   (μ2..)

Average      9        11       16       12         19       20       21       20         14       15.5     18.5     16
             (μ.11)   (μ.21)   (μ.31)   (μ..1)     (μ.12)   (μ.22)   (μ.32)   (μ..2)     (μ.1.)   (μ.2.)   (μ.3.)   (μ...)

(b) ANOVA Model Parameters

μ... = 16.0    β1 = -2.0    γ1 = -4.0        (αβ)12 = 0.0    (βγ)11 = -1.0    (αβγ)111 = -.5

α1 = .5        β2 = -.5     (αβ)11 = -.5     (αγ)11 = .5     (βγ)21 = -.5     (αβγ)121 = 0.0



These parameters are shown in Table 24.1b. It follows from the definitions in (24.4) that
the sums of the main effects are zero:

$$\sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = 0 \qquad (24.5)$$

For example, since $\alpha_1 + \alpha_2 = 0$, it follows that $\alpha_2 = -\alpha_1 = -.5$; $\beta_3$ and $\gamma_2$ can be obtained
in similar fashion. Since all main effects terms are nonzero, we know that all three main
effects are present here.
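
The marginal means in (24.2) and (24.3) and the main effects in (24.4) can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original text; the array name `mu` and its axis ordering are assumptions made for illustration. The entries are the treatment means of Table 24.1a, where the value 9 for young males with high IQ is the one implied by the row average of 13.

```python
import numpy as np

# Treatment means mu[i, j, k] taken from Table 24.1a (illustrative layout):
# axis 0 = gender (male, female), axis 1 = age (young, middle, old),
# axis 2 = intelligence (high IQ, normal IQ).
mu = np.array([
    [[9.0, 19.0], [12.0, 20.0], [18.0, 21.0]],  # male
    [[9.0, 19.0], [10.0, 20.0], [14.0, 21.0]],  # female
])

mu_all = mu.mean()             # overall mean, as in (24.3)
mu_i = mu.mean(axis=(1, 2))    # factor A level means, as in (24.2a)
mu_j = mu.mean(axis=(0, 2))    # factor B level means, as in (24.2b)
mu_k = mu.mean(axis=(0, 1))    # factor C level means, as in (24.2c)

alpha = mu_i - mu_all          # main effects of factor A, as in (24.4a)
beta = mu_j - mu_all           # main effects of factor B, as in (24.4b)
gamma = mu_k - mu_all          # main effects of factor C, as in (24.4c)

print("alpha:", alpha)
print("beta: ", beta)
print("gamma:", gamma)
```

Averaging with equal weights over the remaining axes corresponds to the assumption that all treatment means are of equal importance; each effect vector sums to zero, in line with (24.5).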


Two-Factor Interactions

The two-factor interaction effects in a three-factor study are defined in the same fashion
as for a two-factor study, except that all means are averaged over the third factor. Thus,
following (19.8a) we define the two-factor interaction between factor A at the ith level and
factor B at the jth level, denoted as before by $(\alpha\beta)_{ij}$, as follows:

$$(\alpha\beta)_{ij} = \mu_{ij\cdot} - \mu_{i\cdot\cdot} - \mu_{\cdot j\cdot} + \mu_{\cdot\cdot\cdot} \qquad (24.6a)$$

In corresponding fashion, we define the AC and BC two-factor interactions:

$$(\alpha\gamma)_{ik} = \mu_{i\cdot k} - \mu_{i\cdot\cdot} - \mu_{\cdot\cdot k} + \mu_{\cdot\cdot\cdot} \qquad (24.6b)$$

$$(\beta\gamma)_{jk} = \mu_{\cdot jk} - \mu_{\cdot j\cdot} - \mu_{\cdot\cdot k} + \mu_{\cdot\cdot\cdot} \qquad (24.6c)$$

For learning time example 1 in Table 24.1, we have for instance:

$$(\alpha\beta)_{11} = 14 - 16.5 - 14 + 16 = -.5$$

$$(\alpha\beta)_{12} = 16 - 16.5 - 15.5 + 16 = 0.0$$

$$(\alpha\gamma)_{11} = 13 - 16.5 - 12 + 16 = .5$$

$$(\beta\gamma)_{11} = 9 - 14 - 12 + 16 = -1.0$$

$$(\beta\gamma)_{21} = 11 - 15.5 - 12 + 16 = -.5$$

These parameters are shown in Table 24.1b.

The two-factor interactions (@B);;, (оу), and (бу) are often called first-order in- 
teractions. Yt can readily be shown that the sums of the first-order interactions over each 
subscript are zero: 


Y (eB =0  foalj  YXofy-—0 foralli (24.7а) 
i j 

Уу (он = 0 for all k У 57 = 0 for all i (24.7b) 
i k 

УХВу)к =0  foalk — Y 8p -—0  foalj (2470 
Jj k 


All two-factor interaction terms not listed in Table 24.1b can be obtained from the five terms 
listed and the sum-to-zero expressions in (24.7). Since nonzero (068), ;, (Œy Jir, and (Ву) jx 
terms are present, we know that all three two-factor interactions, AB, AC, and BC, exist. 
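The definitions in (24.6) and the sum-to-zero relations in (24.7) can be verified with a short computation. This is an illustrative sketch only: the helper `m` (a name introduced here, not in the text) averages the Table 24.1a cell means over every subscript that is left unspecified, which plays the role of a dot subscript.

```python
from itertools import product

# Cell means mu[i][j][k] from Table 24.1a; indices are zero-based here.
mu = [
    [[9, 19], [12, 20], [18, 21]],   # male:   young, middle, old
    [[9, 19], [10, 20], [14, 21]],   # female: young, middle, old
]
a, b, c = 2, 3, 2


def m(i=None, j=None, k=None):
    """Mean of mu over every subscript left as None (the 'dot' subscripts)."""
    cells = [
        mu[ii][jj][kk]
        for ii, jj, kk in product(range(a), range(b), range(c))
        if (i is None or ii == i) and (j is None or jj == j) and (k is None or kk == k)
    ]
    return sum(cells) / len(cells)


# Two-factor interactions (24.6a)-(24.6c).
ab = [[m(i=i, j=j) - m(i=i) - m(j=j) + m() for j in range(b)] for i in range(a)]
ac = [[m(i=i, k=k) - m(i=i) - m(k=k) + m() for k in range(c)] for i in range(a)]
bc = [[m(j=j, k=k) - m(j=j) - m(k=k) + m() for k in range(c)] for j in range(b)]

print("(alpha beta)_11, (alpha beta)_12:", ab[0][0], ab[0][1])
print("(alpha gamma)_11:", ac[0][0])
print("(beta gamma)_11, (beta gamma)_21:", bc[0][0], bc[1][0])

# Sum-to-zero checks (24.7a): over i for each j, and over j for each i.
print([sum(ab[i][j] for i in range(a)) for j in range(b)])
print([sum(ab[i][j] for j in range(b)) for i in range(a)])
```

The same pattern of checks applies to `ac` and `bc` for (24.7b) and (24.7c).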


996 PartFive Multi-Factor Studies 


Three-Factor Interactions 


Just as in a two-factor study, where the interaction between the ith level of factor A and the
jth level of factor B is defined as the difference between the treatment mean $\mu_{ij}$ and the
value that would be expected if the factor effects were additive, so in a three-factor study
the three-factor interaction $(\alpha\beta\gamma)_{ijk}$ is defined as the difference between the treatment mean
$\mu_{ijk}$ and the value that would be expected if main effects and first-order interactions were
sufficient to account for all factor effects. The value that would be expected from main
effects and first-order interactions when A is at the ith level, B at the jth level, and C at the
kth level is:

$$\mu_{...} + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} \tag{24.8}$$

Hence, the three-factor interaction $(\alpha\beta\gamma)_{ijk}$, also called the second-order interaction, is
defined as:

$$(\alpha\beta\gamma)_{ijk} = \mu_{ijk} - [\mu_{...} + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}] \tag{24.9a}$$

or equivalently:

$$(\alpha\beta\gamma)_{ijk} = \mu_{ijk} - \mu_{ij.} - \mu_{i.k} - \mu_{.jk} + \mu_{i..} + \mu_{.j.} + \mu_{..k} - \mu_{...} \tag{24.9b}$$

From the definition of the three-factor interactions, it follows that they sum to zero when
added over any index:

$$\sum_i (\alpha\beta\gamma)_{ijk} = 0 \ \text{for all } j, k \qquad \sum_j (\alpha\beta\gamma)_{ijk} = 0 \ \text{for all } i, k \qquad \sum_k (\alpha\beta\gamma)_{ijk} = 0 \ \text{for all } i, j \tag{24.10}$$

If all three-factor interactions $(\alpha\beta\gamma)_{ijk}$ are zero, we say that there are no three-factor
interactions among factors A, B, and C. If some $(\alpha\beta\gamma)_{ijk}$ are not zero, we say that three-
factor interactions are present.

Let us find the three-factor interaction $(\alpha\beta\gamma)_{111}$ for the learning time example in
Table 24.1. From (24.9a), we have for i = j = k = 1:

$$(\alpha\beta\gamma)_{111} = \mu_{111} - [\mu_{...} + \alpha_1 + \beta_1 + \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11}]$$

Using the ANOVA model parameter values from Table 24.1b, we obtain:

$$(\alpha\beta\gamma)_{111} = 9 - (16 + .5 - 2 - 4 - .5 + .5 - 1) = -.5$$

Since $(\alpha\beta\gamma)_{111}$ is not zero, we know at once that three-factor interactions are present in this
example.
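As a check on (24.9b), the three-factor interactions can be computed directly from the cell means. The sketch below is illustrative only: the function names are introduced here, and the second set of cell means is taken from Table 24.2a for learning time example 2, which the text describes as having no three-factor interaction.

```python
from itertools import product


def cell_mean(mu, i=None, j=None, k=None):
    """Mean of mu[i][j][k] over every subscript left as None."""
    a, b, c = len(mu), len(mu[0]), len(mu[0][0])
    cells = [
        mu[ii][jj][kk]
        for ii, jj, kk in product(range(a), range(b), range(c))
        if (i is None or ii == i) and (j is None or jj == j) and (k is None or kk == k)
    ]
    return sum(cells) / len(cells)


def three_factor_interactions(mu):
    """Return (alpha beta gamma)_ijk for every cell, using (24.9b)."""
    a, b, c = len(mu), len(mu[0]), len(mu[0][0])

    def m(**kw):
        return cell_mean(mu, **kw)

    return [[[m(i=i, j=j, k=k) - m(i=i, j=j) - m(i=i, k=k) - m(j=j, k=k)
              + m(i=i) + m(j=j) + m(k=k) - m()
              for k in range(c)] for j in range(b)] for i in range(a)]


# mu[i][j][k]: i = gender, j = age, k = intelligence (0 = high, 1 = normal).
example1 = [[[9, 19], [12, 20], [18, 21]],
            [[9, 19], [10, 20], [14, 21]]]
example2 = [[[10.5, 17.5], [12.5, 19.5], [16, 23]],
            [[9.5, 18.5], [10.5, 19.5], [13, 22]]]

abc1 = three_factor_interactions(example1)
abc2 = three_factor_interactions(example2)

print("example 1:", abc1[0][0][0], abc1[0][1][0])
largest2 = max(abs(x) for plane in abc2 for row in plane for x in row)
print("example 2, largest |(alpha beta gamma)_ijk|:", largest2)
```

For example 1 the first two printed terms correspond to $(\alpha\beta\gamma)_{111}$ and $(\alpha\beta\gamma)_{121}$ in Table 24.1b.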


Cell Means Model 


Let $Y_{ijkm}$ denote the observation for the mth case or trial (m = 1, ..., n) for the treatment
consisting of the ith level of A (i = 1, ..., a), the jth level of B (j = 1, ..., b), and the
kth level of C (k = 1, ..., c). Thus, the total number of cases in the study is:

$$n_T = nabc \tag{24.11}$$

The ANOVA model for a three-factor study in terms of the cell (treatment) means $\mu_{ijk}$
with fixed factor levels is:

$$Y_{ijkm} = \mu_{ijk} + \varepsilon_{ijkm} \tag{24.12}$$

where:

$\mu_{ijk}$ are parameters
$\varepsilon_{ijkm}$ are independent $N(0, \sigma^2)$
i = 1, ..., a; j = 1, ..., b; k = 1, ..., c; m = 1, ..., n


Factor Effects Model


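An equivalent factor effects model can be developed that incorporates the factorial structure
by expressing each treatment mean $\mu_{ijk}$ in terms of the various factor effects. From the
three-factor interaction definition (24.9a), we have the identity:

$$\mu_{ijk} = \mu_{...} + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} \tag{24.13}$$

where:

$$\mu_{...} = \frac{\sum_i \sum_j \sum_k \mu_{ijk}}{abc}$$
$$\alpha_i = \mu_{i..} - \mu_{...}$$
$$\beta_j = \mu_{.j.} - \mu_{...}$$
$$\gamma_k = \mu_{..k} - \mu_{...}$$
$$(\alpha\beta)_{ij} = \mu_{ij.} - \mu_{i..} - \mu_{.j.} + \mu_{...}$$
$$(\alpha\gamma)_{ik} = \mu_{i.k} - \mu_{i..} - \mu_{..k} + \mu_{...}$$
$$(\beta\gamma)_{jk} = \mu_{.jk} - \mu_{.j.} - \mu_{..k} + \mu_{...}$$
$$(\alpha\beta\gamma)_{ijk} = \mu_{ijk} - \mu_{ij.} - \mu_{i.k} - \mu_{.jk} + \mu_{i..} + \mu_{.j.} + \mu_{..k} - \mu_{...}$$

Hence, the equivalent factor effects ANOVA model for a three-factor study is:

$$Y_{ijkm} = \mu_{...} + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkm} \tag{24.14}$$

where:

$\varepsilon_{ijkm}$ are independent $N(0, \sigma^2)$

$\alpha_i$, $\beta_j$, $\gamma_k$, $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, $(\alpha\beta\gamma)_{ijk}$ are constants subject to the restrictions:

$$\sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = 0$$
$$\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = \sum_i (\alpha\gamma)_{ik} = 0$$
$$\sum_k (\alpha\gamma)_{ik} = \sum_j (\beta\gamma)_{jk} = \sum_k (\beta\gamma)_{jk} = 0$$
$$\sum_i (\alpha\beta\gamma)_{ijk} = \sum_j (\alpha\beta\gamma)_{ijk} = \sum_k (\alpha\beta\gamma)_{ijk} = 0$$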


Both the cell means model (24.12) and the equivalent factor effects model (24.14) are
linear models, just as in the two-factor case. We shall illustrate this for an example later in
the chapter.


24.2 Interpretation of Interactions in Three-Factor Studies 


To shed light on the nature of interactions in three-factor studies, we shall examine three
variations of the learning time example by means of tables and graphs. The first example
corresponds to learning time example 1, in which, as we have already determined, a three-
factor interaction is present. In learning time example 2, there is no three-factor interaction
but two two-factor interactions are present. Finally, in learning time example 3, there is again
no three-factor interaction but there is just one two-factor interaction. In each example, we
present the true treatment means $\mu_{ijk}$ and the true ANOVA model parameters.


Learning Time Example 1: Interpretation of Three-Factor Interactions 


FIGURE 24.1 Cell Means Plot with ABC Interaction Present: Learning Time Example 1.
[Panels: (a) AB Plot for $C_1$ (High IQ); (b) AB Plot for $C_2$ (Normal IQ). Vertical axis: minutes. Horizontal axis: $A_1$ (Male), $A_2$ (Female). Curves: $B_1$ (Young), $B_2$ (Middle), $B_3$ (Old).]

In a three-factor study, the presence of a three-factor interaction indicates that responses
must be explained in terms of the combined effects of all three factors. Thus, no simplified
explanation, for example in terms of main effects or first-order interactions, is possible. Any
graphical presentation of cell means should display all of the individual cell means $\mu_{ijk}$.
A convenient way to do so is to create separate two-factor treatment means or interaction
plots for each level of a third factor. For example, the AB treatment means plots for the two
levels of factor C are displayed in Figure 24.1 for the cell means in Table 24.1. Recall that
the learning time example considers the effects of gender (factor A), age (factor B), and
intelligence (factor C) on learning time. Specifically, Figure 24.1 shows that for persons
with normal IQ, gender has no effect on mean learning time, and age has only a small effect
leading to slightly longer learning times for older persons. For persons with high IQ, on
the other hand, females tend to learn more quickly than males for older persons but not for
young persons, and older persons tend to require substantially longer learning times than
young persons.

Notice that the slopes of the curves in the AB cell means plots are not the same for the
two levels of C. For the first level of C, the curves for middle-aged and older subjects are
sloping downward, while these curves both have zero slope for the second level of C. This
lack of parallelism in the two plots will always be present if a three-factor interaction exists,
but this is not the only way such slope changes can arise. As we will see in the next example,
if an AB interaction is present and either A or B also interacts with C, lack of parallelism
will also be present when the AB interaction is displayed for each level of C.

If three-factor interactions are difficult to understand, higher-order interactions such as 
four-factor interactions in studies involving more than three factors are yet more abstruse. 
Fortunately, it is often found in practice that these higher-order interactions are quite small 
or nonexistent. When this is the case, they can be disregarded in the analysis of factor 
effects. 


Learning Time Example 2: Interpretation of Multiple Two-Factor Interactions

The setup for learning time example 2 is the same as that for learning time example 1;
that is, we consider the same study of the effects of gender, age, and intelligence level on
learning of a complex task, but the true cell means have changed. Table 24.2 lists the cell
means and the corresponding ANOVA model parameters for learning time example 2.
It is easy to see from a review of these parameters that all ABC interaction terms $(\alpha\beta\gamma)_{ijk}$
and all BC interaction terms $(\beta\gamma)_{jk}$ are zero; however, AB and AC interactions are present,
since $(\alpha\beta)_{11} = -.5$ and $(\alpha\gamma)_{11} = .5$.
Figures 24.2a and 24.2b display the AB interactions for the two levels of C. The lack 
of parallelism of the AB curves within each panel reflects the presence of AB interactions. 
Notice also that the slopes of the curves in Figure 24.2a for high IQ subjects are negative, 
while those in Figure 24.2b for normal IQ subjects are all close to zero. The fact that the 
AB curves for a given level of factor B are not parallel for the two levels of factor C reflects 
the presence of AC interactions in this example. The AC treatment means plots are shown 
in Figures 24.2c—e for each of the three levels of factor B. As expected, the AC curves 
in each panel are not parallel. Note finally that the slopes of the AC curves change from 
panel to panel. This lack of parallelism reflects the presence of the AB interaction in this 
example. 


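TABLE 24.2 Mean Learning Times and ANOVA Model Parameters: Learning Time Example 2.

(a) Mean Learning Times (in minutes), Intelligence (factor C) and Age (factor B)

                    k = 1 High IQ                  k = 2 Normal IQ
                    j = 1    j = 2    j = 3        j = 1    j = 2    j = 3
Gender              Young    Middle   Old          Young    Middle   Old
i = 1 Male          10.5     12.5     16           17.5     19.5     23
i = 2 Female        9.5      10.5     13           18.5     19.5     22

(b) ANOVA Model Parameters

$\mu_{...} = 16.0$    $\beta_1 = -2.0$    $\gamma_1 = -4.0$             $(\alpha\beta)_{12} = 0.0$    $(\beta\gamma)_{11} = 0.0$    $(\alpha\beta\gamma)_{111} = 0.0$
$\alpha_1 = .5$       $\beta_2 = -.5$     $(\alpha\beta)_{11} = -.5$    $(\alpha\gamma)_{11} = .5$    $(\beta\gamma)_{21} = 0.0$    $(\alpha\beta\gamma)_{121} = 0.0$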




FIGURE 24.2 Cell Means Plots with AB and AC Interactions Present: Learning Time Example 2.
[Panels: (a) AB Plot for $C_1$ (High IQ); (b) AB Plot for $C_2$ (Normal IQ); (c) AC Plot for $B_1$ (Young); (d) AC Plot for $B_2$ (Middle); (e) AC Plot for $B_3$ (Old). Vertical axis: minutes. Horizontal axis: $A_1$ (Male), $A_2$ (Female).]


Learning Time Example 3: Interpretation of a Single Two-Factor Interaction 


Cell means and corresponding ANOVA model parameters for learning time example 3 are
given in Tables 24.3a and 24.3b, respectively. The setup is again the same as that for learning
time examples 1 and 2; however, the cell means have changed. Note from Table 24.3b that
all parameters corresponding to the ABC interaction are zero, as are those corresponding to
AC and BC. The two-factor interaction AB is present, since $(\alpha\beta)_{11} = -.5$.

Figures 24.3a and 24.3b display the AB treatment means plots for each level of C. The
curves within each panel are not parallel, reflecting the presence of an AB
interaction. Note also that the AB plots in Figure 24.3a are identical to those in Figure 24.3b,
except that the cell means plotted in Figure 24.3b have been uniformly shifted up by eight
minutes. This reflects the absence of the AC, BC, and ABC interactions in this example.

Since the curves in the two AB plots are identical for the different levels of factor C
except for the vertical displacement (i.e., since no AC, BC, or ABC interactions are present),
separate panels are not necessary for interpreting the AB interaction. The overall AB cell
means plot displays the cell means $\mu_{ij.}$ when averaged over the levels of C. This plot
is shown in Figure 24.3c. Notice that the slopes in the plot are identical to those in
Figures 24.3a and 24.3b. The $\mu_{ij.}$ values plotted are the averages of the corresponding cell
means $\mu_{ij1}$ and $\mu_{ij2}$ in Figures 24.3a and 24.3b. Because factor C is present as a main effect
and does not interact with either A or B ($\gamma_1 = -4$), its effect can be shown and interpreted
separately, using a bar graph, a main effects plot, or a line plot. A main effects plot for the
factor C effect is shown in Figure 24.3d.

TABLE 24.3 Mean Learning Times and ANOVA Model Parameters: Learning Time Example 3.

(a) Mean Learning Times (in minutes), Intelligence (factor C) and Age (factor B)

                    k = 1 High IQ                  k = 2 Normal IQ
                    j = 1    j = 2    j = 3        j = 1    j = 2    j = 3
Gender              Young    Middle   Old          Young    Middle   Old
i = 1 Male          10       12       15.5         18       20       23.5
i = 2 Female        10       11       13.5         18       19       21.5

(b) ANOVA Model Parameters

$\mu_{...} = 16.0$    $\beta_1 = -2.0$    $\gamma_1 = -4.0$             $(\alpha\beta)_{12} = 0.0$    $(\beta\gamma)_{11} = 0.0$    $(\alpha\beta\gamma)_{111} = 0.0$
$\alpha_1 = .5$       $\beta_2 = -.5$     $(\alpha\beta)_{11} = -.5$    $(\alpha\gamma)_{11} = 0.0$   $(\beta\gamma)_{21} = 0.0$    $(\alpha\beta\gamma)_{121} = 0.0$

FIGURE 24.3 Cell Means Plots with AB Interaction Present: Learning Time Example 3.
[Panels: (a) AB Plot for $C_1$ (High IQ); (b) AB Plot for $C_2$ (Normal IQ); (c) Overall AB Plot; (d) Main Effects Plot for C. Vertical axis: minutes. Horizontal axis: $A_1$ (Male), $A_2$ (Female) in panels (a) to (c); $C_1$ (High IQ), $C_2$ (Normal IQ) in panel (d).]


Comment 


One way to determine whether or not a three-factor interaction exists is to plot differences of treatment
means in a manner similar to two-factor interaction plots, as proposed in Reference 24.1. It can be
shown that if a three-factor interaction is not present, then the differences between means with respect
to any one of the factors will lead to parallel curves in the interaction plot of the differences. Conversely,
if a three-factor interaction is present, the difference curves will not be parallel. For instance, in a three-
factor study where the third factor is at two levels (such as in the learning time example) we would
examine the differences $\mu_{ij1} - \mu_{ij2}$ for all i and j. If the AB-interaction plots for these differences
show parallel curves, then no three-factor interactions are present. We refer to this plot as a treatment
means differences plot. (If the third factor has c > 2 levels, c - 1 interaction plots of the differences
$\mu_{ijk} - \mu_{ijc}$ for k = 1, ..., c - 1 are constructed, and lack of parallelism in any one of the plots
would indicate the presence of a three-factor interaction.)

Treatment means differences plots are shown for learning time examples 1 and 2 in Figures 24.4a
and 24.4b, respectively. We see from Figure 24.4a that the difference curves are not parallel, indicating
the presence of a three-factor interaction. On the other hand, there is no three-factor interaction for
learning time example 2, and this is reflected by the parallelism of the three curves in Figure 24.4b.
For this example, these curves happen to be identical. The curves in the plot have been jittered slightly
so that all three curves can be seen.
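The parallelism check described in this comment can also be carried out numerically rather than by eye. The sketch below is an illustration under stated assumptions: it uses the cell means of Tables 24.1a and 24.2a, forms the differences $\mu_{ij1} - \mu_{ij2}$, and treats one curve per level of B plotted against the levels of A; the function names are introduced here and are not from the text.

```python
# mu[i][j][k]: i = gender (male, female), j = age (young, middle, old),
# k = intelligence (0 = high IQ, 1 = normal IQ).
example1 = [[[9, 19], [12, 20], [18, 21]],
            [[9, 19], [10, 20], [14, 21]]]
example2 = [[[10.5, 17.5], [12.5, 19.5], [16, 23]],
            [[9.5, 18.5], [10.5, 19.5], [13, 22]]]


def differences(mu):
    """d[i][j] = mu_ij1 - mu_ij2 (high IQ minus normal IQ)."""
    return [[cell[0] - cell[1] for cell in row] for row in mu]


def curves_parallel(d, tol=1e-9):
    """One curve per level of B, plotted against the levels of A.

    The curves are parallel when every curve shows the same change
    from the first level of A to each of the other levels of A.
    """
    n_a, n_b = len(d), len(d[0])
    profiles = [[d[i][j] - d[0][j] for i in range(n_a)] for j in range(n_b)]
    return all(abs(x - y) < tol
               for profile in profiles
               for x, y in zip(profile, profiles[0]))


d1 = differences(example1)
d2 = differences(example2)
print("example 1 differences:", d1, "parallel:", curves_parallel(d1))
print("example 2 differences:", d2, "parallel:", curves_parallel(d2))
```

With true cell means an exact comparison is possible; with estimated means the curves would rarely be exactly parallel, which is one reason a formal test is used in practice.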

Note that the main purpose of the treatment means differences plot is to diagnose the presence
or absence of a three-factor interaction, and beyond this it does not contribute substantially to the
interpretation of results. For this reason we do not advocate routine use of this plot with estimated
treatment means. We shall employ analysis of variance techniques in Section 24.3 to identify which
interactions are present, and then display appropriate treatment means plots or main effects plots to
summarize and interpret results.


FIGURE 24.4 Treatment Means Differences Plot: Learning Time Examples 1 and 2.
[Panels: (a) Three-factor Interaction Present, Learning Time Example 1; (b) Three-factor Interaction Absent, Learning Time Example 2. Vertical axis: difference. Horizontal axis: $A_1$ (Male), $A_2$ (Female). Curves: $B_1$ (Young), $B_2$ (Middle), $B_3$ (Old).]


24.3 Fitting of ANOVA Model


Notation 


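The notation for sample totals and means is a straightforward extension of that for two-
factor studies. As usual, a dot in the subscript indicates aggregation or averaging over the
index represented by the dot. We have:

$$Y_{ijk.} = \sum_m Y_{ijkm} \qquad \bar{Y}_{ijk.} = \frac{Y_{ijk.}}{n} \tag{24.15a}$$
$$Y_{ij..} = \sum_k \sum_m Y_{ijkm} \qquad \bar{Y}_{ij..} = \frac{Y_{ij..}}{cn} \tag{24.15b}$$
$$Y_{i.k.} = \sum_j \sum_m Y_{ijkm} \qquad \bar{Y}_{i.k.} = \frac{Y_{i.k.}}{bn} \tag{24.15c}$$
$$Y_{.jk.} = \sum_i \sum_m Y_{ijkm} \qquad \bar{Y}_{.jk.} = \frac{Y_{.jk.}}{an} \tag{24.15d}$$
$$Y_{i...} = \sum_j \sum_k \sum_m Y_{ijkm} \qquad \bar{Y}_{i...} = \frac{Y_{i...}}{bcn} \tag{24.15e}$$
$$Y_{.j..} = \sum_i \sum_k \sum_m Y_{ijkm} \qquad \bar{Y}_{.j..} = \frac{Y_{.j..}}{acn} \tag{24.15f}$$
$$Y_{..k.} = \sum_i \sum_j \sum_m Y_{ijkm} \qquad \bar{Y}_{..k.} = \frac{Y_{..k.}}{abn} \tag{24.15g}$$
$$Y_{....} = \sum_i \sum_j \sum_k \sum_m Y_{ijkm} \qquad \bar{Y}_{....} = \frac{Y_{....}}{abcn} \tag{24.15h}$$

Later in this section we illustrate this notation for a study of the effects of gender,
body fat, and smoking history on exercise tolerance in stress testing. Each of the three
factors has two levels, and there are three replications for each treatment. Tables 24.4a
and b show, respectively, the data and estimated means, together with the corresponding
notation.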


Fitting of ANOVA Model 


When the normal error cell means model (24.12) is fitted by the method of least squares
or the method of maximum likelihood, the estimators as usual turn out to be the estimated
treatment means:

$$\hat{\mu}_{ijk} = \bar{Y}_{ijk.} \tag{24.16}$$




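TABLE 24.4 Sample Data and Estimated Treatment and Factor Level Means for Three-Factor Study: Stress Test Example.

(a) Data

                        Smoking History (factor C)
                        k = 1 Light            k = 2 Heavy
j = 1 Low fat:
  i = 1 Male            24.1 ($Y_{1111}$)      17.6 ($Y_{1121}$)
                        29.2 ($Y_{1112}$)      18.8 ($Y_{1122}$)
                        24.6 ($Y_{1113}$)      23.2 ($Y_{1123}$)
  i = 2 Female          20.0 ($Y_{2111}$)      14.8 ($Y_{2121}$)
                        21.9 ($Y_{2112}$)      10.3 ($Y_{2122}$)
                        17.6 ($Y_{2113}$)      11.3 ($Y_{2123}$)
j = 2 High fat:
  i = 1 Male            14.6 ($Y_{1211}$)      14.9 ($Y_{1221}$)
                        15.3 ($Y_{1212}$)      20.4 ($Y_{1222}$)
                        12.3 ($Y_{1213}$)      12.8 ($Y_{1223}$)
  i = 2 Female          16.1 ($Y_{2211}$)      10.1 ($Y_{2221}$)
                         9.3 ($Y_{2212}$)      14.4 ($Y_{2222}$)
                        10.8 ($Y_{2213}$)       6.1 ($Y_{2223}$)

(b) Estimated Means

                k = 1                          k = 2                          All k
j = 1:
  i = 1         25.97 ($\bar{Y}_{111.}$)       19.87 ($\bar{Y}_{112.}$)       22.92 ($\bar{Y}_{11..}$)
  i = 2         19.83 ($\bar{Y}_{211.}$)       12.13 ($\bar{Y}_{212.}$)       15.98 ($\bar{Y}_{21..}$)
  All i         22.90 ($\bar{Y}_{.11.}$)       16.00 ($\bar{Y}_{.12.}$)       19.45 ($\bar{Y}_{.1..}$)
j = 2:
  i = 1         14.07 ($\bar{Y}_{121.}$)       16.03 ($\bar{Y}_{122.}$)       15.05 ($\bar{Y}_{12..}$)
  i = 2         12.07 ($\bar{Y}_{221.}$)       10.20 ($\bar{Y}_{222.}$)       11.13 ($\bar{Y}_{22..}$)
  All i         13.07 ($\bar{Y}_{.21.}$)       13.12 ($\bar{Y}_{.22.}$)       13.09 ($\bar{Y}_{.2..}$)
All j:
  i = 1         20.02 ($\bar{Y}_{1.1.}$)       17.95 ($\bar{Y}_{1.2.}$)       18.98 ($\bar{Y}_{1...}$)
  i = 2         15.95 ($\bar{Y}_{2.1.}$)       11.17 ($\bar{Y}_{2.2.}$)       13.56 ($\bar{Y}_{2...}$)
  All i         17.98 ($\bar{Y}_{..1.}$)       14.56 ($\bar{Y}_{..2.}$)       16.27 ($\bar{Y}_{....}$)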


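Thus, the fitted values for the observations are the estimated treatment means:

$$\hat{Y}_{ijkm} = \bar{Y}_{ijk.} \tag{24.17}$$

and the residuals are the deviations of the observed values from the estimated treatment
means:

$$e_{ijkm} = Y_{ijkm} - \hat{Y}_{ijkm} = Y_{ijkm} - \bar{Y}_{ijk.} \tag{24.18}$$

For the equivalent factor effects model (24.14), the least squares and maximum likelihood
estimators of the parameters are as follows:

Parameter                      Estimator

$\mu_{...}$                    $\hat{\mu}_{...} = \bar{Y}_{....}$    (24.19a)
$\alpha_i$                     $\hat{\alpha}_i = \bar{Y}_{i...} - \bar{Y}_{....}$    (24.19b)
$\beta_j$                      $\hat{\beta}_j = \bar{Y}_{.j..} - \bar{Y}_{....}$    (24.19c)
$\gamma_k$                     $\hat{\gamma}_k = \bar{Y}_{..k.} - \bar{Y}_{....}$    (24.19d)
$(\alpha\beta)_{ij}$           $(\widehat{\alpha\beta})_{ij} = \bar{Y}_{ij..} - \bar{Y}_{i...} - \bar{Y}_{.j..} + \bar{Y}_{....}$    (24.19e)
$(\alpha\gamma)_{ik}$          $(\widehat{\alpha\gamma})_{ik} = \bar{Y}_{i.k.} - \bar{Y}_{i...} - \bar{Y}_{..k.} + \bar{Y}_{....}$    (24.19f)
$(\beta\gamma)_{jk}$           $(\widehat{\beta\gamma})_{jk} = \bar{Y}_{.jk.} - \bar{Y}_{.j..} - \bar{Y}_{..k.} + \bar{Y}_{....}$    (24.19g)
$(\alpha\beta\gamma)_{ijk}$    $(\widehat{\alpha\beta\gamma})_{ijk} = \bar{Y}_{ijk.} - \bar{Y}_{ij..} - \bar{Y}_{i.k.} - \bar{Y}_{.jk.} + \bar{Y}_{i...} + \bar{Y}_{.j..} + \bar{Y}_{..k.} - \bar{Y}_{....}$    (24.19h)

The fitted values and residuals for factor effects model (24.14) are the same as those in
(24.17) and (24.18) for cell means model (24.12), as was the case for two-factor studies.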
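The estimators in (24.16) through (24.19) amount to averaging over the appropriate subscripts. The following is a minimal sketch using the stress test data of Table 24.4a; the dictionary layout and the helper name `ybar` are assumptions made here for illustration, and only a few of the estimates are shown.

```python
# Stress test data from Table 24.4a, keyed by (i, j, k):
# i = gender (1 male, 2 female), j = body fat (1 low, 2 high),
# k = smoking history (1 light, 2 heavy). Three replications per cell.
y = {
    (1, 1, 1): [24.1, 29.2, 24.6], (1, 1, 2): [17.6, 18.8, 23.2],
    (2, 1, 1): [20.0, 21.9, 17.6], (2, 1, 2): [14.8, 10.3, 11.3],
    (1, 2, 1): [14.6, 15.3, 12.3], (1, 2, 2): [14.9, 20.4, 12.8],
    (2, 2, 1): [16.1, 9.3, 10.8], (2, 2, 2): [10.1, 14.4, 6.1],
}


def ybar(i=None, j=None, k=None):
    """Sample mean over every subscript left as None (equal sample sizes)."""
    obs = [v for (ii, jj, kk), cell in y.items() for v in cell
           if (i is None or ii == i) and (j is None or jj == j) and (k is None or kk == k)]
    return sum(obs) / len(obs)


# (24.16)-(24.17): estimated treatment means are also the fitted values.
fitted = {cell: ybar(*cell) for cell in y}
# (24.18): residuals are deviations from the estimated treatment means.
residuals = {cell: [v - fitted[cell] for v in obs] for cell, obs in y.items()}

# A few of the factor effects estimators in (24.19).
mu_hat = ybar()
alpha_hat = {i: ybar(i=i) - mu_hat for i in (1, 2)}
ab_hat_11 = ybar(i=1, j=1) - ybar(i=1) - ybar(j=1) + mu_hat

print("fitted value for cell (1,1,1):", round(fitted[(1, 1, 1)], 2))
print("mu_hat:", round(mu_hat, 2))
print("alpha_hat:", {i: round(v, 4) for i, v in alpha_hat.items()})
print("(alpha beta)_hat_11:", round(ab_hat_11, 4))
```

Because the sample sizes are equal, averaging all observations that share a subscript gives the same result as averaging the corresponding cell means.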


Evaluation of Appropriateness of ANOVA Model

No new problems arise in examining the appropriateness of the three-factor analysis of
variance model. The residuals (24.18):

$$e_{ijkm} = Y_{ijkm} - \bar{Y}_{ijk.} \tag{24.20}$$

may be examined for normality, constancy of error variance, and independence of error
terms in the same fashion as for single-factor and two-factor studies.

Weighted least squares as usual is a standard remedial measure when the error variance is
not constant but the distribution of the error terms is normal. A transformation of the response
variable may be helpful to stabilize the error variance, to make the error distributions more
normal, and/or to make important interactions unimportant. Our earlier discussions of these
topics apply completely to the three-factor case.

Finally, our earlier discussion on the effects of departures from the ANOVA model
applies fully to the three-factor case. In particular, the employment of equal sample sizes
for all treatments minimizes the effect of unequal variances.

Example

The effects of gender of subject (factor A), body fat of subject (measured in percent,
factor B), and smoking history of subject (factor C) on exercise tolerance (Y) were stud-
ied in a small-scale investigation of persons 25 to 35 years old. Exercise tolerance was
measured in minutes until fatigue occurs while the subject is performing on a bicycle
apparatus. Three subjects for each gender-body fat-smoking history group were given the
exercise tolerance stress test. The results are recorded in Table 24.4a. Note that each fac-
tor has two levels (a = b = c = 2) and that there are three replications (n = 3) for each
treatment.

The estimated treatment and factor level means are presented in Table 24.4b. Figure 24.5a
contains the BC treatment means plots for each level of factor A, and Figure 24.5b contains
the AB treatment means plots for each level of C. It appears that some factors may interact
in their effect on exercise tolerance and that gender, in particular, may affect the endurance
in stress testing.

TABLE 24.5 General ANOVA Table for Three-Factor Study with Fixed Factor Levels.

Source of Variation    SS       df                          MS       E{MS}

Factor A               SSA      a - 1                       MSA      $\sigma^2 + bcn \dfrac{\sum \alpha_i^2}{a - 1}$
Factor B               SSB      b - 1                       MSB      $\sigma^2 + acn \dfrac{\sum \beta_j^2}{b - 1}$
Factor C               SSC      c - 1                       MSC      $\sigma^2 + abn \dfrac{\sum \gamma_k^2}{c - 1}$
AB interactions        SSAB     (a - 1)(b - 1)              MSAB     $\sigma^2 + cn \dfrac{\sum\sum (\alpha\beta)_{ij}^2}{(a - 1)(b - 1)}$
AC interactions        SSAC     (a - 1)(c - 1)              MSAC     $\sigma^2 + bn \dfrac{\sum\sum (\alpha\gamma)_{ik}^2}{(a - 1)(c - 1)}$
BC interactions        SSBC     (b - 1)(c - 1)              MSBC     $\sigma^2 + an \dfrac{\sum\sum (\beta\gamma)_{jk}^2}{(b - 1)(c - 1)}$
ABC interactions       SSABC    (a - 1)(b - 1)(c - 1)       MSABC    $\sigma^2 + n \dfrac{\sum\sum\sum (\alpha\beta\gamma)_{ijk}^2}{(a - 1)(b - 1)(c - 1)}$
Error                  SSE      abc(n - 1)                  MSE      $\sigma^2$
Total                  SSTO     abcn - 1

Note: $\mu_{...}$, $\alpha_i$, $\beta_j$, $\gamma_k$, $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, and $(\alpha\beta\gamma)_{ijk}$ are defined in (24.13).


Residual Analysis. The researcher first prepared aligned residual dot plots for the eight 
treatments. These plots (not shown), though based on only three observations for each 
treatment, did not suggest any gross differences in the error variances for the eight treatments. 
The researcher also obtained a normal probability plot of the residuals, shown in Figure 24.6. 
The points in this plot form a moderately linear pattern. Normality of the error terms is
supported by the high coefficient of correlation between the ordered residuals and their 
expected values under normality, namely, .969. The researcher was therefore satisfied that 
three-factor ANOVA model (24.14) is applicable here, and now wishes to analyze the nature 
of the factor effects in detail. 


[Figure 24.5, Stress Test Example. Panels: (a) Body Fat and Smoking History Plots for $A_1$ (Male) and $A_2$ (Female), with curves $C_1$ (Light smoking) and $C_2$ (Heavy smoking) against $B_1$ (Low Fat) and $B_2$ (High Fat); (b) Gender and Body Fat Plots for $C_1$ (Light smoking) and $C_2$ (Heavy smoking), with curves $B_1$ (Low fat) and $B_2$ (High fat) against $A_1$ (Male) and $A_2$ (Female). Vertical axis: minutes.]

FIGURE 24.6 Normal Probability Plot of Residuals: Stress Test Example.
[Vertical axis: residual. Horizontal axis: expected value.]


24.4 Analysis of Variance 


Partitioning of Total Sum of Squares 


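Neglecting the factorial structure of the three-factor study and simply considering it to
contain abc treatments, we obtain the usual breakdown of the total sum of squares:

$$SSTO = SSTR + SSE \tag{24.21}$$

where:

$$SSTO = \sum_i \sum_j \sum_k \sum_m (Y_{ijkm} - \bar{Y}_{....})^2 \tag{24.21a}$$
$$SSTR = n \sum_i \sum_j \sum_k (\bar{Y}_{ijk.} - \bar{Y}_{....})^2 \tag{24.21b}$$
$$SSE = \sum_i \sum_j \sum_k \sum_m (Y_{ijkm} - \bar{Y}_{ijk.})^2 = \sum_i \sum_j \sum_k \sum_m e_{ijkm}^2 \tag{24.21c}$$

Consider now the estimated treatment mean deviation $\bar{Y}_{ijk.} - \bar{Y}_{....}$, which appears in
SSTR. This can be decomposed in terms of the estimators in (24.19) of the main effects,
two-factor interactions, and three-factor interaction:

$$\begin{aligned}
\underbrace{\bar{Y}_{ijk.} - \bar{Y}_{....}}_{\text{Estimated treatment mean deviation}}
 = {} & \underbrace{\bar{Y}_{i...} - \bar{Y}_{....}}_{\text{A main effect}}
 + \underbrace{\bar{Y}_{.j..} - \bar{Y}_{....}}_{\text{B main effect}}
 + \underbrace{\bar{Y}_{..k.} - \bar{Y}_{....}}_{\text{C main effect}} \\
 & + \underbrace{\bar{Y}_{ij..} - \bar{Y}_{i...} - \bar{Y}_{.j..} + \bar{Y}_{....}}_{\text{AB interaction effect}} \\
 & + \underbrace{\bar{Y}_{i.k.} - \bar{Y}_{i...} - \bar{Y}_{..k.} + \bar{Y}_{....}}_{\text{AC interaction effect}}
 + \underbrace{\bar{Y}_{.jk.} - \bar{Y}_{.j..} - \bar{Y}_{..k.} + \bar{Y}_{....}}_{\text{BC interaction effect}} \\
 & + \underbrace{\bar{Y}_{ijk.} - \bar{Y}_{ij..} - \bar{Y}_{i.k.} - \bar{Y}_{.jk.} + \bar{Y}_{i...} + \bar{Y}_{.j..} + \bar{Y}_{..k.} - \bar{Y}_{....}}_{\text{ABC interaction effect}}
\end{aligned}$$

When we square each side and sum over i, j, k, and m, all cross-product terms drop out
and we obtain:

$$SSTR = SSA + SSB + SSC + SSAB + SSAC + SSBC + SSABC \tag{24.22}$$

where:

$$SSA = nbc \sum_i (\bar{Y}_{i...} - \bar{Y}_{....})^2 \tag{24.22a}$$
$$SSB = nac \sum_j (\bar{Y}_{.j..} - \bar{Y}_{....})^2 \tag{24.22b}$$
$$SSC = nab \sum_k (\bar{Y}_{..k.} - \bar{Y}_{....})^2 \tag{24.22c}$$
$$SSAB = nc \sum_i \sum_j (\bar{Y}_{ij..} - \bar{Y}_{i...} - \bar{Y}_{.j..} + \bar{Y}_{....})^2 \tag{24.22d}$$
$$SSAC = nb \sum_i \sum_k (\bar{Y}_{i.k.} - \bar{Y}_{i...} - \bar{Y}_{..k.} + \bar{Y}_{....})^2 \tag{24.22e}$$
$$SSBC = na \sum_j \sum_k (\bar{Y}_{.jk.} - \bar{Y}_{.j..} - \bar{Y}_{..k.} + \bar{Y}_{....})^2 \tag{24.22f}$$
$$SSABC = n \sum_i \sum_j \sum_k (\bar{Y}_{ijk.} - \bar{Y}_{ij..} - \bar{Y}_{i.k.} - \bar{Y}_{.jk.} + \bar{Y}_{i...} + \bar{Y}_{.j..} + \bar{Y}_{..k.} - \bar{Y}_{....})^2 \tag{24.22g}$$

Combining (24.21) and (24.22), we have thus established the orthogonal decomposition:

$$SSTO = SSA + SSB + SSC + SSAB + SSAC + SSBC + SSABC + SSE \tag{24.23}$$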


SSA, SSB, and SSC are the usual main effects sums of squares. For instance, the larger
(absolutely) are the estimated main B effects $\bar{Y}_{.j..} - \bar{Y}_{....}$, the larger will be SSB.

SSAB, SSAC, and SSBC are the usual two-factor interactions sums of squares. For in-
stance, the larger (absolutely) are the estimated AB interactions $\bar{Y}_{ij..} - \bar{Y}_{i...} - \bar{Y}_{.j..} + \bar{Y}_{....}$,
the larger will be SSAB.

Finally, SSABC is the three-factor interactions sum of squares. The larger (absolutely)
are these estimated three-factor interactions, the larger will be SSABC.
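The decomposition (24.23) can be reproduced for the stress test data of Table 24.4a. The sketch below is illustrative only: it assumes equal sample sizes, reuses a hypothetical helper `ybar` for the dot-subscript means, and applies formulas (24.21a), (24.21c), and (24.22a) through (24.22g) directly.

```python
# Stress test data from Table 24.4a, keyed by (i, j, k) = (gender, fat, smoking).
y = {
    (1, 1, 1): [24.1, 29.2, 24.6], (1, 1, 2): [17.6, 18.8, 23.2],
    (2, 1, 1): [20.0, 21.9, 17.6], (2, 1, 2): [14.8, 10.3, 11.3],
    (1, 2, 1): [14.6, 15.3, 12.3], (1, 2, 2): [14.9, 20.4, 12.8],
    (2, 2, 1): [16.1, 9.3, 10.8], (2, 2, 2): [10.1, 14.4, 6.1],
}
A, B, C = (1, 2), (1, 2), (1, 2)      # factor levels
a, b, c, n = len(A), len(B), len(C), 3


def ybar(i=None, j=None, k=None):
    """Sample mean over every subscript left as None (equal sample sizes)."""
    obs = [v for (ii, jj, kk), cell in y.items() for v in cell
           if (i is None or ii == i) and (j is None or jj == j) and (k is None or kk == k)]
    return sum(obs) / len(obs)


g = ybar()  # overall mean
ss = {
    "A": n * b * c * sum((ybar(i=i) - g) ** 2 for i in A),
    "B": n * a * c * sum((ybar(j=j) - g) ** 2 for j in B),
    "C": n * a * b * sum((ybar(k=k) - g) ** 2 for k in C),
    "AB": n * c * sum((ybar(i=i, j=j) - ybar(i=i) - ybar(j=j) + g) ** 2
                      for i in A for j in B),
    "AC": n * b * sum((ybar(i=i, k=k) - ybar(i=i) - ybar(k=k) + g) ** 2
                      for i in A for k in C),
    "BC": n * a * sum((ybar(j=j, k=k) - ybar(j=j) - ybar(k=k) + g) ** 2
                      for j in B for k in C),
    "ABC": n * sum((ybar(i, j, k) - ybar(i=i, j=j) - ybar(i=i, k=k) - ybar(j=j, k=k)
                    + ybar(i=i) + ybar(j=j) + ybar(k=k) - g) ** 2
                   for i in A for j in B for k in C),
    "E": sum((v - ybar(*cell)) ** 2 for cell, obs in y.items() for v in obs),
}
ssto = sum((v - g) ** 2 for obs in y.values() for v in obs)

for source, value in ss.items():
    print(f"SS{source:<4}{value:10.3f}")
print(f"SSTO  {ssto:10.3f}   sum of components {sum(ss.values()):10.3f}")
```

The component sums of squares should match the SYSTAT output reported for this example up to rounding, and their total should equal SSTO.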


Degrees of Freedom and Mean Squares

Table 24.5 contains the general ANOVA table for three-factor ANOVA model (24.14). The
degrees of freedom for main effects and two-factor interactions sums of squares correspond
to those for two-factor studies. The number of degrees of freedom associated with SSABC
is obtained by subtraction and corresponds to the number of independent linear relations
among all the interaction terms $(\alpha\beta\gamma)_{ijk}$.

The expected mean squares are also given in Table 24.5. Note that MSA, MSB, MSC,
MSAB, MSAC, MSBC, and MSABC all have expectations equal to $\sigma^2$ if there are no factor
effects of the type reflected by the mean square. If such effects are present, each mean
square has an expectation exceeding $\sigma^2$. As usual, $E\{MSE\} = \sigma^2$ always. Hence, the tests
for factor effects consist of comparing the appropriate mean square against MSE by means
of an F* test statistic, with large values of F* indicating the presence of factor effects.


Tests for Factor Effects 


The various tests for factor effects all follow the same pattern; we illustrate them with the
test for three-factor interactions. The alternatives are:

$$H_0\colon \text{all } (\alpha\beta\gamma)_{ijk} = 0 \qquad H_a\colon \text{not all } (\alpha\beta\gamma)_{ijk} \text{ equal zero} \tag{24.24a}$$

The appropriate test statistic is:

$$F^* = \frac{MSABC}{MSE} \tag{24.24b}$$

If $H_0$ holds, F* follows the F distribution with (a - 1)(b - 1)(c - 1) degrees of freedom for
the numerator and abc(n - 1) degrees of freedom for the denominator. Hence, the decision
rule to control the Type I error at $\alpha$ is:

If $F^* \le F[1 - \alpha; (a - 1)(b - 1)(c - 1), (n - 1)abc]$, conclude $H_0$
If $F^* > F[1 - \alpha; (a - 1)(b - 1)(c - 1), (n - 1)abc]$, conclude $H_a$    (24.24c)

Table 24.6 contains the test statistics and percentiles of the F distribution for the various
tests in a three-factor study.

TABLE 24.6 Test Statistics for Three-Factor Study with Fixed Factor Levels.

Alternatives                                                  Test Statistic       Percentile

$H_0$: all $\alpha_i = 0$                                     F* = MSA/MSE         $F[1 - \alpha; a - 1, (n - 1)abc]$
$H_a$: not all $\alpha_i = 0$

$H_0$: all $\beta_j = 0$                                      F* = MSB/MSE         $F[1 - \alpha; b - 1, (n - 1)abc]$
$H_a$: not all $\beta_j = 0$

$H_0$: all $\gamma_k = 0$                                     F* = MSC/MSE         $F[1 - \alpha; c - 1, (n - 1)abc]$
$H_a$: not all $\gamma_k = 0$

$H_0$: all $(\alpha\beta)_{ij} = 0$                           F* = MSAB/MSE        $F[1 - \alpha; (a - 1)(b - 1), (n - 1)abc]$
$H_a$: not all $(\alpha\beta)_{ij} = 0$

$H_0$: all $(\alpha\gamma)_{ik} = 0$                          F* = MSAC/MSE        $F[1 - \alpha; (a - 1)(c - 1), (n - 1)abc]$
$H_a$: not all $(\alpha\gamma)_{ik} = 0$

$H_0$: all $(\beta\gamma)_{jk} = 0$                           F* = MSBC/MSE        $F[1 - \alpha; (b - 1)(c - 1), (n - 1)abc]$
$H_a$: not all $(\beta\gamma)_{jk} = 0$

$H_0$: all $(\alpha\beta\gamma)_{ijk} = 0$                    F* = MSABC/MSE       $F[1 - \alpha; (a - 1)(b - 1)(c - 1), (n - 1)abc]$
$H_a$: not all $(\alpha\beta\gamma)_{ijk} = 0$


Kimball Inequality. The Kimball inequality for the family level of significance $\alpha$ in a
three-factor study when the family consists of the combined set of seven tests, including three
on main effects, three on two-factor interactions, and one on three-factor interactions, is:

$$\alpha \le 1 - (1 - \alpha_1)(1 - \alpha_2) \cdots (1 - \alpha_7) \tag{24.25}$$

where $\alpha_i$ is the level of significance for the ith test.
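When all seven tests use the same level $\alpha_i$, the bound in (24.25) reduces to $1 - (1 - \alpha_i)^7$, which can be solved for $\alpha_i$ given a desired family level. The sketch below illustrates that arithmetic; the function name is introduced here for illustration, and the family level of .10 is the one used in the stress test example.

```python
def per_test_level(family_alpha, n_tests):
    """Common per-test level alpha_i solving family_alpha = 1 - (1 - alpha_i)**n_tests."""
    return 1 - (1 - family_alpha) ** (1 / n_tests)


alpha_i = per_test_level(0.10, 7)
bound = 1 - (1 - alpha_i) ** 7   # Kimball bound recovered from alpha_i

print("per-test level:", round(alpha_i, 4))
print("family bound:  ", round(bound, 4))
```

The per-test level obtained this way is close to the value of .015 used in the example.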


Comments

1. If the three-factor interactions (and also perhaps some sets of two-factor interactions) equal
zero, the question sometimes arises whether the corresponding sums of squares should be pooled with
the error sum of squares. Our earlier discussion on revising the ANOVA model in Section 19.10 is
applicable here also.

2. If there is only one case per treatment in a three-factor study with fixed factor levels, analysis
of variance tests can only be conducted if it is possible to assume that some interactions equal zero.
Usually, the interactions most likely to equal zero are the three-factor interactions. If it is possible to
assume that all three-factor interactions equal zero, MSABC has expectation $\sigma^2$ and plays the role of
the error mean square MSE. All mean squares are calculated in the usual manner, except that n = 1.

3. The F* test statistics in Table 24.6 can be obtained by the general linear test approach explained
in Chapter 2. For example, for testing whether all three-factor interactions are zero, the full model is
that in (24.14), the alternatives are those in (24.24a), and the reduced model under $H_0$: $(\alpha\beta\gamma)_{ijk} = 0$ is:

$$Y_{ijkm} = \mu_{...} + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijkm} \quad \text{Reduced model} \tag{24.26}$$

Example

In the stress test example, the researcher first wished to test for the various factor effects.
Figure 24.7 contains a portion of the SYSTAT ANOVA output. The researcher desired to
conduct the seven potential tests with a family level of significance of $\alpha = .10$. This will
ensure that if in fact no factor effects are present, there will be only one chance in 10 for
one or more of the seven tests to lead to the conclusion of the presence of factor effects.
Using the Kimball inequality (24.25), the researcher solved the equation:

$$\alpha = .10 = 1 - (1 - \alpha_i)^7$$

and found $\alpha_i = .015$. Thus, use of significance level $\alpha_i = .015$ for each test ensures that
the family level of significance will not exceed .10.

The ANOVA table in Figure 24.7 shows the seven test statistics and their P-values.
Each test statistic has in the numerator the appropriate factor effect mean square, and the
denominator of each test statistic is MSE.

FIGURE 24.7 SYSTAT ANOVA Output: Stress Test Example.

ANALYSIS OF VARIANCE

SOURCE                SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO       P
GENDER                       176.584    1       176.584    18.915   0.000
FAT                          242.570    1       242.570    25.984   0.000
SMOKING                       70.384    1        70.384     7.539   0.014
GENDER*FAT                    13.650    1        13.650     1.462   0.244
GENDER*SMOKING                11.070    1        11.070     1.186   0.292
FAT*SMOKING                   72.454    1        72.454     7.761   0.013
GENDER*FAT*SMOKING             1.870    1         1.870     0.200   0.660

ERROR                        149.367   16         9.335

Test for Three-Factor Interactions. The first test was conducted for three-factor inter-
actions. The alternatives are:

$H_0$: all $(\alpha\beta\gamma)_{ijk} = 0$
$H_a$: not all $(\alpha\beta\gamma)_{ijk}$ equal zero

The decision rule is:

If $F^* \le F(.985; 1, 16) = 7.42$, conclude $H_0$
If $F^* > F(.985; 1, 16) = 7.42$, conclude $H_a$


The F* test statistic obtained from Figure 24.7 is:

$$F^* = \frac{MSABC}{MSE} = \frac{1.870}{9.335} = .20$$

Since $F^* = .20 \le 7.42$, the researcher concluded that no ABC interactions are present. The
P-value of this test is .66.


Tests for Two-Factor Interactions. The researcher next tested for two-factor interactions.
In the test for AB interactions, the decision rule is (the alternatives are given in Table 24.6):

If $F^* \le F(.985; 1, 16) = 7.42$, conclude $H_0$
If $F^* > F(.985; 1, 16) = 7.42$, conclude $H_a$

and the test statistic is:

$$F^* = \frac{MSAB}{MSE} = \frac{13.650}{9.335} = 1.46$$

Since $F^* = 1.46 \le 7.42$, the researcher concluded that no AB interactions are present. The
P-value of this test is .24.

The tests for AC and BC interactions proceeded similarly. We obtain:

1. $F^* = \frac{MSAC}{MSE} = \frac{11.070}{9.335} = 1.19 \le F(.985; 1, 16) = 7.42$, P-value = .29
   Conclusion: No AC interactions are present.

2. $F^* = \frac{MSBC}{MSE} = \frac{72.454}{9.335} = 7.76 > F(.985; 1, 16) = 7.42$, P-value = .01
   Conclusion: Some BC interactions are present.


Tests for Main Effects. Since factor A (gender) did not interact with the other two factors, 
attention next turned to testing for factor A main effects. In testing for factor A main effects, 
the decision rule is (the alternatives are given in Table 24.6): 

If $F^* \le F(.985; 1, 16) = 7.42$, conclude $H_0$
If $F^* > F(.985; 1, 16) = 7.42$, conclude $H_a$

The test statistic is:

$$F^* = \frac{MSA}{MSE} = \frac{176.584}{9.335} = 18.92$$

Since $F^* = 18.92 > 7.42$, the conclusion was reached that factor A main effects are present;
specifically, we conclude that the mean endurance time for males is greater than that for
females. The P-value of this test is 0+.

The factor B and factor C main effects were not tested at this point because BC interactions
were found to be present. The researcher first wished to study the nature of the BC
interaction effects before determining whether the factor B and factor C main effects are
of any practical interest under the circumstances.
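The F ratios quoted above can be reproduced from the sums of squares in Figure 24.7. The sketch below is only an arithmetic check: the critical value 7.42 is copied from the text rather than computed, and the dictionary layout and variable names are assumptions made for illustration. Ratios for the B and C main effects are shown because Figure 24.7 reports them, not because the researcher tested them at this stage.

```python
# Sums of squares and degrees of freedom as reported in Figure 24.7.
anova = {
    "GENDER": (176.584, 1),
    "FAT": (242.570, 1),
    "SMOKING": (70.384, 1),
    "GENDER*FAT": (13.650, 1),
    "GENDER*SMOKING": (11.070, 1),
    "FAT*SMOKING": (72.454, 1),
    "GENDER*FAT*SMOKING": (1.870, 1),
}
sse, df_error = 149.367, 16
mse = sse / df_error  # error mean square, about 9.335

critical_value = 7.42  # F(.985; 1, 16), taken from the text

# F* = (effect mean square) / MSE for every source in the table.
f_ratios = {source: (ss / df) / mse for source, (ss, df) in anova.items()}
exceeds = {source: f > critical_value for source, f in f_ratios.items()}

for source, f in f_ratios.items():
    print(f"{source:20s} F* = {f:6.2f}   exceeds 7.42: {exceeds[source]}")
```

Of the interaction terms, only FAT*SMOKING exceeds the critical value, which matches the conclusions reached in the text.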




Family of Conclusions. The five separate F tests for factor effects led the researcher to 
conclude (with family level of significance < .10): 


1. There are no three-factor interactions.

2. There are no two-factor interactions between gender (factor A) and either of the other 
two factors—body fat (factor B) and smoking history (factor C). Body fat and smoking 
history interactions do exist, however. 

3. Main effects for gender (factor A) are present—mean endurance time for males is larger 
than for females. 


This set of test results was most useful to the researcher. The next step in the analysis was 
to examine the nature of the BC interaction effects. 


24.5 Analysis of Factor Effects


No new problems are encountered in the analysis of factor effects for three-factor studies 
with fixed factor levels. As for two-factor studies, the focus of the analysis is usually on 
factor level means when no important interactions are present, and on various two-factor 
level means ($\mu_{ij.}$, $\mu_{i.k}$, or $\mu_{.jk}$) or individual cell means ($\mu_{ijk}$) when there are important
interactions. We first present a formal strategy for determining which level of analysis is 
appropriate. We then present some selected results for estimating factor effects. 


Strategy for Analysis 


As described in Section 19.7 for two-factor studies, the presence of interacting effects in 
multifactor studies complicates the explanation of the factor effects because they must 
then be described in terms of the combined effects of multiple factors. Of course, some 
phenomena are too complex to be described simply by additive main effects. The desire for 
a simple, parsimonious explanation, when possible, suggests the following basic strategy 
for analyzing factor effects in three-factor studies: 


1. Examine whether or not important three-factor interactions exist. 

2. If no important three-factor interactions exist, determine whether or not important two- 
factor interactions are present. 

3. If no important two-factor or higher-order interactions are present, examine the main
effects. For important A, B, or C main effects, describe the nature of these effects in
terms of the factor level means $\mu_{i..}$, $\mu_{.j.}$, and $\mu_{..k}$, respectively.

4. If three-factor interactions are important, consider whether they can be made unimportant
by a meaningful simple transformation of scale. If so, make the transformation and
proceed as in step 2.

5. For important three-factor interactions that cannot be made unimportant by a simple
transformation, which is often the case, analyze the three factors jointly in terms of the
treatment means $\mu_{ijk}$.

6. If there is just one important two-factor interaction, analyze the effects jointly in terms
of the appropriate two-factor treatment means $\mu_{ij.}$, $\mu_{i.k}$, or $\mu_{.jk}$. Analyze the effects of
the third factor separately. For example, if the AB interaction is present and no AC or BC
interactions exist, analyze the marginal means $\mu_{ij.}$. If a C main effect is present, analyze
the single-factor level means $\mu_{..k}$ separately.

7. If there are two or three important two-factor interactions in a three-factor study, analyze
the three factors jointly in terms of the treatment means $\mu_{ijk}$. This principle extends to
multifactor studies having more than three factors in the following way. If any two
two-factor interactions are overlapping, that is, they each involve a common factor, then
the cell means should be analyzed in terms of the joint effects of the three factors. For
example, if in a four-factor study two interactions AB and BC are found to be important
(and no higher-order interactions are present), analysis of the three-factor level means
$\mu_{ijk.}$ is indicated.
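The seven steps can be read as a small decision procedure. The sketch below is one possible encoding for a three-factor study, assuming that the importance of each interaction has already been judged and that any transformation considered in step 4 has already been tried; the function name and the returned strings are illustrative only.

```python
def analysis_level(abc, ab, ac, bc):
    """Return which means to analyze, given which interactions are important.

    Each argument is True when the corresponding interaction is judged important.
    """
    if abc:
        return "treatment means mu_ijk"  # step 5
    important = [name for name, flag in (("AB", ab), ("AC", ac), ("BC", bc)) if flag]
    if len(important) >= 2:
        return "treatment means mu_ijk"  # step 7
    if len(important) == 1:
        pair = important[0]
        third = ({"A", "B", "C"} - set(pair)).pop()
        # step 6: two-factor means jointly, third factor separately
        return f"{pair} two-factor means, plus factor {third} level means separately"
    return "factor level means for A, B, and C"  # step 3


# Stress test example: only the BC interaction was found to be present.
print(analysis_level(abc=False, ab=False, ac=False, bc=True))
```

For the stress test example this returns the BC two-factor means together with a separate analysis of factor A, which is the route the researcher follows later in the section.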


Occasionally, exceptions to the strategy outlined above may arise. For example, on page 826 
we commented on a situation in which an investigator might be interested in inferences 
concerning a main factor effect even though the factor was also present in an important 
two-factor interaction. 

We have already discussed the testing for interaction effects, the possible diminution of 
important interactions by a simple transformation, and how to test for the presence of factor 
main effects. Now we turn to steps 2 through 7 of the strategy for analysis, namely, how 
to compare single-factor level means $\mu_{i..}$, $\mu_{.j.}$, and $\mu_{..k}$ when there are unimportant
three-factor and two-factor interactions, how to compare two-factor level means $\mu_{ij.}$, $\mu_{i.k}$, and $\mu_{.jk}$
when there is a single important two-factor interaction, and finally, how to compare treatment
means $\mu_{ijk}$ when there are important overlapping two-factor interactions or important
three-factor interactions.


Analysis of Factor Effects when Factors Do Not Interact 
Estimation of Factor Level Mean. The factor A level mean $\mu_{i..}$ is estimated by:

$$\hat{\mu}_{i..} = \bar{Y}_{i...} \tag{24.27}$$

The estimated variance of this estimator is:

$$s^2\{\bar{Y}_{i...}\} = \frac{MSE}{nbc} \tag{24.28}$$

Confidence limits for $\mu_{i..}$ are obtained by means of the t distribution with $(n-1)abc$
degrees of freedom:

$$\bar{Y}_{i...} \pm t[1 - \alpha/2; (n-1)abc]\, s\{\bar{Y}_{i...}\} \tag{24.29}$$

Estimation of factor level means for factors B or C is done in similar fashion.


Inferences for Contrast of Factor Level Means. Inference procedures for a contrast
involving the factor A level means $\mu_{i..}$:

$$L = \sum c_i \mu_{i..} \quad \text{where } \sum c_i = 0 \tag{24.30}$$

are easily developed. The $1 - \alpha$ confidence limits for L are:

$$\hat{L} \pm t[1 - \alpha/2; (n-1)abc]\, s\{\hat{L}\} \tag{24.31}$$

where L is estimated unbiasedly by:

$$\hat{L} = \sum c_i \bar{Y}_{i...} \tag{24.31a}$$

and the estimated variance of $\hat{L}$ is:

$$s^2\{\hat{L}\} = \frac{MSE}{nbc} \sum c_i^2 \tag{24.31b}$$

Contrasts of factor level means for factors B or C are estimated in similar fashion.

The test statistic and decision rule for the following alternatives concerning a contrast L
in (24.30):

$$H_0: L = 0 \qquad H_a: L \ne 0 \tag{24.32}$$

are:

$$t^* = \frac{\hat{L}}{s\{\hat{L}\}}; \quad \text{If } |t^*| > t[1 - \alpha/2; (n-1)abc], \text{ conclude } H_a \tag{24.33}$$

where $\hat{L}$ and $s\{\hat{L}\}$ are given by (24.31). Again for conciseness, we present only the portion
of the decision rule leading to conclusion $H_a$.
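As a rough illustration of (24.31a), (24.31b), and (24.33), the sketch below computes a contrast estimate, its standard error, and the t* statistic. The function name is hypothetical, and the two factor level means passed in are illustrative values chosen only so that their difference equals the 5.42 reported later for the stress test example; MSE = 9.335, n = 3, and b = c = 2 are taken from that example.

```python
import math


def contrast_factor_a(level_means, coeffs, mse, n, b, c):
    """Estimate L = sum(c_i * mu_i..) per (24.31a) with its standard error per (24.31b)."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(ci * yi for ci, yi in zip(coeffs, level_means))
    variance = mse / (n * b * c) * sum(ci ** 2 for ci in coeffs)
    return estimate, math.sqrt(variance)


# Illustrative level means (assumed); only their difference, 5.42, comes from the text.
estimate, std_error = contrast_factor_a([18.98, 13.56], [1, -1], mse=9.335, n=3, b=2, c=2)
t_star = estimate / std_error  # test statistic (24.33)
print(round(estimate, 2), round(std_error, 3), round(t_star, 2))
```

The standard error agrees with the value 1.247 used for the gender comparison in the worked example; comparing |t*| with the appropriate t percentile completes the test.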


Multiple Contrasts of Factor Level Means. When inferences are to be made concerning
a number of contrasts of factor A level means $\mu_{i..}$, the Tukey, Scheffé, and Bonferroni
procedures are easily adapted. As before, the Tukey procedure applies to the set of all
pairwise comparisons of the form $D = \mu_{i..} - \mu_{i'..}$.

To obtain simultaneous confidence interval estimates, the t multiple in (24.31) is replaced
by the T, S, or B multiple defined as follows:

Procedure     Multiple
Tukey         $T = \frac{1}{\sqrt{2}}\, q[1 - \alpha; a, (n-1)abc]$     (24.34a)
Scheffé       $S^2 = (a-1) F[1 - \alpha; a-1, (n-1)abc]$     (24.34b)
Bonferroni    $B = t[1 - \alpha/2g; (n-1)abc]$     (24.34c)

Test statistics and decision rules for simultaneous testing of a number of contrasts of the
form (24.30) for the alternatives $H_0$: $L = 0$, $H_a$: $L \ne 0$ are:

Procedure     Test Statistic and Decision Rule
Tukey         $q^* = \frac{\sqrt{2}\,\hat{D}}{s\{\hat{D}\}}$     (24.35a)
              If $|q^*| > q[1 - \alpha; a, (n-1)abc]$, conclude $H_a$
Scheffé       $F^* = \frac{\hat{L}^2}{(a-1)\, s^2\{\hat{L}\}}$     (24.35b)
              If $F^* > F[1 - \alpha; a-1, (n-1)abc]$, conclude $H_a$
Bonferroni    $t^* = \frac{\hat{L}}{s\{\hat{L}\}}$     (24.35c)
              If $|t^*| > t[1 - \alpha/2g; (n-1)abc]$, conclude $H_a$

Inferences concerning multiple contrasts based on the factor level means $\mu_{.j.}$ or $\mu_{..k}$ are
made in corresponding fashion.

Analysis of Factor Effects with Multiple Two-Factor Interactions 
or Three-Factor Interaction 


As explained earlier in the strategy for analysis, when a three-factor interaction is present 
or overlapping two-factor interactions are present, the results of the study are typically 
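analyzed in terms of the treatment means $\mu_{ijk}$.

Estimation of Treatment Mean. The treatment mean $\mu_{ijk}$ is estimated by:

$$\hat{\mu}_{ijk} = \bar{Y}_{ijk.} \tag{24.36}$$

The estimated variance of $\bar{Y}_{ijk.}$ is:

$$s^2\{\bar{Y}_{ijk.}\} = \frac{MSE}{n} \tag{24.37}$$

Confidence limits for $\mu_{ijk}$ are:

$$\bar{Y}_{ijk.} \pm t[1 - \alpha/2; (n-1)abc]\, s\{\bar{Y}_{ijk.}\} \tag{24.38}$$

Inferences for Contrast of Treatment Means. When important interactions are present,
contrasts among the treatment means $\mu_{ijk}$ are ordinarily desired. Let, as usual, L denote
such a contrast:

$$L = \sum\sum\sum c_{ijk} \mu_{ijk} \quad \text{where } \sum\sum\sum c_{ijk} = 0 \tag{24.39}$$

Confidence limits for L are:

$$\hat{L} \pm t[1 - \alpha/2; (n-1)abc]\, s\{\hat{L}\} \tag{24.40}$$

where:

$$\hat{L} = \sum\sum\sum c_{ijk} \bar{Y}_{ijk.} \tag{24.40a}$$

$$s^2\{\hat{L}\} = \frac{MSE}{n} \sum\sum\sum c_{ijk}^2 \tag{24.40b}$$

The test statistic and decision rule for alternatives $H_0$: $L = 0$, $H_a$: $L \ne 0$ are:

$$t^* = \frac{\hat{L}}{s\{\hat{L}\}}; \quad \text{If } |t^*| > t[1 - \alpha/2; (n-1)abc], \text{ conclude } H_a \tag{24.41}$$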


Analysis of Factor Effects with Single Two-Factor Interaction 
When a single two-factor interaction is present in a three-factor study, desired contrasts 
may involve means of the $\mu_{ijk}$ taken over one of the factors. For example, when the
only interactions present are the BC interactions, there may be interest in contrasts of the
means $\mu_{.jk}$:

$$L = \sum\sum c_{jk} \mu_{.jk} \quad \text{where } \sum\sum c_{jk} = 0 \tag{24.42}$$

Such contrasts are, of course, special cases of contrasts of the treatment means $\mu_{ijk}$ in
(24.39). The estimator of the contrast in (24.42) can be obtained from (24.40a) and the
estimated variance from (24.40b); they are:

$$\hat{L} = \sum\sum c_{jk} \bar{Y}_{.jk.} \tag{24.43}$$

$$s^2\{\hat{L}\} = \frac{MSE}{na} \sum\sum c_{jk}^2 \tag{24.44}$$


Multiple Contrasts of Treatment Means. For simultaneous interval estimates of contrasts
of treatment means $\mu_{ijk}$, the t multiple in (24.40) is replaced by the T, S, or B multiple
defined as follows:

Procedure     Multiple
Tukey         $T = \frac{1}{\sqrt{2}}\, q[1 - \alpha; abc, (n-1)abc]$     (24.45a)
Scheffé       $S^2 = (abc-1) F[1 - \alpha; abc-1, (n-1)abc]$     (24.45b)
Bonferroni    $B = t[1 - \alpha/2g; (n-1)abc]$     (24.45c)

Simultaneous testing of a number of alternatives of the form $H_0$: $L = 0$, $H_a$: $L \ne 0$ using
the Tukey, Scheffé, and Bonferroni procedures can be accomplished with the following test
statistics and decision rules:

Procedure     Test Statistic and Decision Rule
Tukey         $q^* = \frac{\sqrt{2}\,\hat{D}}{s\{\hat{D}\}}$     (24.46a)
              If $|q^*| > q[1 - \alpha; abc, (n-1)abc]$, conclude $H_a$
Scheffé       $F^* = \frac{\hat{L}^2}{(abc-1)\, s^2\{\hat{L}\}}$     (24.46b)
              If $F^* > F[1 - \alpha; abc-1, (n-1)abc]$, conclude $H_a$
Bonferroni    $t^* = \frac{\hat{L}}{s\{\hat{L}\}}$     (24.46c)
              If $|t^*| > t[1 - \alpha/2g; (n-1)abc]$, conclude $H_a$


As before, the Tukey procedure concerns only pairwise comparisons. 




Example—Estimation of Contrasts of Treatment Means 
To study the nature of the BC interaction effects in the stress test example, the researcher
wished to estimate separately, for persons with high and low body fat, the difference in
mean fatigue time for light smokers and heavy smokers. The desired contrasts are:

$$L_1 = \mu_{.11} - \mu_{.12}$$
$$L_2 = \mu_{.21} - \mu_{.22}$$

In addition, a single comparison between the factor level means for factor A is sufficient to
analyze the factor A main effects since factor A has only two levels. The contrast of interest
(here a pairwise comparison of factor level means) is:

$$L_3 = \mu_{1..} - \mu_{2..}$$

These three contrasts are estimated as follows, using the results in Table 24.4:

$$\hat{L}_1 = \bar{Y}_{.11.} - \bar{Y}_{.12.} = 22.90 - 16.00 = 6.90$$
$$\hat{L}_2 = \bar{Y}_{.21.} - \bar{Y}_{.22.} = 13.07 - 13.12 = -.05$$
$$\hat{L}_3 = \bar{Y}_{1...} - \bar{Y}_{2...} = 5.42$$

The researcher obtained the estimated variances by using (24.44) and (24.31b) and the
Bonferroni multiple for a 95 percent family confidence coefficient:

$$s^2\{\hat{L}_1\} = s^2\{\hat{L}_2\} = \frac{MSE}{na}\left[(1)^2 + (-1)^2\right] = \frac{9.335}{6}(2) = 3.112$$

$$s^2\{\hat{L}_3\} = \frac{MSE}{nbc}\left[(1)^2 + (-1)^2\right] = \frac{9.335}{3(2)(2)}(2) = 1.556$$

$$s\{\hat{L}_1\} = s\{\hat{L}_2\} = 1.764 \qquad s\{\hat{L}_3\} = 1.247$$

$$B = t(1 - .05/6; 16) = 2.673$$

The desired confidence intervals using (24.40) therefore are:

$$2.2 = 6.90 - 2.673(1.764) \le \mu_{.11} - \mu_{.12} \le 6.90 + 2.673(1.764) = 11.6$$
$$-4.8 = -.05 - 2.673(1.764) \le \mu_{.21} - \mu_{.22} \le -.05 + 2.673(1.764) = 4.7$$
$$2.1 = 5.42 - 2.673(1.247) \le \mu_{1..} - \mu_{2..} \le 5.42 + 2.673(1.247) = 8.8$$

The researcher therefore concluded with family confidence coefficient .95: (1) Among
people with low body fat, those who have a light smoking history have a mean stress test
endurance that is 2.2 to 11.6 minutes longer than the mean endurance for people with a heavy
smoking history. (2) People with high body fat do not differ in mean stress test endurance
whether they have a light or a heavy smoking history. (3) The mean stress test endurance
for men is 2.1 to 8.8 minutes longer than the mean endurance for women.
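The three Bonferroni intervals can be verified with a few lines of arithmetic. This is a minimal sketch: the point estimates, MSE, and the multiple B = 2.673 are copied from the example rather than computed from raw data, and the helper name `limits` is illustrative.

```python
import math

MSE, n, a, b, c = 9.335, 3, 2, 2, 2
B = 2.673  # Bonferroni multiple t(1 - .05/6; 16), as given in the text


def limits(estimate, variance, multiple):
    """Return (lower, upper) confidence limits: estimate -/+ multiple * standard error."""
    half_width = multiple * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


var_bc = MSE / (n * a) * (1 ** 2 + (-1) ** 2)      # (24.44), contrasts of BC means
var_a = MSE / (n * b * c) * (1 ** 2 + (-1) ** 2)   # (24.31b), contrast of factor A means

intervals = {
    "L1": limits(6.90, var_bc, B),
    "L2": limits(-0.05, var_bc, B),
    "L3": limits(5.42, var_a, B),
}
for name, (lower, upper) in intervals.items():
    print(f"{name}: {lower:.1f} to {upper:.1f}")
```

Only the interval for L2 contains zero, which is what supports conclusion (2) that smoking history makes no detectable difference among people with high body fat.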

In view of the important interaction effects between body fat and smoking history on
stress test endurance noted in the study findings, the researcher concluded that factor B
and factor C main effects are of no interest, and therefore terminated the analysis at this
point. The principal findings are presented graphically in Figure 24.8. Figure 24.8a shows
the magnitude of the effect of gender on stress test endurance, and Figure 24.8b shows
the nature of the interaction effects between body fat and smoking history on stress test
endurance.

FIGURE 24.8 Key Findings from Stress Test Endurance Study. (a) Effect of Gender: endurance (minutes) for females and males. (b) Effects of Body Fat and Smoking History: endurance (minutes) against smoking history (light, heavy), with separate lines for low percent fat and high percent fat.


24.6 Unequal Sample Sizes in Multi-Factor Studies


When the treatment sample sizes in a multi-factor study are not equal, the procedures 
explained in Sections 23.1—23.3 for two-factor studies with unequal treatment sample sizes 
should be followed with routine modifications. We continue to assume that all treatment 
means are of equal importance and that there are no empty cells. 


Tests for Factor Effects

Tests for factor effects in multifactor studies with unequal sample sizes can be conducted
by means of the regression approach. Indicator variables taking on the values 1, -1, 0 are
designated for each factor, the number of such variables for each factor being one less than
the number of factor levels. Interaction effects are represented by cross-product terms, as
usual. Since the sums of squares are no longer orthogonal when the treatment sample sizes
are unequal, different reduced models need to be fitted for the tests of interest.

Example

Suppose that in the stress test example of Table 24.4, observations $Y_{1113}$ and $Y_{2212}$ were
missing. To develop a regression model for this example, we note that each of the three
factors is at two levels. Hence, one indicator variable is required for each factor. The full
regression model therefore is:

$$\begin{aligned} Y_{ijkm} = {} & \mu_{...} + \alpha_1 X_{ijkm1} + \beta_1 X_{ijkm2} + \gamma_1 X_{ijkm3} + (\alpha\beta)_{11} X_{ijkm1} X_{ijkm2} \\ & + (\alpha\gamma)_{11} X_{ijkm1} X_{ijkm3} + (\beta\gamma)_{11} X_{ijkm2} X_{ijkm3} \\ & + (\alpha\beta\gamma)_{111} X_{ijkm1} X_{ijkm2} X_{ijkm3} + \varepsilon_{ijkm} \qquad \text{Full model} \end{aligned} \tag{24.47}$$


where:

$$X_1 = \begin{cases} 1 & \text{if case from level 1 for factor A} \\ -1 & \text{if case from level 2 for factor A} \end{cases}$$

$$X_2 = \begin{cases} 1 & \text{if case from level 1 for factor B} \\ -1 & \text{if case from level 2 for factor B} \end{cases}$$

$$X_3 = \begin{cases} 1 & \text{if case from level 1 for factor C} \\ -1 & \text{if case from level 2 for factor C} \end{cases}$$

TABLE 24.7 Data for Regression Model (24.47): Stress Test Example with $Y_{1113}$ and $Y_{2212}$ Missing.

               (1)    (2)   (3)   (4)    (5)     (6)     (7)      (8)
 i  j  k  m     Y     X1    X2    X3    X1X2    X1X3    X2X3    X1X2X3
 1  1  1  1   24.1     1     1     1      1       1       1        1
 1  1  1  2   29.2     1     1     1      1       1       1        1
 1  1  2  1   17.6     1     1    -1      1      -1      -1       -1
 ...
 2  2  1  1   16.1    -1    -1     1      1      -1      -1        1
 2  2  1  3   10.8    -1    -1     1      1      -1      -1        1
 2  2  2  1   10.1    -1    -1    -1      1       1       1       -1
 2  2  2  2   14.4    -1    -1    -1      1       1       1       -1
 2  2  2  3    6.1    -1    -1    -1      1       1       1       -1

The regression parameters in model (24.47) are the ANOVA model parameters as defined 
in (24.13). 

Table 24.7 repeats in column 1 a portion of the Y observations for the stress test example
in Table 24.4 with observations $Y_{1113}$ and $Y_{2212}$ missing. The coded indicator variables $X_1$,
$X_2$, and $X_3$ are shown in columns 2-4 and the cross-product interaction terms are shown in
columns 5-8. The full model in (24.47) is fitted by regressing Y in column 1 of Table 24.7
on the X variables in columns 2-8. To test a particular factor effect, the reduced model is
obtained by dropping the appropriate X variable(s). For instance, to test for factor A main
effects, $X_1$ would be dropped to obtain the reduced model and Y would be regressed on the
X variables in columns 3-8.
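The coding in Table 24.7 and the full-versus-reduced comparison can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the fitting of the two regressions is left to whatever regression routine is at hand, and the F* formula is the usual general linear test statistic. For this example the full model has 24 - 2 = 22 cases and 8 parameters, so its error degrees of freedom would be 14.

```python
def coded_row(i, j, k):
    """Return [X1, X2, X3, X1X2, X1X3, X2X3, X1X2X3] for a cell of a 2x2x2 study.

    Level 1 of a factor is coded 1 and level 2 is coded -1, as in Table 24.7.
    """
    x1, x2, x3 = (1 if level == 1 else -1 for level in (i, j, k))
    return [x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3]


# Table 24.7 column to drop from the full model for each test (Y is column 1).
drop_for_test = {"A": 2, "B": 3, "C": 4, "AB": 5, "AC": 6, "BC": 7, "ABC": 8}


def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    return ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)


print(coded_row(1, 1, 2))  # [1, 1, -1, 1, -1, -1, -1]
print(coded_row(2, 2, 1))  # [-1, -1, 1, 1, -1, -1, 1]
```

Because every factor here has two levels, each test drops exactly one column; with more levels a factor would need several indicator variables and the corresponding cross-product columns would be dropped together.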


Comment 


The discussion in Section 23.6 on the use of statistical packages for analysis of variance with unequal 
sample sizes and/or empty cells is applicable in its entirety for multifactor studies.


Inferences for Contrasts of Factor Level Means 


Estimation and testing of contrasts of factor level means in multi-factor studies with unequal 
sample sizes are conducted in similar fashion as for two-factor studies. The formulas in 
Table 23.5 for the development of interval estimates need simply be extended to three or 
more factors. Testing procedures may be devised from these extensions in the usual fashion. 

To illustrate such an extension, consider pairwise comparisons of factor A level means
in a three-factor study with unequal sample sizes. Extending formula (23.21), we obtain
for the comparison, its estimator, and the estimated variance:

$$D = \mu_{i..} - \mu_{i'..} \tag{24.48a}$$

$$\hat{D} = \hat{\mu}_{i..} - \hat{\mu}_{i'..} \quad \text{where } \hat{\mu}_{i..} = \frac{\sum_j \sum_k \bar{Y}_{ijk.}}{bc} \tag{24.48b}$$

$$s^2\{\hat{D}\} = \frac{MSE}{b^2 c^2} \sum_j \sum_k \left( \frac{1}{n_{ijk}} + \frac{1}{n_{i'jk}} \right) \tag{24.48c}$$

The appropriate degrees of freedom associated with MSE are $n_T - abc$.
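A small sketch of (24.48c) follows, assuming the cell sample sizes for the two factor A levels are supplied as b-by-c nested lists; the function name is hypothetical. Since the MSE of the refitted stress test model is not given in the text, the unequal-size call below uses MSE = 1.0 as a placeholder, so its result is the variance per unit of MSE.

```python
def pairwise_variance(mse, n_i, n_i_prime):
    """Estimated variance of D-hat per (24.48c).

    n_i and n_i_prime are b-by-c nested lists of cell sample sizes for the
    two factor A levels being compared.
    """
    b, c = len(n_i), len(n_i[0])
    total = sum(
        1.0 / n_i[j][k] + 1.0 / n_i_prime[j][k]
        for j in range(b)
        for k in range(c)
    )
    return mse / (b ** 2 * c ** 2) * total


# Equal sample sizes: reduces to (24.31b) with coefficients 1 and -1.
equal = pairwise_variance(9.335, [[3, 3], [3, 3]], [[3, 3], [3, 3]])

# Cell sizes when Y_1113 and Y_2212 are missing; MSE = 1.0 is a placeholder.
per_unit_mse = pairwise_variance(1.0, [[2, 3], [3, 3]], [[3, 3], [2, 3]])
print(round(equal, 3), per_unit_mse)
```

With equal cell sizes the result matches the value 1.556 obtained from (24.31b) in the earlier example, which is a useful consistency check on the extension.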


24.7 Planning of Sample Sizes


We considered the planning of sample sizes for single-factor studies with the power approach
and the estimation approach in Chapters 16 and 17. Then we considered the planning of sample
sizes for two-factor studies in Chapter 19. Now we take up the planning of sample sizes
for multi-factor studies.


Power of F Test for Multi-Factor Studies 


Table B.11 can be used for determining the power of tests for multi-factor studies in the 
same fashion as for single-factor and two-factor studies. The only differences arise in 
the definition of the noncentrality parameter and the degrees of freedom. For three-factor 
fixed effects ANOVA model (24.14) with equal treatment sample sizes, the noncentrality 
parameter for a given test is defined as follows: 


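$$\phi = \frac{1}{\sigma} \left[ \frac{\text{numerator of second term in } E\{MS\} \text{ in Table 24.5}}{\text{denominator of second term in } E\{MS\} \text{ plus } 1} \right]^{1/2} \tag{24.49}$$

For example, for testing for three-factor interactions, we have:

$$\phi = \frac{1}{\sigma} \left[ \frac{n \sum\sum\sum (\alpha\beta\gamma)_{ijk}^2}{(a-1)(b-1)(c-1) + 1} \right]^{1/2}$$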
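The noncentrality parameter in (24.49) can be evaluated directly once planning values are chosen. The sketch below is illustrative: the function names are hypothetical, and the interaction effects of plus or minus 0.5, n = 3, and sigma = 1 are made-up planning values, not figures from the text.

```python
import math


def phi(sigma, numerator, denominator):
    """Noncentrality parameter per (24.49)."""
    return math.sqrt(numerator / (denominator + 1)) / sigma


def phi_three_factor(sigma, n, abg):
    """phi for the ABC interaction test.

    abg is an a-by-b-by-c nested list of planning values for (alpha beta gamma)_ijk.
    """
    a, b, c = len(abg), len(abg[0]), len(abg[0][0])
    sum_sq = sum(v ** 2 for plane in abg for row in plane for v in row)
    return phi(sigma, n * sum_sq, (a - 1) * (b - 1) * (c - 1))


# Hypothetical 2x2x2 planning values: +/-0.5, summing to zero over each index.
signs = (1, -1)
abg = [[[0.5 * si * sj * sk for sk in signs] for sj in signs] for si in signs]
value = phi_three_factor(sigma=1.0, n=3, abg=abg)
print(round(value, 3))  # 1.732
```

The resulting value would then be used to enter Table B.11 with the numerator and error degrees of freedom of the test in question.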


Use of Table B.12 for Multi-Factor Studies 


When planning sample sizes for three-factor studies with the power approach, one is typi- 
cally concerned with the power of detecting factor A main effects, the power of detecting 
factor B main effects, and the power of detecting factor C main effects. One can first specify 
the minimum range of factor A level means for which it is important to detect factor A main 
effects and obtain the needed sample sizes from Table B.12, with r = a. The resulting 
sample size is bcn, from which n can be obtained readily. The use of Table B.12 for this
purpose is appropriate provided the resulting sample sizes are not small, specifically
provided a(bcn - 1) ≥ 20. If this condition is not met, the ANOVA power tables in Table B.11
should be used with an iterative approach.

In the same way, the values for the minimum range of factor level means for factors B
and C can be specified for which it is important to detect the factor main effects, and the
needed sample sizes found. If the sample sizes obtained from the factor A, factor B, and
factor C power specifications differ substantially, a judgment will need to be made as to the
final sample sizes.


Cited Reference

24.1. Monlezun, C. J. "Two-Dimensional Plots for Interpreting Interactions in the Three-Factor Analysis of Variance Model." The American Statistician 33 (1979), pp. 63-69.


Problems 


24.1. Refer to Table 24.1 containing the mean responses $\mu_{ijk}$ for a three-factor study.

a. Find the main effects of age.
b. Find the interaction effect of young age and normal IQ.
c. Find the interaction effect of young age, normal IQ, and female gender.

24.2. Prepare AC plots of the mean responses $\mu_{ijk}$ in Table 24.1 in the format of Figures 24.2c-e.
Do your plots convey the same information as Figure 24.1? Discuss.

24.3. Prepare BC plots of the mean responses $\mu_{ijk}$ in Table 24.1. Do your plots bring out any
information on main effects and interactions not readily seen from Figure 24.1? Discuss.

24.4. In a three-factor study, the mean responses $\mu_{ijk}$ are as follows:


a. Find o, оъ, and аз. 

b. Find f£» and yı. 

c. Find (of)i», (оу), and (By) р. 
d. Find (ову): and (оВу)з. 


24.5. Refer to Problem 24.4. Prepare AB plots of the mean responses $\mu_{ijk}$ in the format of Figure 24.1.
What do these plots show about factor main effects and interactions?


*24.6. Case hardening. An experiment involving the case hardening of lightweight shafts machined
from bars of an alloy was run to study the effects of the amount of a chemical agent added to
the alloy in a molten state (factor A), the temperature of the hardening process (factor B), and
the time duration of the hardening process (factor C) on the outside hardness of the shaft. All
factors were at two levels (1: low, 2: high), and the number of rods tested for each treatment
was n = 3. The data on hardness (in Brinell units) follow.

              k = 1            k = 2
          j = 1   j = 2    j = 1   j = 2
i = 1      39.9    53.5     56.0    70.9
           32.2    50.7     56.9    73.3
           36.3    52.8     56.6    71.6
i = 2      45.2    63.3     69.4    82.9
           48.0    65.5     66.6    85.2
           47.5    63.6     68.8    82.3


a. Obtain the residuals for ANOVA model (24.14) and prepare aligned residual dot plots for
each level of factor A. Do the same for each of the other two factors. What information do
these plots provide about the appropriateness of ANOVA model (24.14)?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

24.7. Refer to Case hardening Problem 24.6. Assume that fixed ANOVA model (24.14) is
appropriate.

a. Prepare AB plots of the estimated treatment means $\bar{Y}_{ijk.}$ in the format of Figure 24.5b.
Does it appear that any interactions are present? Any main effects?

b. Obtain the analysis of variance table.

c. Test for three-factor interactions; use $\alpha = .025$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

d. Test for AB, AC, and BC interactions. For each test, use $\alpha = .025$ and state the alternatives,
decision rule, and conclusion. What is the P-value of each test?

e. Test for A, B, and C main effects. For each test, use $\alpha = .025$ and state the alternatives,
decision rule, and conclusion. What is the P-value of each test?

f. State the set of conclusions that can be reached from the tests in parts (c), (d), and (e).
Obtain an upper bound for the family level of significance for the set of tests; use the
Kimball inequality (24.25).

g. Do the results in part (f) confirm your graphic analysis in part (a)?

24.8. Refer to Case hardening Problems 24.6 and 24.7.

a. To study the nature of the main factor effects, estimate the following pairwise comparisons:

$$D_1 = \mu_{2..} - \mu_{1..} \qquad D_2 = \mu_{.2.} - \mu_{.1.} \qquad D_3 = \mu_{..2} - \mu_{..1}$$

Use the Bonferroni procedure with a 95 percent family confidence coefficient. State your
findings.
b. Estimate 427; with a 95 percent confidence interval. 
24.9. Marketing research contractors. A marketing research consultant evaluated the effects of
fee schedule (factor A), scope of work (factor B), and type of supervisory control (factor C)
on the quality of work performed under contract by independent marketing research agencies.
The factor levels in the study were as follows:

Factor             Factor Levels
A  Fee level       i = 1: High
                   i = 2: Average
                   i = 3: Low
B  Scope           j = 1: All contract work performed in house
                   j = 2: Some work subcontracted out
C  Supervision     k = 1: Local supervisors
                   k = 2: Traveling supervisors only

The quality of work performed was measured by an index taking into account several
characteristics of quality. Four agencies were chosen for each factor level combination and the
quality of their work evaluated. The data on quality follow.


              k = 1            k = 2
          j = 1   j = 2    j = 1   j = 2
i = 1     124.3   115.1    112.7    88.2
          122.6   117.3    108.6    90.1
i = 2     119.3   117.2    113.6    92.7
          121.4   120.0    112.3    87.9
i = 3      90.9    89.9     78.6    58.6
           92.0    82.7     77.1    62.3


a. Obtain the residuals for ANOVA model (24.14) and plot them against the fitted values.
What does your plot suggest about the appropriateness of ANOVA model (24.14)?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?


24.10. Refer to Marketing research contractors Problem 24.9. Assume that fixed ANOVA model
(24.14) is appropriate.

a. Prepare AB plots of the estimated treatment means $\bar{Y}_{ijk.}$ in the format of Figure 24.5b.
Does it appear that any interactions are present? Any main effects?

b. Prepare AC plots of the estimated treatment means $\bar{Y}_{ijk.}$ in the format of Figure 24.5b. Do
your plots convey the same information as those in part (a)? Discuss.

c. Obtain the analysis of variance table.

d. Test for three-factor interactions; use $\alpha = .01$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

e. Test for AB, AC, and BC interactions. For each test, use $\alpha = .01$ and state the alternatives,
decision rule, and conclusion. What is the P-value of each test?

f. Test for factor A main effects; use $\alpha = .01$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

g. State the set of conclusions that can be reached from the tests in parts (d), (e), and (f).
Obtain an upper bound for the family level of significance for the set of tests; use the
Kimball inequality (24.25).

h. Do the results in part (g) confirm your graphic analysis in parts (a) and (b)?


24.11. Refer to Marketing research contractors Problems 24.9 and 24.10.

a. To study the nature of the factor A main effects and the BC interactions, it is desired to
estimate the following comparisons:

$$D_1 = \mu_{1..} - \mu_{2..} \qquad D_4 = \mu_{.11} - \mu_{.12}$$
$$D_2 = \mu_{1..} - \mu_{3..} \qquad D_5 = \mu_{.21} - \mu_{.22}$$
$$D_3 = \mu_{2..} - \mu_{3..} \qquad L_1 = D_4 - D_5$$

Use the Bonferroni procedure with a 90 percent family confidence coefficient to make the
desired comparisons. State your findings.

b.


Estimate D = рузу — Ha with a 95 percent confidence interval. 


c. The consultant wishes to identify the type(s) of independent marketing research agencies
that provide the highest quality of work. Use the Tukey testing procedure with family level
of significance $\alpha = .10$ to make the desired identifications.


24.12. Electronics assembly. Assemblers in an electronics firm will attach 12 components to a newly
developed "board" that will be used in automatic-control equipment in manufacturing plants.
An operations analyst conducted an experiment to study the effects of three factors on the
mean time to assemble a board. Factor A was the gender of the assembler (i = 1: male;
i = 2: female), factor B was the sequence of assembling the components (j = 1, 2, 3), and
factor C was the amount of experience by the assembler (k = 1: under 18 months; k = 2:
18 months or more). Randomization was used to assign 15 assemblers of each gender with
a given amount of experience to each of the three assembly sequences, with each sequence
assigned to five assemblers. After a learning period, the total time (in minutes) to assemble
50 boards was observed. The data follow.

                   k = 1                      k = 2
          j = 1   j = 2   j = 3      j = 1   j = 2   j = 3
i = 1     1,250   1,319   1,217      1,021   1,119   1,033
          1,175   1,251   1,190      1,099   1,110   1,067
          1,193   1,265   1,251      1,070   1,163   1,022
i = 2     1,066   1,105   1,021        864     927     841
          1,076   1,043   1,020        848     944     865
          1,034   1,060   1,026        868     933     868

a. Obtain the residuals for ANOVA model (24.14) and plot them against the fitted values.
What does your plot suggest about the appropriateness of ANOVA model (24.14)?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the
normality assumption appear to be reasonable here?

24.13. Refer to Electronics assembly Problem 24.12. Assume that fixed ANOVA model (24.14) is
appropriate.

a. Prepare AB plots of the estimated treatment means $\bar{Y}_{ijk.}$ in the format of Figure 24.5b.
Does it appear that any interactions are present? Any main effects?

b. Obtain the analysis of variance table.

c. Test for three-factor interactions; use $\alpha = .05$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

d. Test for AB, AC, and BC interactions. For each test, use $\alpha = .05$ and state the alternatives,
decision rule, and conclusion. What is the P-value of each test?

e. Test for A, B, and C main effects. For each test, use $\alpha = .05$ and state the alternatives,
decision rule, and conclusion. What is the P-value of each test?

f. State the set of conclusions that can be reached from the tests in parts (c), (d), and (e).
Obtain an upper bound for the family level of significance for the set of tests; use the
Kimball inequality (24.25).

g. Do the results in part (f) confirm your graphic analysis in part (a)?


24.14. Refer to Electronics assembly Problems 24.12 and 24.13.

a. To study the nature of the factor main effects, estimate the following pairwise comparisons:


Di = ці. = pa. Dy = рә. na 
Di = иа. {4з Ds = Bot cH. 
Di = u.a. = pas. 


Use the Bonferroni procedure with a 90 percent family confidence coefficient. State your
findings.


b. Estimate p23; with a 95 percent confidence interval. 


*24.15. Refer to Case hardening Problem 24.6. Suppose that observations $Y_{1211} = 53.5$ and
$Y_{1212} = 50.7$ are missing.

a. State the full regression model equivalent to ANOVA model (24.14); use 1, -1, 0 indicator
variables.

b. What is the reduced regression model for testing for factor A main effects?

c. Test whether or not factor A main effects are present by fitting the full and reduced
regression models; use $\alpha = .025$. State the alternatives, decision rule, and conclusion. What is
the P-value of the test?

d. Estimate $D = \mu_{2..} - \mu_{1..}$ with a 95 percent confidence interval.

24.16.

Refer to Electronics assembly Problem 24.12. Suppose that observations Yi», = 1,097, 

Y»3 = 1.051. and Y2)25 = 868 are missing. 

a. Slate the full regression model equivalent to ANOVA model (24.14): use 1, — 1, O indicator 
variables. 


b. What is the reduced regression model for testing for factor C main effects? 

c. Test whether or nol factor C main effects are present by fitting the full and reduced 
regression models; use о = .05. State the alternatives, decision rule. and conclusion. What 
is the P-value of the test? 

d. Estimate D = н. — и..з with a 95 percent confidence interval, 


Refer to Case hardening Problem 24.6. Suppose that the sample sizes have not yet been 
determined but it has been decided to use equal sample sizes for all treatments. The chief 
objective is to identify the treatment that leads to the highest mean hardness. The probability 
should be at least .99 that the correct ueatment is identified when the mean hardness for the 
second best treatment differs by 2.0 or more Brinell units. Assume that a reasonable planning 
value for the error standard deviation is с = 1,8. What are the required sample sizes? 
Refer to Electronics assembly Problem 24.12. Suppose that the sample sizes have not yet 
been determined but it has been decided to use equal sample sizes for all treatments, The chief 
objective is to estimate the following pairwise comparisons: 


Li = Hie ga Ly = рә. = Bae 


э = ja Bex Ls = Haa = n. 


L3 = ba = We 


What are the required sample sizes if the precision of each of the estimates should not exceed 
+20, using the Bonferroni procedure with a 90 percent family confidence coefficient for the 
joint set of comparisons? A reasonable planning value for the error standard deviation 1$ 
о = 29. 


24.19. For fixed ANOVA model (24.14), show that $\sum (\alpha\beta\gamma)_{ijk} = 0$.

24.20. State the fixed ANOVA model for a three-factor study with $n = 1$ when all three-factor
interactions are zero. Show the ANOVA table for this case.

24.21. For fixed ANOVA model (24.14), derive the variance of the estimated contrast
$\hat{L} = \sum\sum\sum c_{ijk}\bar{Y}_{ijk.}$.


24.22. Refer to the SENIC data set in Appendix C.1. The following hospitals are to be considered
in a study of the effects of average age of patients (factor A: variable 3), available facilities
and services (factor B: variable 12), and region (factor C: variable 9) on the mean length of
hospital stay of patients (variable 2):

1-14 16-28 31 32 34 35 37-39 41 44 46 50
52 53 57 58 63 66 76 77 83 111

For purposes of this ANOVA study, average age is to be classified into two categories (less
than 53.0 years, 53.0 years or more) and available facilities and services are to be classified
into two categories (less than 40.2 percent, 40.2 percent or more).

a. Assemble the required data and obtain the residuals for ANOVA model (24.14).

b. Plot the residuals against the fitted values. What does your plot suggest about the
appropriateness of ANOVA model (24.14)?

c. Prepare a normal probability plot of the residuals. Also obtain the coefficient of
correlation between the ordered residuals and their expected values under normality. Does the
normality assumption appear reasonable here?

24.23. Refer to the SENIC data set in Appendix C.1 and Project 24.22. Assume that fixed ANOVA
model (24.14) is appropriate.

a. Prepare AB interaction plots of the estimated treatment means $\bar{Y}_{ijk.}$ in the format of
Figure 24.5b. Does it appear that any factor effects are present? Explain.

b. Obtain the analysis of variance table. Does any one source account for most of the total
variability in the study? Explain.

c. Test for three-factor interactions; use $\alpha = .01$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

d. Test for AB, AC, and BC interactions. For each test, use $\alpha = .01$ and state the alternatives,
decision rule, and conclusion. What is the P-value of each test?

e. Test for A, B, and C main effects. For each test, use $\alpha = .01$ and state the alternatives,
decision rule, and conclusion. What is the P-value of each test?

f. To study the nature of the available facilities and region main effects, make all pairwise
comparisons for each of these two factors. Use the Bonferroni procedure with a 90 percent
family confidence coefficient. State your findings.

24.24. Refer to the CDI data set in Appendix C.2. The effects of region (factor A: variable 17), percent
below poverty level (factor B: variable 13), and percent of population 65 or older (factor C:
variable 7) on the crime rate (variable 10 ÷ variable 5) are to be studied. For purposes of this
ANOVA study, percent below poverty level is to be classified into two categories (less than
8.0 percent, 8.0 percent or more) and percent of population 65 or older is to be classified into
two categories (less than 12.0 percent, 12.0 percent or more).

a. Assemble the required data and obtain the residuals for ANOVA model (24.14) with
$m = 1, \ldots, n_{ijk}$.

b. Plot the residuals against the fitted values. What does your plot suggest about the
appropriateness of ANOVA model (24.14)?

c. Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation
between the ordered residuals and their expected values under normality. Does the normality
assumption appear reasonable here?

24.25. Refer to the CDI data set in Appendix C.2 and Project 24.24. Assume that fixed ANOVA
model (24.14) with $m = 1, \ldots, n_{ijk}$ is appropriate.

a. Prepare AB interaction plots of the estimated treatment means $\bar{Y}_{ijk.}$ in the format of
Figure 24.5b. Does it appear that any factor effects are present?

b. State the equivalent regression model for this case; use 1, -1, 0 indicator variables, and fit
this full model.

c. Test for three-factor interactions and for AB, AC, and BC interactions. For each test, use
$\alpha = .025$ and state the alternatives, reduced regression model, decision rule, and
conclusion. What is the P-value of each test?

d. Test for A, B, and C main effects. For each test, use $\alpha = .025$ and state the alternatives,
reduced regression model, decision rule, and conclusion. What is the P-value of each test?

e. To study the nature of the region main effects, make all pairwise comparisons between the
region means. Use the Tukey procedure with a 95 percent family confidence coefficient.
State your findings.


Case Studies

24.26. Refer to the Real estate sales data set in Appendix C.7. Assume that the sample sizes do not
reflect the importance of the treatment means. Carry out an unbalanced three-way analysis
of variance of this data set, where the response of interest is sales price (variable 2), and the
three crossed factors are quality (variable 10), style (variable 11), and number of bedrooms
(variable 4). Recode quality into two categories: 1-2, and 3. Recode the number of bedrooms
into three categories: 0-2, 3, and 4 or more. Recode style as either 1 or not 1. The analysis
should consider transformations of the response variable. Document the steps taken in your
analysis and justify your conclusions.

24.27. Refer to the Real estate sales data set in Appendix C.7 and Case Study 24.26. Assume that the
sample sizes reflect the importance of the treatment means. Carry out an unbalanced three-way
analysis of variance of this data set, where the response of interest is sales price (variable 2),
and the three crossed factors are quality (variable 10), style (variable 11), and number of
bedrooms (variable 4). Recode quality into two categories: 1-2, and 3. Recode the number of
bedrooms into three categories: 0-2, 3, and 4 or more. Recode style as either 1 or not 1. The
analysis should consider transformations of the response variable. Document the steps taken
in your analysis and justify your conclusions.

24.28. Refer to the Ischemic heart disease data set in Appendix C.9. Assume that the sample sizes do
not reflect the importance of the treatment means. Carry out an unbalanced three-way analysis
of variance of this data set, where the response of interest is total cost (variable 2), and the three
crossed factors are gender (variable 4), number of interventions (variable 5), and number of
comorbidities (variable 9). Recode the number of interventions into three categories: 0-1, 2-4,
and greater than or equal to 5. Recode the number of comorbidities into two categories: 0-1,
and greater than or equal to 2. The analysis should consider transformations of the response
variable. Document the steps taken in your analysis and justify your conclusions.

24.29. Refer to the Ischemic heart disease data set in Appendix C.9 and Case Study 24.28. Assume
that the sample sizes reflect the importance of the treatment means. Carry out an unbalanced
three-way analysis of variance of this data set, where the response of interest is total cost
(variable 2), and the three crossed factors are gender (variable 4), number of interventions
(variable 5), and number of comorbidities (variable 9). Recode the number of interventions
into three categories: 0-1, 2-4, and greater than or equal to 5. Recode the number of
comorbidities into two categories: 0-1, and greater than or equal to 2. The analysis should consider
transformations of the response variable. Document the steps taken in your analysis and justify
your conclusions.


Chapter 25

Random and Mixed
Effects Models


Until now, we have been concerned exclusively with ANOVA model I in which the factor 
levels are considered fixed. This model is applicable for studies where our interest cen- 
ters on the effects of the specific factor levels chosen. There are still other studies where 
the factor levels are a sample from a larger population of potential factor levels and
inferences are desired about the populations of factor levels. For example, in Section 16.3
we described a single-factor study by a company that owns several hundred retail stores.
Seven of these stores were selected at random, and a sample of employees in each store 
was asked to evaluate the management of the store. The seven stores chosen for the study 
constitute the seven levels of the random factor, retail stores. In this case, management 
was not just interested in the management of the seven stores chosen; it wanted to gener- 
alize the results to the entire population of stores. Because the retail stores were selected 
at random, the factor retail stores in this example is considered a random factor. Random 
factors may also be present in two-factor and multi-factor studies; either all of the fac- 
tors may be random or some may be random and some fixed. For instance, suppose in 
the previous example that eight employees were selected at random from each of the five 
departments in each of the stores. Interest now is in the employee evaluations of manage- 
ment by department and store. Here, stores would be a random factor because the seven 
selected stores are a sample of all stores. On the other hand, departments would be a fixed 
factor because there are only five departments in each store and interest is in these five 
departments.

Analysis of variance models for studies in which all factors are random are called ANOVA
models II and those for studies in which some factors are random and some fixed are called
ANOVA models III. In Sections 25.1 to 25.4 and 25.6, we consider ANOVA model II
for single-factor studies and ANOVA models II and III for two-factor and three-factor
studies. Completely randomized block designs with random block effects are taken up
in Section 25.5. Throughout Sections 25.1 to 25.6, we assume that all treatment sample 
sizes are equal. In Section 25.7 we consider studies where the treatment sample sizes 
are unequal. We begin our discussion with random ANOVA model II for single-factor 
studies. 




Single-Factor Studies—ANOVA Model II 


As we noted earlier, there are occasions when the factor levels or treatments in a single- 
factor study are not of intrinsic interest in themselves but constitute a sample from a larger 
population of factor levels. ANOVA model II is designed for this type of situation. Consider, 
for instance, Apex Enterprises, a company that builds roadside restaurants carrying one of 
several promoted trade names, leases franchises to individuals to operate the restaurants,
and provides management services. This company employs a large number of personnel 
officers who interview applicants for jobs in the restaurants. At the end of an interview, the 
personnel officer assigns a rating between 0 and 100 to indicate the applicant's potential 
value on the job. Five personnel officers were selected at random, and each was assigned four 
candidates at random. In this case, the company did not wish to make inferences concerning 
the five personnel officers who happened to be selected but rather about the population of 
all personnel officers. Questions of interest included: How great is the variation in ratings 
among all personnel officers? What is the mean rating by all personnel officers? 

The distinction between this situation, for which ANOVA model II is designed, and one 
where fixed ANOVA model I is appropriate can be seen readily by modifying our example 
slightly. If a smaller company had only five personnel officers who were all included in
the study and interest is limited to these five officers, ANOVA model I would be relevant
since the factor levels (the five personnel officers) would then not be considered a sample
from a larger population. A repetition of the experiment for the smaller company would
involve the same five personnel officers, but in the case of Apex Enterprises a repetition
would involve a new random sample of five personnel officers which would probably consist
of different officers. 


Random Cell Means Model 


The cell means version of ANOVA model II for single-factor studies is as follows when all
factor level sample sizes are equal, i.e., when $n_i \equiv n$:

$$Y_{ij} = \mu_i + \varepsilon_{ij} \tag{25.1}$$

where:

$\mu_i$ are independent $N(\mu_{.}, \sigma_\mu^2)$
$\varepsilon_{ij}$ are independent $N(0, \sigma^2)$
$\mu_i$ and $\varepsilon_{ij}$ are independent random variables
$i = 1, \ldots, r$; $j = 1, \ldots, n$

ANOVA model (25.1) is similar in appearance to fixed ANOVA model (16.2). The main
distinction is that the factor level means $\mu_i$ are constants for ANOVA model I but are random
variables for ANOVA model II. Hence, ANOVA model II is often called a random ANOVA
model.


Meaning of Model Terms. We shall explain the meaning of the model terms with reference
to the personnel officers in the Apex Enterprises example. The term $\mu_i$ corresponds to
the mean of all ratings by the ith personnel officer if the officer interviewed all prospective
employees. The expected value of $\mu_i$ is $\mu_{.}$. Thus, $\mu_{.}$ represents here the mean rating for all
prospective employees by all personnel officers. The variability of the personnel officers'
mean ratings $\mu_i$ is measured by the variance $\sigma_\mu^2$. The more the different personnel officers
vary in their mean ratings (for instance, some may rate consistently higher than others), the
greater will be $\sigma_\mu^2$. If all personnel officers rate at the same mean level, all $\mu_i$ will be equal
to $\mu_{.}$ and then $\sigma_\mu^2 = 0$.

The term $\varepsilon_{ij}$ represents the variation associated with the different potential values as
assessed by the ith personnel officer for the different prospective employees. Note that
ANOVA model (25.1) assumes that all $\varepsilon_{ij}$ have the same variance $\sigma^2$. This means that the
distributions of ratings for prospective employees by the different personnel officers are
assumed to have the same variability. The distributions for the different personnel officers
may differ with respect to their means but not with respect to their variability according to
ANOVA model (25.1).

Figure 25.1 illustrates ANOVA model II. On the top is shown the distribution of the
personnel officers' mean ratings $\mu_i$, which is normal. Several $\mu_i$ (two personnel officers'
mean ratings in the illustration) are selected at random from this distribution. Each in turn
leads to a distribution of the potential values of prospective employees as evaluated by the ith
personnel officer, $Y_{ij} = \mu_i + \varepsilon_{ij}$, which are all normal distributions with the same variance.
Several $Y_{ij}$ responses are then selected from each of these distributions (two responses for
each personnel officer in the illustration).

FIGURE 25.1 Representation of ANOVA Model II.
[Figure not reproduced. Top panel: normal distribution of the $\mu_i$, with mean $\mu_{.}$ and variance $\sigma_\mu^2$. Lower panels: normal distributions of the $Y_{ij}$ for two selected factor levels, with means $\mu_1$ and $\mu_2$ and common variance $\sigma^2$.]
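The two-stage sampling described for ANOVA model (25.1) can be sketched as a short simulation. This is only an illustration and not part of the original text: the function name `simulate_model_ii` and the parameter values (mean 70, standard deviations 9 and 8.5) are assumptions chosen for demonstration.

```python
import random

def simulate_model_ii(r, n, mu_dot, sigma_mu, sigma, rng):
    """Draw an r x n table of responses Y_ij = mu_i + eps_ij under model (25.1)."""
    data = []
    for _ in range(r):
        # Stage 1: select a factor level mean mu_i from N(mu_dot, sigma_mu^2).
        mu_i = rng.gauss(mu_dot, sigma_mu)
        # Stage 2: select n responses around mu_i with error variance sigma^2.
        data.append([mu_i + rng.gauss(0.0, sigma) for _ in range(n)])
    return data

# Hypothetical planning values, chosen only for illustration.
rng = random.Random(25)
ratings = simulate_model_ii(r=5, n=4, mu_dot=70.0, sigma_mu=9.0, sigma=8.5, rng=rng)
for i, row in enumerate(ratings, start=1):
    print(i, [round(y, 1) for y in row])
```

Rerunning with a different seed selects a new set of factor level means, which mirrors the point that a repetition of the study would involve a new random sample of factor levels.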


Important Features of Model 


1. The expected value of a response $Y_{ij}$ is:

$$E\{Y_{ij}\} = \mu_{.} \tag{25.2a}$$

because we have by (25.1):

$$E\{Y_{ij}\} = E\{\mu_i\} + E\{\varepsilon_{ij}\} = \mu_{.} + 0 = \mu_{.}$$

Note that this expectation averages over the selections of both $\mu_i$ and $\varepsilon_{ij}$.

2. The variance of $Y_{ij}$, to be denoted by $\sigma_Y^2$, is:

$$\sigma^2\{Y_{ij}\} = \sigma_Y^2 = \sigma_\mu^2 + \sigma^2 \tag{25.2b}$$

Thus, all observations $Y_{ij}$ have the same variance. The result in (25.2b) follows because
ANOVA model II assumes that $\mu_i$ and $\varepsilon_{ij}$ are independent random variables, and
$\sigma^2\{\mu_i\} = \sigma_\mu^2$ and $\sigma^2\{\varepsilon_{ij}\} = \sigma^2$ according to ANOVA model (25.1). Because the variance of Y in
this model is the sum of two components, $\sigma_\mu^2$ and $\sigma^2$, this model is sometimes called a
components of variance model and $\sigma_Y^2$ is referred to as the total variance. (Reference 25.1
provides detailed discussions of variance components models.)

3. The $Y_{ij}$ are normally distributed because they are linear combinations of the
independent normal variables $\mu_i$ and $\varepsilon_{ij}$.

4. Unlike for fixed ANOVA model I where all observations $Y_{ij}$ are independent, the $Y_{ij}$
for random ANOVA model II are only independent if they pertain to different factor levels.
The covariance of any two observations with random ANOVA model (25.1) can be shown
to be:

$$\sigma\{Y_{ij}, Y_{ij'}\} = \sigma_\mu^2 \qquad j \ne j' \tag{25.2c}$$
$$\sigma\{Y_{ij}, Y_{i'j'}\} = 0 \qquad i \ne i' \tag{25.2d}$$

Thus, random ANOVA model (25.1) assumes that the covariance between any two responses
for the same factor level is constant for all factor levels.


We illustrate the nature of the variance-covariance matrix of the responses $Y_{ij}$ for random
ANOVA model (25.1) for a simple illustration where there are $r = 2$ factor levels and $n = 2$
cases for each level. The observations vector is:

$$\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \end{bmatrix}$$

and the variance-covariance matrix of the Y observations is:

$$\boldsymbol{\sigma}^2\{\mathbf{Y}\} = \begin{bmatrix} \sigma_Y^2 & \sigma_\mu^2 & 0 & 0 \\ \sigma_\mu^2 & \sigma_Y^2 & 0 & 0 \\ 0 & 0 & \sigma_Y^2 & \sigma_\mu^2 \\ 0 & 0 & \sigma_\mu^2 & \sigma_Y^2 \end{bmatrix}$$

Note that all observations have the same variance $\sigma_Y^2$, as indicated by (25.2b), any two
observations from the same factor level have covariance $\sigma_\mu^2$ as indicated by (25.2c), and any
two observations from different factor levels are uncorrelated as indicated by (25.2d).
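The block pattern of this matrix generalizes to any r and n. The sketch below builds it directly from (25.2b) through (25.2d); the function name `model_ii_covariance` and the numerical values of the two variance components are assumptions used only for illustration.

```python
def model_ii_covariance(r, n, sigma_mu2, sigma2):
    """Variance-covariance matrix of (Y_11, ..., Y_1n, ..., Y_r1, ..., Y_rn)."""
    size = r * n
    cov = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            if a == b:
                cov[a][b] = sigma_mu2 + sigma2   # total variance, (25.2b)
            elif a // n == b // n:
                cov[a][b] = sigma_mu2            # same factor level, (25.2c)
            # different factor levels stay at 0, (25.2d)
    return cov

# Hypothetical variance components, chosen only for illustration.
cov = model_ii_covariance(r=2, n=2, sigma_mu2=80.0, sigma2=73.0)
for row in cov:
    print(row)
```

With r = 2 and n = 2 the result has two 2 x 2 diagonal blocks and zeros elsewhere, matching the illustration in the text.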

The reason why any two responses from the same factor level are correlated is that, in
advance of the random trials, the responses are expected to be similar because they will both
have the same random component $\mu_i$ and will differ only because of the error terms $\varepsilon_{ij}$.

Once the factor levels have been selected, however, random ANOVA model (25.1)
assumes that any two responses from the same factor level are independent because the factor
level mean $\mu_i$ is then fixed and the two observations differ only because of the error terms
$\varepsilon_{ij}$, which are assumed to be independent. Thus, in the Apex Enterprises example, once
the personnel officers have been selected, random ANOVA model (25.1) assumes that the
different ratings $Y_{ij}$ by a given personnel officer are independent.


Comment 

At times, the population of the $\mu_i$ may be relatively small and should be treated as a finite population.
This can be done, but we do not discuss this case here. If the population of the $\mu_i$ is finite but large, little
is lost in treating it as an infinite population. We did this, in fact, in our Apex Enterprises illustration.
The number of personnel officers employed by Apex Enterprises is finite, but since there are many
we treated the population of the $\mu_i$ as an infinite one. Thus, there are two basic situations when the
population of the $\mu_i$ is treated as infinite: when the population is finite but large, and when interest
centers in the underlying process generating the $\mu_i$.


Questions of Interest 


When ANOVA model II is appropriate, there is usually no interest in inferences about the
particular $\mu_i$ included in the study, such as which is the largest or smallest, but rather in
inferences about the entire population of the $\mu_i$. Specifically, interest often centers on $\mu_{.}$,
the mean of the $\mu_i$, and on $\sigma_\mu^2$, the variability of the $\mu_i$. In the Apex Enterprises example,
for instance, management would not ordinarily be as interested in the mean ratings of the
five personnel officers who happened to be included in the study as in the mean rating by
all personnel officers and in the variability of mean ratings among all personnel officers.
While $\sigma_\mu^2$ is a direct measure of the variability of the $\mu_i$, the effect of this variability is
often measured more meaningfully relative to the total variability $\sigma_Y^2$ in (25.2b):

$$\frac{\sigma_\mu^2}{\sigma_Y^2} = \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma^2} \tag{25.3}$$

Note that this ratio measures the proportion of the total variability of the $Y_{ij}$ that is accounted
for by the variability of the $\mu_i$. It takes on the value 0 when $\sigma_\mu^2 = 0$ and values near 1 when
$\sigma_\mu^2$ is large relative to $\sigma^2$.




With reference to the Apex Enterprises example, the ratio measures the proportion of 
the total variability of ratings for all candidates by all personnel officers that is accounted 
for by differences in the mean ratings among the personnel officers. If the ratio is near 
zero, differences in the mean ratings among personnel officers are relatively insignificant.
On the other hand, if the ratio is large, say, .8 or more, then much of the total variability 
is accounted for by differences between personnel officers, and management may wish to 
study the advisability of giving the personnel officers more training to obtain improved 
consistency of ratings between officers. 

It can be shown that the coefficient of correlation between any two responses from the
same factor level with random ANOVA model (25.1) is:

$$\rho\{Y_{ij}, Y_{ij'}\} = \frac{\sigma_\mu^2}{\sigma_Y^2} = \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma^2} \qquad j \ne j' \tag{25.4}$$

Thus, the measure in (25.3), which indicates the proportion of the total variability of the
$Y_{ij}$ that is accounted for by the variability of the $\mu_i$, is actually the coefficient of
correlation between any two observations from the same factor level. It is called the intraclass
correlation coefficient.


Comment 

The result in (25.4) follows from the definition of the coefficient of correlation in (A.25a):

$$\rho\{Y_{ij}, Y_{ij'}\} = \frac{\sigma\{Y_{ij}, Y_{ij'}\}}{\sigma\{Y_{ij}\}\,\sigma\{Y_{ij'}\}}$$

The covariance in the numerator is given in (25.2c), and $\sigma\{Y_{ij}\} = \sigma\{Y_{ij'}\} = \sigma_Y$ according to (25.2b).
Test whether $\sigma_\mu^2 = 0$

We first consider how to test whether all $\mu_i$ are equal:

$$H_0\colon \sigma_\mu^2 = 0 \qquad H_a\colon \sigma_\mu^2 > 0 \tag{25.5}$$

$H_0$ implies that all $\mu_i$ are equal; that is, $\mu_i = \mu_{.}$. $H_a$ implies that the $\mu_i$ differ. For the
personnel officers example, $H_0$ implies that the mean ratings for all personnel officers are
the same, while $H_a$ implies that they differ.

Despite the fact that ANOVA model II differs from ANOVA model I, the analysis of
variance for a single-factor study is conducted in identical fashion. (This is not always
the case in more complex situations.) The difference between the two models appears in
the expected mean squares. It can be shown, in a manner similar to that employed in our
derivation for ANOVA model I, that the expected mean squares for ANOVA model II when
all treatment sample sizes equal n are as follows:

$$E\{MSE\} = \sigma^2 \tag{25.6}$$
$$E\{MSTR\} = \sigma^2 + n\sigma_\mu^2 \tag{25.7}$$

It follows from (25.6) and (25.7) that if $\sigma_\mu^2 = 0$, MSE and MSTR have the same expectation
$\sigma^2$. Otherwise, $E\{MSTR\} > E\{MSE\}$ since $n > 0$ always. Hence, large values of the test
statistic:

$$F^* = \frac{MSTR}{MSE} \tag{25.8}$$

will lead to conclusion $H_a$ in (25.5). Since $F^*$ again follows the F distribution when $H_0$
holds, the decision rule for controlling the risk of making a Type I error at $\alpha$ is the same as
the one for ANOVA model I:

If $F^* \le F[1 - \alpha; r - 1, r(n - 1)]$, conclude $H_0$
If $F^* > F[1 - \alpha; r - 1, r(n - 1)]$, conclude $H_a$    (25.9)

Note that the degrees of freedom associated with MSE here are $n_T - r = r(n - 1)$ since
$n_T = rn$ when all factor level sample sizes are equal.

Example

Table 25.1 contains the results of the study by Apex Enterprises on the evaluation ratings
of potential employees by its personnel officers. Five personnel officers were selected at
random, and four prospective employee candidates were assigned at random to each selected
officer. Figure 25.2 contains dot plots of the ratings for each of the five personnel officers.
It appears that the locations of the rating distributions for the personnel officers differ, that
the variability within each of the five distributions is approximately the same, and that the
variability within each of the rating distributions may be almost as large as the variability
between the personnel officers.

TABLE 25.1 Ratings by Five Personnel Officers—Apex Enterprises Example.

                   Candidate (j)
Officer i      1     2     3     4     Mean
A             76    65    85    74    75.00
B             59    75    81    67    70.50
C             49    63    61    46    54.75
D             74    71    85    89    79.75
E             66    84    80    79    77.25

FIGURE 25.2 Dot Plots of Ratings by Five Personnel Officers—Apex Enterprises Example.
[Figure not reproduced: one row of dots per personnel officer, plotted against a horizontal rating axis running from 50 to 90.]

TABLE 25.2 ANOVA Table for Single-Factor ANOVA Model II—Apex Enterprises Example.

                                                                         E{MS}
Source of Variation                  SS               df   MS             General                        Example
Between personnel officers           SSTR = 1,579.7    4   MSTR = 394.9   $\sigma^2 + n\sigma_\mu^2$     $\sigma^2 + 4\sigma_\mu^2$
Error (within personnel officers)    SSE = 1,099.3    15   MSE = 73.3     $\sigma^2$                     $\sigma^2$
Total                                SSTO = 2,678.9   19

The ANOVA calculations are routine and are shown in Table 25.2, which also shows the
expected mean squares in general and for the Apex Enterprises example. Using the results
from Table 25.2, the appropriate test statistic for determining whether $\sigma_\mu^2 = 0$ is:

$$F^* = \frac{394.9}{73.3} = 5.39$$

To control the risk of making a Type I error at $\alpha = .05$, we require $F(.95; 4, 15) = 3.06$.
Hence, the decision rule is:

If $F^* \le 3.06$, conclude $H_0$
If $F^* > 3.06$, conclude $H_a$

Since $F^* = 5.39 > 3.06$, we conclude $H_a$, that $\sigma_\mu^2 > 0$ or that the mean ratings of the
personnel officers differ. The P-value of the test is .01.
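The calculations behind this test can be reproduced with a few lines of code. The sketch below is illustrative only: the helper `one_way_anova` is a hypothetical name, the critical value 3.06 is taken from the text rather than computed, and the two ratings that are illegible in the printed table (81 for officer B and 71 for officer D) are assumed readings that reproduce the reported SSTR and SSE.

```python
# Ratings from Table 25.1. The values 81 (officer B) and 71 (officer D) are
# assumed readings; they reproduce the sums of squares reported in Table 25.2.
ratings = {
    "A": [76, 65, 85, 74],
    "B": [59, 75, 81, 67],
    "C": [49, 63, 61, 46],
    "D": [74, 71, 85, 89],
    "E": [66, 84, 80, 79],
}

def one_way_anova(groups):
    """Return SSTR, SSE, MSTR, MSE and F* = MSTR / MSE for a one-way layout."""
    r = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    mstr = sstr / (r - 1)
    mse = sse / (n_total - r)
    return {"SSTR": sstr, "SSE": sse, "MSTR": mstr, "MSE": mse, "F": mstr / mse}

result = one_way_anova(list(ratings.values()))
F_CRIT = 3.06  # F(.95; 4, 15), quoted from the text
conclusion = "Ha" if result["F"] > F_CRIT else "H0"
print({name: round(value, 2) for name, value in result.items()}, conclusion)
```

The arithmetic is the same as for fixed ANOVA model I; only the interpretation of the conclusion (a statement about the variance of the factor level means) differs.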


Comments 


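1. We illustrate the derivation of an expected mean square for ANOVA model II by sketching
the development for deriving $E\{MSTR\}$ in (25.7) when $n_i \equiv n$. The proof parallels that for ANOVA
model I. According to ANOVA model (25.1), we can write:

$$\bar{Y}_{i.} = \mu_i + \bar{\varepsilon}_{i.}$$
$$\bar{Y}_{..} = \bar{\mu}_{.} + \bar{\varepsilon}_{..}$$

where $\bar{\varepsilon}_{i.}$ and $\bar{\varepsilon}_{..}$ are defined in (16.44) and (16.47), respectively, and:

$$\bar{\mu}_{.} = \frac{\sum \mu_i}{r}$$

(Note the use of a different notation for the mean of the $\mu_i$ here than for ANOVA model I to emphasize
the random nature of the mean of the r values $\mu_i$ for ANOVA model II.) Corresponding to (16.49),
we obtain:

$$\bar{Y}_{i.} - \bar{Y}_{..} = (\mu_i - \bar{\mu}_{.}) + (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})$$

so that:

$$\sum (\bar{Y}_{i.} - \bar{Y}_{..})^2 = \sum (\mu_i - \bar{\mu}_{.})^2 + \sum (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})^2 + 2\sum (\mu_i - \bar{\mu}_{.})(\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})$$

When we take the expectation, the cross-product term drops out because of the independence of the
$\mu_i$ and the $\varepsilon_{ij}$ and because the deviations $\mu_i - \bar{\mu}_{.}$ and $\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..}$ all have expectations zero. From (16.55)
we know that:

$$E\left\{\sum (\bar{\varepsilon}_{i.} - \bar{\varepsilon}_{..})^2\right\} = \frac{(r - 1)\sigma^2}{n}$$

Lastly, since $\sum (\mu_i - \bar{\mu}_{.})^2$ is the numerator of an ordinary sample variance for r independent $\mu_i$
values, it follows from the unbiasedness of the sample variance that:

$$E\left\{\sum (\mu_i - \bar{\mu}_{.})^2\right\} = (r - 1)\sigma_\mu^2$$

Hence, we obtain:

$$E\{MSTR\} = E\left\{\frac{n}{r - 1}\sum (\bar{Y}_{i.} - \bar{Y}_{..})^2\right\} = \frac{n}{r - 1}\left[(r - 1)\sigma_\mu^2 + \frac{(r - 1)\sigma^2}{n}\right] = n\sigma_\mu^2 + \sigma^2$$

which is the result in (25.7).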


2. The $F^*$ test statistic in (25.8) and the decision rule in (25.9) are also appropriate when the factor
level sample sizes are not equal. The degrees of freedom associated with MSE are then denoted, as
usual, by $n_T - r$, where $n_T = \sum n_i$. The expected value of MSTR becomes:

$$E\{MSTR\} = \sigma^2 + n'\sigma_\mu^2 \tag{25.10}$$

where:

$$n' = \frac{1}{r - 1}\left[\sum n_i - \frac{\sum n_i^2}{\sum n_i}\right]$$
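The expected mean squares discussed in these comments can be checked numerically. The following Monte Carlo sketch is an illustration under stated assumptions, not part of the original text: it takes the usual expression n' = [Σn_i − Σn_i²/Σn_i]/(r − 1) for unequal sample sizes, and the sample sizes, variance components, seed, and function names are all hypothetical choices.

```python
import random

def n_prime(sizes):
    """n' = [sum(n_i) - sum(n_i^2) / sum(n_i)] / (r - 1); equals n when all n_i = n."""
    n_total = sum(sizes)
    return (n_total - sum(n * n for n in sizes) / n_total) / (len(sizes) - 1)

def mean_squares(groups):
    """Return (MSTR, MSE) for a one-way layout given as a list of lists."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return sstr / (len(groups) - 1), sse / (n_total - len(groups))

def average_mean_squares(sizes, sigma_mu, sigma, reps, seed):
    """Average MSTR and MSE over repeated draws from the random effects model."""
    rng = random.Random(seed)
    total_mstr = total_mse = 0.0
    for _ in range(reps):
        groups = []
        for n_i in sizes:
            mu_i = rng.gauss(0.0, sigma_mu)  # new random factor level mean
            groups.append([mu_i + rng.gauss(0.0, sigma) for _ in range(n_i)])
        mstr, mse = mean_squares(groups)
        total_mstr += mstr
        total_mse += mse
    return total_mstr / reps, total_mse / reps

# Hypothetical unbalanced design and variance components.
sizes = [2, 4, 6]
sigma_mu, sigma = 3.0, 2.0
avg_mstr, avg_mse = average_mean_squares(sizes, sigma_mu, sigma, reps=20000, seed=7)
expected_mstr = sigma ** 2 + n_prime(sizes) * sigma_mu ** 2
print(round(n_prime(sizes), 3), round(expected_mstr, 2), round(avg_mstr, 2), round(avg_mse, 2))
```

The simulated average of MSTR should lie close to σ² + n'σ_μ² and the average of MSE close to σ²; with equal sample sizes, `n_prime` returns n and the check reduces to (25.7).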
Estimation of $\mu_{.}$

When ANOVA model II is applicable, there is frequent interest in estimating the overall
mean $\mu_{.}$. We now develop an interval estimate for $\mu_{.}$ when all factor level sample sizes are
equal. We know from (25.2a) that:

$$E\{Y_{ij}\} = \mu_{.}$$

Hence, an unbiased estimator of $\mu_{.}$ is:

$$\hat{\mu}_{.} = \bar{Y}_{..} \tag{25.11}$$

It can be shown that the variance of this estimator is:

$$\sigma^2\{\bar{Y}_{..}\} = \frac{\sigma_\mu^2}{r} + \frac{\sigma^2}{rn} = \frac{n\sigma_\mu^2 + \sigma^2}{rn} \tag{25.12}$$

Formula (25.12) shows that the variance of $\bar{Y}_{..}$ is made up of two components. The first
corresponds to the variance of a sample mean based on r values when sampling from the
population of the $\mu_i$, and it reflects the contribution due to sampling the factor levels. The
second component corresponds to the variance of a sample mean based on rn observations
when sampling from the populations of the $Y_{ij}$, given the $\mu_i$, and it reflects the contribution
due to variation within factor levels.

An unbiased estimator of $\sigma^2\{\bar{Y}_{..}\}$ is:

$$s^2\{\bar{Y}_{..}\} = \frac{MSTR}{rn} \tag{25.13}$$

This estimator is unbiased because we know from (25.7) that $E\{MSTR\} = n\sigma_\mu^2 + \sigma^2$.
Dividing the result in (25.7) by rn yields (25.12).
It can be shown that:

$$\frac{\bar{Y}_{..} - \mu_{.}}{s\{\bar{Y}_{..}\}} \text{ is distributed as } t(r - 1) \text{ for ANOVA model (25.1)} \tag{25.14}$$

Hence, we obtain in usual fashion the confidence limits for $\mu_{.}$:

$$\bar{Y}_{..} \pm t(1 - \alpha/2; r - 1)\, s\{\bar{Y}_{..}\} \tag{25.15}$$

Example


Management of Apex Enterprises wishes to estimate the mean rating for all prospective
employees by all personnel officers with a 90 percent confidence interval. We have from
Tables 25.1 and 25.2:

$$\bar{Y}_{..} = 71.45 \qquad MSTR = 394.9 \qquad rn = 20$$

We require $t(.95; 4) = 2.132$ and:

$$s^2\{\bar{Y}_{..}\} = \frac{394.9}{20} = 19.75$$

Hence, $s\{\bar{Y}_{..}\} = 4.44$, the confidence limits are $71.45 \pm 2.132(4.44)$, and the desired
90 percent confidence interval is:

$$62 \le \mu_{.} \le 81$$

Thus, with a 90 percent confidence coefficient, we conclude that the mean rating assigned
by all personnel officers to all prospective employees is between 62 and 81. The interval
estimate is not very precise because of the relatively small samples of personnel officers
and potential employees.
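The interval for the overall mean is straightforward to compute once MSTR is available. The sketch below uses the summary values quoted in the example; the function name `mean_confidence_interval` is hypothetical, and the t percentile is passed in as a constant taken from the text rather than computed from a distribution function.

```python
import math

def mean_confidence_interval(y_bar, mstr, r, n, t_crit):
    """Confidence limits for the overall mean, following (25.13) and (25.15)."""
    s2_y_bar = mstr / (r * n)                 # estimated variance of the overall mean
    half_width = t_crit * math.sqrt(s2_y_bar)
    return y_bar - half_width, y_bar + half_width

# Summary values from the Apex Enterprises example; t(.95; 4) = 2.132 is from the text.
lower, upper = mean_confidence_interval(y_bar=71.45, mstr=394.9, r=5, n=4, t_crit=2.132)
print(round(lower, 2), round(upper, 2))
```

Note that the degrees of freedom for the t percentile are r − 1, not rn − 1, because the precision of the estimate is governed by the number of factor levels sampled.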


Comment 
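The variance of $\bar{Y}_{..}$ in (25.12) can be derived readily. First, we consider:

$$\bar{Y}_{i.} = \mu_i + \bar{\varepsilon}_{i.}$$

where $\bar{\varepsilon}_{i.}$ is defined in (16.44). Because of the independence of $\mu_i$ and the $\varepsilon_{ij}$, we have:

$$\sigma^2\{\bar{Y}_{i.}\} = \sigma_\mu^2 + \frac{\sigma^2}{n}$$

Remember that $\bar{\varepsilon}_{i.}$ is just an ordinary mean of n independent $\varepsilon_{ij}$ values.
For the case $n_i \equiv n$ that we are considering here, we have:

$$\bar{Y}_{..} = \frac{\sum \bar{Y}_{i.}}{r}$$

In view of the independence of the $\mu_i$ and the $\varepsilon_{ij}$ among themselves and between each other, it follows
that the $\bar{Y}_{i.}$ are independent so that:

$$\sigma^2\{\bar{Y}_{..}\} = \frac{\sigma^2\{\bar{Y}_{i.}\}}{r} = \frac{\sigma_\mu^2}{r} + \frac{\sigma^2}{rn} = \frac{n\sigma_\mu^2 + \sigma^2}{rn}$$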


Estimation of $\sigma_\mu^2/(\sigma_\mu^2 + \sigma^2)$

As noted earlier, the ratio $\sigma_\mu^2/(\sigma_\mu^2 + \sigma^2)$ reveals meaningfully the effect of the extent of
variation between the $\mu_i$. We shall develop an interval estimate for this ratio by first obtaining
confidence limits for the ratio $\sigma_\mu^2/\sigma^2$. It can be shown that MSTR and MSE are independent
random variables for ANOVA model II, just as for ANOVA model I. When $n_i \equiv n$, the case
considered here, it can be shown further that:

$$\frac{MSTR}{n\sigma_\mu^2 + \sigma^2} \div \frac{MSE}{\sigma^2} \sim F[r - 1, r(n - 1)] \tag{25.16}$$

Hence, we can write the probability statement:

$$P\left\{F[\alpha/2; r - 1, r(n - 1)] \le \frac{MSTR}{n\sigma_\mu^2 + \sigma^2} \div \frac{MSE}{\sigma^2} \le F[1 - \alpha/2; r - 1, r(n - 1)]\right\} = 1 - \alpha \tag{25.17}$$

Rearranging the inequalities, we obtain the following confidence limits L and U for $\sigma_\mu^2/\sigma^2$:

$$L = \frac{1}{n}\left[\frac{MSTR}{MSE}\left(\frac{1}{F[1 - \alpha/2; r - 1, r(n - 1)]}\right) - 1\right] \tag{25.18a}$$

$$U = \frac{1}{n}\left[\frac{MSTR}{MSE}\left(\frac{1}{F[\alpha/2; r - 1, r(n - 1)]}\right) - 1\right] \tag{25.18b}$$

where L is the lower confidence limit and U the upper.
The confidence limits $L^*$ and $U^*$ for $\sigma_\mu^2/(\sigma_\mu^2 + \sigma^2)$ can now be obtained and are as
follows:

$$L^* = \frac{L}{1 + L} \qquad U^* = \frac{U}{1 + U} \tag{25.19}$$

Example

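Management of Apex Enterprises wishes to obtain a 90 percent confidence interval for
$\sigma_\mu^2/(\sigma_\mu^2 + \sigma^2)$. From previous work, we have:


$$MSTR = 394.9 \qquad MSE = 73.3 \qquad n = 4 \qquad r = 5$$

For a 90 percent confidence interval, we require:

$$F(.05; 4, 15) = .170 \qquad F(.95; 4, 15) = 3.06$$


Hence, the 90 percent confidence limits for $\sigma_\mu^2/\sigma^2$ are by (25.18):


$$L = \frac{1}{4}\left[\frac{394.9}{73.3}\left(\frac{1}{3.06}\right) - 1\right] = .19 \qquad U = \frac{1}{4}\left[\frac{394.9}{73.3}\left(\frac{1}{.170}\right) - 1\right] = 7.7$$


and the confidence interval for $\sigma_\mu^2/\sigma^2$ is:


$$.19 \le \frac{\sigma_\mu^2}{\sigma^2} \le 7.7$$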




Finally, the confidence limits for $\sigma_\mu^2/(\sigma_\mu^2 + \sigma^2)$ are obtained by (25.19); they are $L^* =
.19/1.19 = .16$ and $U^* = 7.7/8.7 = .89$. Hence, the 90 percent confidence interval is:


$$.16 \le \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma^2} \le .89$$

With confidence coefficient .90, we conclude that the variability of the mean ratings for 
the different personnel officers accounts for somewhere between 16 and 89 percent of the 
total variability of the ratings. Note that this interval estimate is not precise, partly the result 
of relatively small sample sizes and partly because variance components are much more 
difficult to estimate precisely than means. The confidence interval does indicate, though, 
that the variability among personnel officers is not negligible since it accounts for at least 
16 percent of the total variability. 
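
The two-step calculation in (25.18) and (25.19) can be sketched as follows. This is an illustrative sketch only; the function name `variance_ratio_limits` is hypothetical, and the F percentiles are supplied from a table rather than computed. A negative lower limit is set to zero, the usual practice for a ratio that cannot be negative.

```python
def variance_ratio_limits(mstr, mse, n, f_lower, f_upper):
    """Limits for sigma_mu^2/sigma^2 per (25.18) and for
    sigma_mu^2/(sigma_mu^2 + sigma^2) per (25.19).

    f_lower = F(alpha/2; r-1, r(n-1)), f_upper = F(1-alpha/2; r-1, r(n-1)),
    both looked up by the caller (hypothetical interface).
    """
    ratio = mstr / mse
    L = (ratio / f_upper - 1.0) / n
    U = (ratio / f_lower - 1.0) / n
    L = max(L, 0.0)                      # a negative lower limit is treated as zero
    return (L, U), (L / (1.0 + L), U / (1.0 + U))

# Apex Enterprises values quoted in the text
(L, U), (L_star, U_star) = variance_ratio_limits(394.9, 73.3, n=4,
                                                 f_lower=0.170, f_upper=3.06)
print(L, U, L_star, U_star)
```

Because the text rounds $L$ and $U$ before applying (25.19), the unrounded upper limit from this sketch can differ from .89 in the second decimal place.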


Comments 


1. It may happen occasionally that the lower limit of the confidence interval for $\sigma_\mu^2/\sigma^2$ is negative.
Since this ratio cannot be negative, the usual practice is to consider the lower limit $L$ in (25.18a) to
be zero in that case.

2. If one-sided or two-sided tests concerning the relative magnitudes of $\sigma_\mu^2$ and $\sigma^2$ are desired,
such as the following (where $c$ is a specified constant):


$$H_0: \sigma_\mu^2 \le c\sigma^2 \qquad\qquad H_0: \sigma_\mu^2 = c\sigma^2$$
$$H_a: \sigma_\mu^2 > c\sigma^2 \qquad\qquad H_a: \sigma_\mu^2 \ne c\sigma^2$$


a decision rule can be constructed by utilizing (25.16). Alternatively, one-sided or two-sided confidence
intervals can be used to draw the appropriate conclusion.

3. The ratio $\sigma_\mu^2/\sigma^2$ is of relevance in planning investigations. In the Apex Enterprises example
dealing with the personnel officers, suppose that the mean rating $\mu_{.}$ is to be estimated, and that the
costs of including in the study a personnel officer and a candidate are $c_1$ and $c_2$, respectively. For a
given total budget $C$, the ratio $\sigma_\mu^2/\sigma^2$ is the determining variable for finding the optimum balance
between the number of personnel officers and the number of candidates to include in the study so as
to minimize the variance of the estimator. If the populations are not large, the model will need to take
account of their finite nature.


Estimation of $\sigma^2$
At times, it is desired to estimate $\sigma^2$ and $\sigma_\mu^2$ separately. According to (25.6), an unbiased
estimator of $\sigma^2$ is MSE. An interval estimate for $\sigma^2$ is easily constructed. We make use of
the fact that $[r(n - 1)MSE]/\sigma^2$ is distributed as a $\chi^2$ random variable with $r(n - 1)$ degrees
of freedom:


$$\frac{r(n - 1)MSE}{\sigma^2} \sim \chi^2[r(n - 1)] \qquad (25.20)$$

It follows that a $1 - \alpha$ confidence interval for $\sigma^2$ is:

$$\frac{r(n - 1)MSE}{\chi^2[1 - \alpha/2; r(n - 1)]} \le \sigma^2 \le \frac{r(n - 1)MSE}{\chi^2[\alpha/2; r(n - 1)]} \qquad (25.21)$$




Example

To construct a 90 percent confidence interval for $\sigma^2$ for the Apex Enterprises example, we
require:

$$MSE = 73.3 \qquad \chi^2(.05; 15) = 7.26 \qquad \chi^2(.95; 15) = 25.0$$

The desired confidence interval by (25.21) then is:


$$44.0 = \frac{15(73.3)}{25.0} \le \sigma^2 \le \frac{15(73.3)}{7.26} = 151.4$$

An approximate 90 percent confidence interval for $\sigma$ is obtained by taking the square roots
of the confidence limits for $\sigma^2$:


$$6.6 \le \sigma \le 12.3$$


With 90 percent confidence, we conclude that the standard deviation of the ratings of
prospective employees for each personnel officer is between 6.6 and 12.3 points.
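
The arithmetic of (25.21) for this example can be checked with a few lines of code. This is a minimal sketch; the function name `error_variance_interval` is hypothetical, and the chi-square percentiles are the tabled values quoted in the text.

```python
import math

def error_variance_interval(mse, df_error, chi2_lower, chi2_upper):
    """Confidence limits for sigma^2 per (25.21).

    chi2_lower = chi^2(alpha/2; df), chi2_upper = chi^2(1 - alpha/2; df),
    supplied by the caller from a table (hypothetical interface).
    """
    return df_error * mse / chi2_upper, df_error * mse / chi2_lower

# r(n - 1) = 5(4 - 1) = 15 error degrees of freedom
var_lower, var_upper = error_variance_interval(73.3, 15, chi2_lower=7.26, chi2_upper=25.0)
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)
print(var_lower, var_upper, sd_lower, sd_upper)
```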


Comment 


Confidence interval (25.21) is also appropriate when the factor level sample sizes are not equal. The
degrees of freedom associated with MSE are then denoted by $n_T - r$.


Point Estimation of $\sigma_\mu^2$


An unbiased estimator of $\sigma_\mu^2$ is available by noting that we have from (25.6) and (25.7):


$$E\{MSE\} = \sigma^2$$
$$E\{MSTR\} = \sigma^2 + n\sigma_\mu^2$$

It follows that:

$$\sigma_\mu^2 = \frac{E\{MSTR\} - E\{MSE\}}{n} \qquad (25.22)$$


An unbiased estimator of $\sigma_\mu^2$ is obtained by substituting the observed mean squares for the
corresponding expected mean squares:

$$s_\mu^2 = \frac{MSTR - MSE}{n} \qquad (25.23)$$


Occasionally, this point estimator will turn out to be negative. Since a variance cannot be
negative, the usual practice is to consider the estimator to be zero in that event.
Comment

An unbiased estimator of $\sigma_\mu^2$ when the factor level sample sizes are not equal can be obtained by
slightly modifying the expression in (25.23). The denominator $n$ is simply replaced by $n'$ as defined
in (25.10a).


Interval Estimation of $\sigma_\mu^2$


It is not possible to construct exact confidence intervals for $\sigma_\mu^2$. However, several approximate
confidence intervals have been developed. We shall now describe two approximate
confidence intervals for $\sigma_\mu^2$, assuming as before that the study is balanced; that is, $n_i \equiv n$.
Procedures for constructing confidence intervals for $\sigma_\mu^2$ when the factor level sample sizes
are not equal are presented in Section 25.6 and in Reference 25.2.


Satterthwaite Procedure. The Satterthwaite procedure (Ref. 25.3) is a general procedure
for constructing approximate confidence intervals for linear combinations of expected mean
squares. Note that $\sigma_\mu^2$ is such a linear combination since we can express (25.22) as follows:


$$\sigma_\mu^2 = \left(\frac{1}{n}\right)E\{MSTR\} + \left(-\frac{1}{n}\right)E\{MSE\} \qquad (25.24)$$


In general, we shall state a linear combination of expected mean squares as follows:

$$L = c_1E\{MS_1\} + \cdots + c_hE\{MS_h\} \qquad (25.25)$$


where the $c_i$ are coefficients.
An unbiased estimator of $L$ is:


$$\hat{L} = c_1MS_1 + \cdots + c_hMS_h \qquad (25.26)$$


Let $df_i$ denote the degrees of freedom associated with mean square $MS_i$. Satterthwaite has
suggested that the distribution of the statistic:


$$\frac{(df)\hat{L}}{L} \qquad (25.27)$$


can be approximated by a $\chi^2$ distribution whose degrees of freedom, denoted by $df$, are
given by:


$$df = \frac{(c_1MS_1 + \cdots + c_hMS_h)^2}{\dfrac{(c_1MS_1)^2}{df_1} + \cdots + \dfrac{(c_hMS_h)^2}{df_h}} \qquad (25.28)$$

An approximate $1 - \alpha$ confidence interval for $L$ therefore is:

$$\frac{(df)\hat{L}}{\chi^2(1 - \alpha/2; df)} \le L \le \frac{(df)\hat{L}}{\chi^2(\alpha/2; df)} \qquad (25.29)$$

where $df$ is given by (25.28).
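For the single-factor random ANOVA model (25.1) for a balanced study ($n_i \equiv n$), we
have the following correspondences:


$$MS_1 = MSTR \qquad MS_2 = MSE$$
$$df_1 = r - 1 \qquad df_2 = n_T - r = r(n - 1)$$
$$c_1 = \frac{1}{n} \qquad c_2 = -\frac{1}{n} \qquad (25.30)$$
$$L = \sigma_\mu^2 = \left(\frac{1}{n}\right)E\{MSTR\} + \left(-\frac{1}{n}\right)E\{MSE\}$$
$$\hat{L} = s_\mu^2 = \left(\frac{1}{n}\right)MSTR + \left(-\frac{1}{n}\right)MSE$$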


Example


Hence, an approximate $1 - \alpha$ confidence interval for $\sigma_\mu^2$ by the Satterthwaite procedure
(25.29) is:


$$\frac{(df)s_\mu^2}{\chi^2(1 - \alpha/2; df)} \le \sigma_\mu^2 \le \frac{(df)s_\mu^2}{\chi^2(\alpha/2; df)} \qquad (25.31)$$

where:

$$df = \frac{(ns_\mu^2)^2}{\dfrac{(MSTR)^2}{r - 1} + \dfrac{(MSE)^2}{r(n - 1)}} \qquad (25.31a)$$


Usually, the degrees of freedom will not turn out to be an integer. Interpolation in the $\chi^2$
table or rounding to the nearest integer may then be used.

While the Satterthwaite procedure is general and easy to carry out, the accuracy of the 
approximation can be quite limited when some of the coefficients c; are negative and some 
are positive. Note that this is the case here in (25.30), since $c_1 = 1/n$ and $c_2 = -1/n$. More
detailed guidelines as to when the Satterthwaite approximation is appropriate are given in 
Reference 25.4. 


For the Apex Enterprises example, we shall first obtain a point estimate of $\sigma_\mu^2$ by means of
(25.23). We require:


$$MSE = 73.3 \qquad MSTR = 394.9 \qquad n = 4$$

Hence we find:

$$s_\mu^2 = \frac{394.9 - 73.3}{4} = 80.4$$


and the estimated standard deviation of the mean ratings of all personnel officers is $\sqrt{80.4} =
9.0$ points.

Next, we obtain a 90 percent confidence interval for $\sigma_\mu^2$ by the Satterthwaite procedure.
Using the earlier results:


$$s_\mu^2 = 80.4 \qquad MSTR = 394.9 \qquad MSE = 73.3 \qquad n = 4 \qquad r = 5$$

we obtain the degrees of freedom $df$ by means of (25.31a):

$$df = \frac{[4(80.4)]^2}{\dfrac{(394.9)^2}{5 - 1} + \dfrac{(73.3)^2}{5(4 - 1)}} = 2.63$$


which we shall round up to 3. Confidence limits (25.31) also require:

$$\chi^2(.05; 3) = .352 \qquad \chi^2(.95; 3) = 7.81$$

so that the Satterthwaite approximate 90 percent confidence interval for $\sigma_\mu^2$ is:


$$30.9 = \frac{3(80.4)}{7.81} \le \sigma_\mu^2 \le \frac{3(80.4)}{.352} = 685.2$$


TABLE 25.3 Computational Formulas for MLS Approximate $1 - \alpha$ Confidence Limits in (25.34).


By taking square roots of the two limits, we obtain an approximate confidence interval
for $\sigma_\mu$:


$$5.6 \le \sigma_\mu \le 26.2$$


Hence, with approximate 90 percent confidence coefficient, we conclude that the standard 
deviation of the mean ratings of all of the personnel officers is between 5.6 and 26.2 points. 
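
The Satterthwaite steps (25.26), (25.28), and (25.29) can be sketched for this example as follows. This is an illustrative sketch under stated assumptions: the function name `satterthwaite` is hypothetical, and the chi-square percentiles for the rounded degrees of freedom are the tabled values quoted above rather than computed ones.

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Return (L_hat, df) for a linear combination of mean squares.

    L_hat follows (25.26) and df follows (25.28). The argument names are
    hypothetical; each list has one entry per mean square.
    """
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    l_hat = sum(terms)
    df = l_hat ** 2 / sum(t ** 2 / d for t, d in zip(terms, dfs))
    return l_hat, df

# Apex Enterprises: c1 = 1/n, c2 = -1/n with n = 4; df1 = r - 1, df2 = r(n - 1)
s2_mu, df = satterthwaite([0.25, -0.25], [394.9, 73.3], [4, 15])

df_rounded = 3                      # the text rounds 2.63 up to 3
chi2_05, chi2_95 = 0.352, 7.81      # tabled percentiles for 3 degrees of freedom
ci_lower = df_rounded * s2_mu / chi2_95
ci_upper = df_rounded * s2_mu / chi2_05
print(s2_mu, df, ci_lower, ci_upper)
```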


MLS Procedure. An improved procedure for obtaining an approximate confidence interval
for $\sigma_\mu^2$ is based on the modified large sample (MLS) procedure (Ref. 25.5). It involves
somewhat greater computational complexity than the Satterthwaite procedure, and is designed
to estimate a linear combination of two expected mean squares for balanced studies
of the form:


$$L = c_1E\{MS_1\} + c_2E\{MS_2\} \qquad c_1 > 0,\; c_2 < 0 \qquad (25.32)$$

where $c_1$ is positive and $c_2$ is negative. An unbiased estimator of $L$ is:

$$\hat{L} = c_1MS_1 + c_2MS_2 \qquad c_1 > 0,\; c_2 < 0 \qquad (25.33)$$


If $(df_1)MS_1/E\{MS_1\}$ and $(df_2)MS_2/E\{MS_2\}$ are independent $\chi^2$ random variables with $df_1$
and $df_2$ degrees of freedom, respectively, an approximate $1 - \alpha$ confidence interval for $L$
is given by:


$$\hat{L} - H_L \le L \le \hat{L} + H_U \qquad (25.34)$$

where $\hat{L}$ is defined in (25.33) and $H_L$ and $H_U$ are defined by the equations in Table 25.3.


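$$F_1 = F(1 - \alpha/2; df_1, \infty) \qquad (25.34a)$$
$$F_2 = F(1 - \alpha/2; df_2, \infty) \qquad (25.34b)$$
$$F_3 = F(1 - \alpha/2; \infty, df_1) \qquad (25.34c)$$
$$F_4 = F(1 - \alpha/2; \infty, df_2) \qquad (25.34d)$$
$$F_5 = F(1 - \alpha/2; df_1, df_2) \qquad (25.34e)$$
$$F_6 = F(1 - \alpha/2; df_2, df_1) \qquad (25.34f)$$
$$G_1 = 1 - \frac{1}{F_1} \qquad (25.34g)$$
$$G_2 = 1 - \frac{1}{F_2} \qquad (25.34h)$$
$$G_3 = \frac{(F_5 - 1)^2 - (G_1F_5)^2 - (F_4 - 1)^2}{F_5} \qquad (25.34i)$$
$$G_4 = F_6\left[\left(\frac{F_6 - 1}{F_6}\right)^2 - \left(\frac{F_3 - 1}{F_6}\right)^2 - G_2^2\right] \qquad (25.34j)$$
$$H_L = \left\{[G_1c_1MS_1]^2 + [(F_4 - 1)c_2MS_2]^2 - G_3c_1c_2MS_1MS_2\right\}^{1/2} \qquad (25.34k)$$
$$H_U = \left\{[(F_3 - 1)c_1MS_1]^2 + [G_2c_2MS_2]^2 - G_4c_1c_2MS_1MS_2\right\}^{1/2} \qquad (25.34l)$$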


Example


To obtain an approximate $1 - \alpha$ confidence interval for $\sigma_\mu^2$ with the MLS procedure, we
simply observe that the correspondences in (25.30) for the Satterthwaite procedure apply
here also and confidence interval (25.34) becomes:


$$s_\mu^2 - H_L \le \sigma_\mu^2 \le s_\mu^2 + H_U \qquad (25.35)$$


For the Apex Enterprises example, we shall obtain a 90 percent confidence interval for $\sigma_\mu^2$
by means of the MLS procedure. From earlier, we have:


$$c_1 = 1/n = 1/4 = .25 \qquad MS_1 = MSTR = 394.9 \qquad df_1 = r - 1 = 4$$
$$c_2 = -1/n = -1/4 = -.25 \qquad MS_2 = MSE = 73.3 \qquad df_2 = r(n - 1) = 15$$
$$\hat{L} = s_\mu^2 = 80.4$$


We first determine the six percentiles (25.34a) to (25.34f):

$$F_1 = F(.95; 4, \infty) = 2.37 \qquad F_2 = F(.95; 15, \infty) = 1.67$$
$$F_3 = F(.95; \infty, 4) = 5.63 \qquad F_4 = F(.95; \infty, 15) = 2.07$$
$$F_5 = F(.95; 4, 15) = 3.06 \qquad F_6 = F(.95; 15, 4) = 5.86$$


Intermediate calculations required are:


$$G_1 = 1 - \frac{1}{2.37} = .5781$$

$$G_2 = 1 - \frac{1}{1.67} = .4012$$

$$G_3 = \frac{(3.06 - 1)^2 - [(.5781)3.06]^2 - (2.07 - 1)^2}{3.06} = -.0100$$

$$G_4 = 5.86\left[\left(\frac{5.86 - 1}{5.86}\right)^2 - \left(\frac{5.63 - 1}{5.86}\right)^2 - (.4012)^2\right] = -.5708$$


$H_L$ and $H_U$ are then computed as follows:

$$H_L = \left\{[(.5781)(.25)394.9]^2 + [(2.07 - 1)(-.25)73.3]^2 - (-.0100)(.25)(-.25)(394.9)73.3\right\}^{1/2} = 60.2$$

$$H_U = \left\{[(5.63 - 1)(.25)394.9]^2 + [(.4012)(-.25)73.3]^2 - (-.5708)(.25)(-.25)(394.9)73.3\right\}^{1/2} = 456.0$$

The approximate 90 percent confidence interval for $\sigma_\mu^2$ therefore is:


$$20.2 = 80.4 - 60.2 \le \sigma_\mu^2 \le 80.4 + 456.0 = 536.4$$




Taking the square roots of the confidence limits, we obtain an approximate confidence
interval for $\sigma_\mu$:


$$4.5 \le \sigma_\mu \le 23.2$$


Notice that in this instance the confidence limits obtained by the Satterthwaite procedure (5.6 
and 26.2) are quite similar to the ones just obtained by the more accurate MLS procedure. 
Note also the impreciseness of the MLS confidence interval here, a result of the small 
sample sizes and the difficulty in estimating variance components precisely. 
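
The MLS computations in (25.34g) through (25.34l) are mechanical once the six F percentiles are in hand, so they lend themselves to a small routine. The following is a sketch only; the function name `mls_half_widths` and the dictionary interface are hypothetical, and the percentiles are the tabled values quoted in the example rather than computed ones.

```python
import math

def mls_half_widths(c1, ms1, c2, ms2, F):
    """Return (H_L, H_U) from (25.34g)-(25.34l).

    F maps the integers 1..6 to the percentiles F_1..F_6 of (25.34a)-(25.34f),
    supplied by the caller (hypothetical interface). Requires c1 > 0 and c2 < 0.
    """
    G1 = 1 - 1 / F[1]
    G2 = 1 - 1 / F[2]
    G3 = ((F[5] - 1) ** 2 - (G1 * F[5]) ** 2 - (F[4] - 1) ** 2) / F[5]
    G4 = F[6] * (((F[6] - 1) / F[6]) ** 2 - ((F[3] - 1) / F[6]) ** 2 - G2 ** 2)
    cross = c1 * c2 * ms1 * ms2
    H_L = math.sqrt((G1 * c1 * ms1) ** 2 + ((F[4] - 1) * c2 * ms2) ** 2 - G3 * cross)
    H_U = math.sqrt(((F[3] - 1) * c1 * ms1) ** 2 + (G2 * c2 * ms2) ** 2 - G4 * cross)
    return H_L, H_U

# Apex Enterprises example, 90 percent confidence
F_table = {1: 2.37, 2: 1.67, 3: 5.63, 4: 2.07, 5: 3.06, 6: 5.86}
H_L, H_U = mls_half_widths(0.25, 394.9, -0.25, 73.3, F_table)
s2_mu = 0.25 * 394.9 - 0.25 * 73.3
print(s2_mu - H_L, s2_mu + H_U)
```

Carrying unrounded intermediate values, the limits agree with 20.2 and 536.4 to within rounding.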


Random Factor Effects Model 
We can express the single-factor random cell means model (25.1) in an equivalent random 
factor effects fashion, just as we did for fixed factor levels in Chapter 16. We do this by 
expressing each factor level mean $\mu_i$ as a deviation from its expected value, $E\{\mu_i\} = \mu_{.}$,
as follows:

$$\tau_i = \mu_i - \mu_{.} \qquad (25.36)$$

Then we simply replace $\mu_i$ in ANOVA model (25.1) by its equivalent expression from
(25.36):

$$\mu_i = \mu_{.} + \tau_i \qquad (25.37)$$

The random factor effects model therefore is expressed as follows:

$$Y_{ij} = \mu_{.} + \tau_i + \varepsilon_{ij} \qquad (25.38)$$

where:


$\mu_{.}$ is a constant component common to all observations

$\tau_i$ are independent $N(0, \sigma_\mu^2)$

$\varepsilon_{ij}$ are independent $N(0, \sigma^2)$

$\tau_i$ and $\varepsilon_{ij}$ are independent

$i = 1, \ldots, r$; $j = 1, \ldots, n$

Note that the $\tau_i$ are random variables in ANOVA model (25.38). With reference to
the personnel officers in the Apex Enterprises example, $\tau_i$ represents the effect of the $i$th
personnel officer who is selected at random. Specifically, $\tau_i$ measures by how much the
mean rating of all potential employees by the $i$th personnel officer differs from the overall
mean rating by all personnel officers.


25.2 Two-Factor Studies—ANOVA Models II and III
ANOVA Model II—Random Factor Effects 


Consider an investigation of the effects of machine operators (factor A) and machines
(factor B) on the number of pieces produced in a day. Five operators and three machines are
used in the study. Yet the inferences are not to be confined to the particular five operators
and three machines participating in the study, but rather they are to pertain to all operators
and all machines available to the company. Here a random factor effects ANOVA model
(model II) would be appropriate for the two-factor study, since each of the two sets of factor
levels may be considered the result of sampling a population (all operators, all machines)
about which inferences are to be drawn.

In the random factor effects version of ANOVA model II for a two-factor study, we
assume analogously to a single-factor study that both the factor A main effects $\alpha_i$ and the
factor B main effects $\beta_j$ are independent random variables. Further, we assume that the
interaction effects $(\alpha\beta)_{ij}$ are independent random variables. Thus, the random factor
effects version of ANOVA model II for a two-factor study with equal sample sizes $n$ is:


$$Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \qquad (25.39)$$


where:


$\mu_{..}$ is a constant
$\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$ are independent normal random variables with expectations
zero and respective variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_{\alpha\beta}^2$
$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$
$\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$, and $\varepsilon_{ijk}$ are pairwise independent
$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n$


Meaning of Model Terms. We shall explain the meaning of the terms in random ANOVA
model (25.39) with reference to the production example involving the two factors, machine
operators and machines. The main effect of operator $i$ in the study (selected at random
from the population of operators) is $\alpha_i$. Similarly, the main effect of machine $j$ in the study
(selected at random from the population of machines) is $\beta_j$. Further, the interaction effect
between operator $i$ and machine $j$ on the number of pieces produced per day is $(\alpha\beta)_{ij}$.
ANOVA model (25.39) assumes that the main effects of operators on output per day are
normally distributed with zero mean and variance $\sigma_\alpha^2$. Similarly, the main effects of machines
are normally distributed with zero mean and variance $\sigma_\beta^2$. Finally, the operator-machine
interaction effects are normally distributed with mean zero and variance $\sigma_{\alpha\beta}^2$. Since random
factor effects ANOVA model (25.39) assumes these three effects to be independent random
variables, the mean output for operator $i$, machine $j$, namely, $\mu_{ij} = \mu_{..} + \alpha_i + \beta_j + (\alpha\beta)_{ij}$,
may be viewed as the sum of independent selections of $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$ from three different
normal distributions.


Comment 


We caution that random factor effects ANOVA model (25.39) should only be used if the factor levels
of the two factors do indeed represent random samples from populations of interest. Also, when a
study involves only a few levels of each random factor, precise estimation of the factor variance
components will usually be very difficult because of the small number of factor levels sampled.


Important Features of Model
1. For ANOVA model (25.39), the expected value of response $Y_{ijk}$ is:

$$E\{Y_{ijk}\} = \mu_{..} \qquad (25.40a)$$

2. The variance of $Y_{ijk}$, denoted by $\sigma_Y^2$, is:

$$\sigma^2\{Y_{ijk}\} = \sigma_Y^2 = \sigma_\alpha^2 + \sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma^2 \qquad (25.40b)$$


The $Y_{ijk}$ thus have constant variance. They are normally distributed because they are linear
combinations of independent normal random variables.

3. In advance of the random trials, different responses $Y_{ijk}$ are independent except
for responses from the same factor A level and/or from the same factor B level, which
are correlated because they contain some common random terms. The covariances are as
follows:


$$\sigma\{Y_{ijk}, Y_{ij'k'}\} = \sigma_\alpha^2 \qquad j \ne j' \qquad (25.41a)$$
$$\sigma\{Y_{ijk}, Y_{i'jk'}\} = \sigma_\beta^2 \qquad i \ne i' \qquad (25.41b)$$
$$\sigma\{Y_{ijk}, Y_{ijk'}\} = \sigma_\alpha^2 + \sigma_\beta^2 + \sigma_{\alpha\beta}^2 \qquad k \ne k' \qquad (25.41c)$$
$$\sigma\{Y_{ijk}, Y_{i'j'k'}\} = 0 \qquad i \ne i',\; j \ne j' \qquad (25.41d)$$


ANOVA Model III—Mixed Factor Effects


When one of the two factors has fixed factor levels while the other has random factor levels, a
mixed factor effects ANOVA model (model III) is applicable. An instance where this model
may be appropriate is an investigation of the effects of four different training methods
(factor A) and five instructors (factor B) upon learning in a company training program. The
four levels for training methods may be considered fixed, since interest centers in these
particular training methods. In contrast, the levels for instructors may be viewed as random,
since inferences are to be made about a population of instructors of which the five used in
the study are viewed as a sample.

Two mixed factor effects ANOVA models are widely used. They are related to each
other and are called the restricted and unrestricted mixed models. The restricted model is
somewhat more general, and will be the mixed model that we shall present. When factor A
has fixed factor levels and factor B has random factor levels, the $\alpha_i$ effects are constants
and the $\beta_j$ effects are random variables. The interaction effects $(\alpha\beta)_{ij}$ are also random
variables because the factor B levels are random. As for the fixed effects ANOVA model
for two-factor studies, the fixed effects $\alpha_i$ in the restricted mixed model will be subject to
the restriction that their sum is zero; i.e., $\sum_i \alpha_i = 0$. Similarly, the interaction terms $(\alpha\beta)_{ij}$
will be subject to a restriction related to the fact that all fixed factor A levels are included
in the study; the restriction is that $\sum_i (\alpha\beta)_{ij} = 0$ for each level $j$ of random factor B. Any
two interaction terms will be independent, as in the random effects model (25.39), except
if they come from the same level of random factor B in which case they will be correlated.
The correlation is related to the restriction that $\sum_i (\alpha\beta)_{ij} = 0$ for each level $j$ of random
factor B.

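The restricted mixed ANOVA model for two-factor studies, where factor A is fixed and
factor B is random, can now be stated as follows:


$$Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \qquad (25.42)$$


where:
$\mu_{..}$ is a constant
$\alpha_i$ are constants subject to the restriction $\sum_i \alpha_i = 0$
$\beta_j$ are independent $N(0, \sigma_\beta^2)$
$(\alpha\beta)_{ij}$ are $N\left(0, \frac{a - 1}{a}\sigma_{\alpha\beta}^2\right)$, subject to the restrictions:

$$\sum_i (\alpha\beta)_{ij} = 0 \quad \text{for all } j$$
$$\sigma\{(\alpha\beta)_{ij}, (\alpha\beta)_{i'j}\} = -\frac{1}{a}\sigma_{\alpha\beta}^2 \qquad i \ne i'$$

$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$
$\beta_j$, $(\alpha\beta)_{ij}$, and $\varepsilon_{ijk}$ are pairwise independent
$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n$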


Comments 


1. Note that $\sigma_{\alpha\beta}^2$ is not the variance of the interaction terms in model (25.42) but is proportional
to it: the variance is $(a - 1)\sigma_{\alpha\beta}^2/a$. The reason why the variance of the
interaction terms in ANOVA model (25.42) is expressed as $(a - 1)\sigma_{\alpha\beta}^2/a$ rather than simply as $\sigma_{\alpha\beta}^2$ is
so that the expected mean squares will be relatively simple expressions. This facilitates the making
of inferences for this model. Some texts denote the variance of $(\alpha\beta)_{ij}$ by $\sigma_{\alpha\beta}^2$.

2. The unrestricted mixed ANOVA model for two-factor studies is quite similar to the restricted
model in (25.42). In the unrestricted model, there are no restrictions on the interaction effects
and they are pairwise independent. Denote the unrestricted random effects by $\beta_j^*$ and $(\alpha\beta)_{ij}^*$. Also let
$\overline{(\alpha\beta)}_{.j}^*$ denote the mean of the unrestricted interaction terms $(\alpha\beta)_{1j}^*, (\alpha\beta)_{2j}^*, \ldots, (\alpha\beta)_{aj}^*$ for the fixed
factor A levels for any factor level $j$ of random factor B. Then the terms $\beta_j$ and $(\alpha\beta)_{ij}$ in restricted
model (25.42) are related to the unrestricted terms as follows:


$$\beta_j = \beta_j^* + \overline{(\alpha\beta)}_{.j}^* \qquad (\alpha\beta)_{ij} = (\alpha\beta)_{ij}^* - \overline{(\alpha\beta)}_{.j}^* \qquad (25.43)$$


The restrictions on the $(\alpha\beta)_{ij}$ in model (25.42) follow from the relation in (25.43). References 25.6
and 25.7 contain detailed discussions of the restricted and unrestricted mixed ANOVA models.


Important Features of Model. The expected value of response $Y_{ijk}$ for mixed ANOVA
model (25.42) is:


$$E\{Y_{ijk}\} = \mu_{..} + \alpha_i \qquad (25.44)$$

The variance of $Y_{ijk}$ follows directly from the pairwise independence of $\beta_j$, $(\alpha\beta)_{ij}$, and
$\varepsilon_{ijk}$:

$$\sigma^2\{Y_{ijk}\} = \sigma_Y^2 = \sigma_\beta^2 + \frac{a - 1}{a}\sigma_{\alpha\beta}^2 + \sigma^2 \qquad (25.45)$$


Notice that the $Y_{ijk}$ have constant variance. Further, they are normally distributed because
each is a linear combination of independent normal random variables.

In advance of the random trials, different responses $Y_{ijk}$ are independent if they are not
from the same random factor B level. Responses from the same random factor B level are
correlated; their covariances are as follows:


$$\sigma\{Y_{ijk}, Y_{ijk'}\} = \sigma_\beta^2 + \frac{a - 1}{a}\sigma_{\alpha\beta}^2 \qquad k \ne k' \qquad (25.46a)$$
$$\sigma\{Y_{ijk}, Y_{i'jk'}\} = \sigma_\beta^2 - \frac{1}{a}\sigma_{\alpha\beta}^2 \qquad i \ne i' \qquad (25.46b)$$
$$\sigma\{Y_{ijk}, Y_{i'j'k'}\} = 0 \qquad j \ne j' \qquad (25.46c)$$


TABLE 25.4 Illustration of Variance-Covariance Matrix for Mixed Model (25.42), with a = 2, b = 2, n = 2.

Covariance Structure of Observations. We shall illustrate the form of the variance-covariance
matrix of the responses $Y_{ijk}$ for mixed ANOVA model (25.42) for a simple
example. Here, A is a fixed factor with $a = 2$ levels, B is a random factor with $b = 2$ levels,
and $n = 2$ responses are obtained for each of the four treatments. The variance of response
$Y_{ijk}$ is according to (25.45) for $a = 2$:

$$\sigma^2\{Y_{ijk}\} = \sigma_Y^2 = \sigma_\beta^2 + \sigma_{\alpha\beta}^2/2 + \sigma^2$$

The covariance in (25.46a) will be denoted by $\sigma_{kk'}$ to indicate that the two $Y_{ijk}$ observations
only differ for the replication. Similarly, the covariance in (25.46b) will be denoted by $\sigma_{ii'}$ to
indicate that the two observations come from different factor A levels but not from different
factor B levels. In this notation, the two pairwise covariances are for $a = 2$:


$$\sigma_{kk'} = \sigma_\beta^2 + \sigma_{\alpha\beta}^2/2$$
$$\sigma_{ii'} = \sigma_\beta^2 - \sigma_{\alpha\beta}^2/2$$

The response vector $\mathbf{Y}$ for this example is shown in Table 25.4a. Note that the observations
are listed in the vector with $i$ varying within $j$. This permits a simple block structure
presentation of the variance-covariance matrix in Table 25.4b. In this presentation four
rows and four columns are represented by a block matrix. Because of the symmetry of the
blocks, only two different block matrices are required. These are shown in Table 25.4b.
Note the correlations between pairs of observations on the block main diagonal and the
uncorrelatedness elsewhere.
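
The block structure just described can be made concrete by building the matrix from (25.45) and (25.46). The following is an illustrative sketch, not a reproduction of Table 25.4: the function name `mixed_model_covariance` is hypothetical, the variance component values are made up for the demonstration, and the ordering of replicates within each treatment (k varying fastest) is an assumption.

```python
def mixed_model_covariance(a, b, n, var_beta, var_ab, var_error):
    """Variance-covariance matrix of the Y_ijk under restricted mixed model (25.42).

    Observations are ordered with j slowest, then i, then k (the placement of k
    is an assumption). var_ab stands for sigma^2_{alpha beta}.
    """
    index = [(i, j, k) for j in range(b) for i in range(a) for k in range(n)]

    def cov(u, v):
        (i, j, k), (i2, j2, k2) = u, v
        if j != j2:
            return 0.0                                   # (25.46c)
        if i != i2:
            return var_beta - var_ab / a                 # (25.46b)
        if k != k2:
            return var_beta + (a - 1) / a * var_ab       # (25.46a)
        return var_beta + (a - 1) / a * var_ab + var_error   # (25.45)

    return [[cov(u, v) for v in index] for u in index]

# Hypothetical variance components, chosen only to make the pattern visible
V = mixed_model_covariance(a=2, b=2, n=2, var_beta=2.0, var_ab=1.0, var_error=0.5)
for row in V:
    print(row)
```

With these inputs the matrix is 8 by 8; the two 4 by 4 diagonal blocks are identical and the off-diagonal blocks are zero, which is the pattern the text describes.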


Comment 


The reason why the restricted mixed model in (25.42) is somewhat more general than the unrestricted
model is that two observations from the same random factor B level can be positively or negatively
correlated for the restricted model according to (25.46b) but cannot be negatively correlated for the
unrestricted model.


25.3 Two-Factor Studies—ANOVA Tests for Models II and III


For both the mixed and random ANOVA models for two-factor studies, the analysis of 
variance calculations for sums of squares are identical to those for the fixed ANOVA model. 
Thus, formulas (19.37) and (19.39) are entirely applicable for two-factor ANOVA models 
II and III. Similarly, the degrees of freedom and mean squares are exactly the same as
those shown in Table 19.8 for the fixed two-factor ANOVA model. The random and mixed 
ANOVA models depart from the fixed ANOVA model only in the expected mean squares 
and the consequent choice of the appropriate test statistic. 


Expected Mean Squares 


The expected mean squares for the random and mixed ANOVA models for balanced two- 
factor studies can be worked out by utilizing the properties of the model and applying 
the usual expectation theorems. They are shown in Table 25.5, together with those for the 
fixed ANOVA model. The derivations are tedious, but simple rules have been developed for 
finding the expected mean squares. These rules are described in Appendix D. 


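TABLE 25.5 Expected Mean Squares for Balanced Two-Factor ANOVA Models.

Mean Square   df               Fixed ANOVA Model (A and B fixed)                                  Random ANOVA Model (A and B random)                        Mixed ANOVA Model (A fixed, B random)
MSA           $a - 1$          $\sigma^2 + nb\frac{\sum \alpha_i^2}{a - 1}$                       $\sigma^2 + nb\sigma_\alpha^2 + n\sigma_{\alpha\beta}^2$   $\sigma^2 + nb\frac{\sum \alpha_i^2}{a - 1} + n\sigma_{\alpha\beta}^2$
MSB           $b - 1$          $\sigma^2 + na\frac{\sum \beta_j^2}{b - 1}$                        $\sigma^2 + na\sigma_\beta^2 + n\sigma_{\alpha\beta}^2$    $\sigma^2 + na\sigma_\beta^2$
MSAB          $(a - 1)(b - 1)$ $\sigma^2 + n\frac{\sum\sum (\alpha\beta)_{ij}^2}{(a - 1)(b - 1)}$ $\sigma^2 + n\sigma_{\alpha\beta}^2$                       $\sigma^2 + n\sigma_{\alpha\beta}^2$
MSE           $(n - 1)ab$      $\sigma^2$                                                         $\sigma^2$                                                 $\sigma^2$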


TABLE 25.6 Test Statistics for Balanced Two-Factor ANOVA Models.

Test for Presence     Fixed ANOVA Model     Random ANOVA Model     Mixed ANOVA Model
of Effects of         (A and B fixed)       (A and B random)       (A fixed, B random)
Factor A              MSA/MSE               MSA/MSAB               MSA/MSAB
Factor B              MSB/MSE               MSB/MSAB               MSB/MSE
AB interactions       MSAB/MSE              MSAB/MSE               MSAB/MSE


Construction of Test Statistics 


Example 


As usual, each statistic for testing factor effects is constructed by comparing two mean 
squares that have the properties: 


1. Under $H_0$, both mean squares have the same expectation.
2. Under $H_a$, the numerator mean square has a larger expectation than the denominator
mean square.


It can be shown that such a test statistic follows the $F$ distribution if $H_0$ holds. The
decision rule is constructed in the ordinary fashion, with large values of the test statistic
leading to $H_a$.

For instance, to test for the presence of factor A main effects in random ANOVA model
(25.39), namely:


$$H_0: \sigma_\alpha^2 = 0$$
$$H_a: \sigma_\alpha^2 > 0 \qquad (25.47)$$


we see from Table 25.5 that MSA and MSAB both have the same expectation if $\sigma_\alpha^2 = 0$, that
is, if factor A has no main effects. If $\sigma_\alpha^2 > 0$, $E\{MSA\}$ is greater than $E\{MSAB\}$. Hence, the
appropriate test statistic is:


$$F^* = \frac{MSA}{MSAB} \qquad (25.48)$$


and the decision rule for controlling the Type I error at $\alpha$ is:

If $F^* \le F[1 - \alpha; a - 1, (a - 1)(b - 1)]$, conclude $H_0$
If $F^* > F[1 - \alpha; a - 1, (a - 1)(b - 1)]$, conclude $H_a$ (25.49)


Note that the denominator for testing for factor A main effects in the random ANOVA model
is MSAB, whereas it is MSE in the fixed ANOVA model.

We summarize the appropriate test statistics for mixed and random ANOVA models in 
Table 25.6. For comparison purposes, we also present the test statistics for the fixed ANOVA 
model there. As may be seen from Table 25.6, the denominator of the test statistic for mixed 
and random ANOVA models in a number of instances differs from that for the fixed ANOVA 
model. Hence, it is important that the expected mean squares be known when random or 
mixed models are utilized so that the appropriate test statistics can be determined. 


We return to our earlier mixed ANOVA model example of four different training methods
(factor A, fixed) and five instructors (factor B, random). Four classes were assigned to each
training method-instructor combination. The response variable of interest was the mean
improvement per student in the class at the end of the training program. The data are not
shown, but the ANOVA table is presented in Table 25.7.


TABLE 25.7 ANOVA Table for Mixed ANOVA Model—Training Example (A fixed, B random, a = 4, b = 5, n = 4).

Source of Variation                     SS       df    MS      F*
Factor A (training methods, fixed)      42.1     3     14.0    14.0/3.9 = 3.59
Factor B (instructors, random)          53.9     4     13.5    13.5/2.1 = 6.43
AB interactions                         46.7     12    3.9     3.9/2.1 = 1.86
Error                                   126.4    60    2.1
Total                                   269.1    79

F(.95; 3, 12) = 3.49    F(.95; 4, 60) = 2.53    F(.95; 12, 60) = 1.92


To test whether or not training methods and instructors interact:


$$H_0: \sigma_{\alpha\beta}^2 = 0$$
$$H_a: \sigma_{\alpha\beta}^2 > 0$$


we utilize according to Table 25.6 the test statistic:


$$F^* = \frac{MSAB}{MSE}$$

Using the results from Table 25.7, we obtain:

$$F^* = \frac{3.9}{2.1} = 1.86$$


For level of significance $\alpha = .05$, we require $F(.95; 12, 60) = 1.92$. Since $F^* = 1.86 \le
1.92$, we conclude that training methods and instructors do not interact. The P-value of this
test is .06.

The test statistics for testing training method main effects and instructor main effects are 
shown in Table 25.7. By comparing the test statistics with the appropriate percentiles of the 
F distribution shown at the bottom of Table 25.7 for level of significance $\alpha = .05$ each, we
find that both training methods and instructors differ in effectiveness. 
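
The three tests for the training example can be laid out side by side. This is a minimal sketch, assuming the mixed-model denominators of Table 25.6 (MSAB for the fixed factor A, MSE for the random factor B and for the interactions); the function name `mixed_model_f_tests` is hypothetical and the critical values are the tabled percentiles quoted with Table 25.7.

```python
def mixed_model_f_tests(msa, msb, msab, mse, critical):
    """F* statistics for the balanced mixed model (A fixed, B random).

    critical maps "A", "B", "AB" to tabled F percentiles supplied by the
    caller (hypothetical interface). Returns {effect: (F*, reject_H0)}.
    """
    statistics = {
        "A": msa / msab,    # fixed factor tested against the interaction mean square
        "B": msb / mse,     # random factor tested against the error mean square
        "AB": msab / mse,
    }
    return {name: (value, value > critical[name]) for name, value in statistics.items()}

results = mixed_model_f_tests(
    msa=14.0, msb=13.5, msab=3.9, mse=2.1,
    critical={"A": 3.49, "B": 2.53, "AB": 1.92},
)
for effect, (f_star, reject) in results.items():
    print(effect, round(f_star, 2), "conclude Ha" if reject else "conclude H0")
```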


Comment * 


When there is only one case per treatment ($n = 1$) with the fixed two-factor ANOVA model, we
know from Section 20.1 that no exact tests are possible unless the model can be modified. The
reason is that MSE = 0 always in that case so that no estimate of $\sigma^2$ can be obtained. In contrast,
Table 25.5 indicates that exact tests for both factor A and factor B main effects are possible with
the random two-factor ANOVA model when $n = 1$ without any restrictive assumptions about the
interactions. This is because MSAB is the appropriate denominator of the test statistic here, and MSAB
can be determined regardless of sample size. With the mixed ANOVA model where factor A is the
fixed factor, the presence of factor A main effects can also be tested when $n = 1$ without the need
for restrictive assumptions about the interactions. However, an exact test for factor B main effects
would require the assumption that all interactions are zero or some other modification of the ANOVA
model.


25.4 Two-Factor Studies—Estimation of Factor Effects for Models II and III


Estimation of Variance Components


Example 


When a random factor has significant main effects, we often wish to estimate the magnitude
of the variance component. Unbiased estimators can readily be derived from appropriate
linear combinations of the expected mean squares in Table 25.5. For instance, the variance
component $\sigma_\beta^2$ in mixed ANOVA model (25.42) can be estimated by noting that:


$$E\{MSB\} - E\{MSE\} = \sigma^2 + na\sigma_\beta^2 - \sigma^2 = na\sigma_\beta^2$$


Hence, we have:


$$\sigma_\beta^2 = \frac{E\{MSB\} - E\{MSE\}}{na} \qquad (25.50)$$


and an unbiased estimator of $\sigma_\beta^2$ is:


$$s_\beta^2 = \frac{MSB - MSE}{na} \qquad (25.50a)$$

Approximate confidence intervals for the variance components in balanced two-factor
studies can be obtained by either the Satterthwaite procedure in (25.29) or the MLS procedure
in (25.34). For example, the MLS procedure can be used to estimate the variance
component $\sigma_\beta^2$ in mixed ANOVA model (25.42) by noting from (25.50a) that $s_\beta^2$ can be
expressed in the form (25.33):

$$s_\beta^2 = \left(\frac{1}{na}\right)MSB + \left(-\frac{1}{na}\right)MSE$$

The correspondences are $MS_1 = MSB$, $MS_2 = MSE$, $c_1 = 1/na$, and $c_2 = -1/na$. The
approximate $1 - \alpha$ MLS confidence limits therefore are:


$$s_\beta^2 - H_L \le \sigma_\beta^2 \le s_\beta^2 + H_U \qquad (25.51)$$


where $H_L$ and $H_U$ are determined using the formulas in Table 25.3, with $df_1 = b - 1$ and
$df_2 = (n - 1)ab$.


In the training example of Table 25.7 with one fixed and one random factor, random factor B
(instructors) had significant effects. To estimate $\sigma_\beta^2$, we utilize the estimator in (25.50a).
Substituting, we obtain:


$$s_\beta^2 = \frac{13.5 - 2.1}{16} = .71$$



To construct an approximate 95 percent confidence interval for оў by the MLS procedur: 
we first note that the correspondences to the form in (25.33) are: 5 


1 
= — = — = .0625 
SERT AD 
1 1 
Сэ = = 
па 4(4) 
df,=b-1=4 


.0625 


MS, = MSB = 13.5 


MS; = MSE = 2.1 


df, = (n — 1)аЬ = 60 


Carrying out the calculations indicated in Table 25.3, we first obtain the percentiles:

$$ \begin{aligned}
F_1 &= F(.975; 4, \infty) = 2.79 & F_2 &= F(.975; 60, \infty) = 1.39 \\
F_3 &= F(.975; \infty, 4) = 8.26 & F_4 &= F(.975; \infty, 60) = 1.48 \\
F_5 &= F(.975; 4, 60) = 3.01 & F_6 &= F(.975; 60, 4) = 8.36
\end{aligned} $$

and then:

$$ \begin{aligned}
G_1 &= .6416 & G_2 &= .2806 & G_3 &= .0266 \\
G_4 &= -.4834 & H_L &= .55 & H_U &= 6.12
\end{aligned} $$

The desired confidence interval is obtained from (25.51):

$$ .16 = .71 - .55 \le \sigma_\beta^2 \le .71 + 6.12 = 6.83 $$

Hence, an approximate 95 percent confidence interval for $\sigma_\beta$, the standard deviation mea-
suring the variability among instructors, is:

$$ .4 \le \sigma_\beta \le 2.6 $$
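
The arithmetic of this example can be retraced with a short script. This is a minimal sketch, not the authors' code: the function name `mls_limits` is hypothetical, the six F percentiles are typed in from the text rather than computed, and the expressions for $G_1$ to $G_4$, $H_L$, and $H_U$ are assumed to be those of Table 25.3 (not reproduced in this section) in their usual MLS form with $c_1 > 0$ and $c_2 < 0$.

```python
import math

def mls_limits(c1, ms1, c2, ms2, F):
    """Approximate MLS limits for c1*E{MS1} + c2*E{MS2} (assumed Table 25.3 form)."""
    F1, F2, F3, F4, F5, F6 = F
    g1 = 1 - 1 / F1
    g2 = 1 - 1 / F2
    g3 = ((F5 - 1) ** 2 - (g1 * F5) ** 2 - (F4 - 1) ** 2) / F5
    g4 = F6 * (((F6 - 1) / F6) ** 2 - ((F3 - 1) / F6) ** 2 - g2 ** 2)
    t1, t2 = c1 * ms1, c2 * ms2          # the two signed terms of the estimator
    h_l = math.sqrt((g1 * t1) ** 2 + ((F4 - 1) * t2) ** 2 - g3 * t1 * t2)
    h_u = math.sqrt(((F3 - 1) * t1) ** 2 + (g2 * t2) ** 2 - g4 * t1 * t2)
    estimate = t1 + t2
    return estimate, estimate - h_l, estimate + h_u

# Training example: n = 4, a = 4, MSB = 13.5, MSE = 2.1
n, a = 4, 4
F = (2.79, 1.39, 8.26, 1.48, 3.01, 8.36)   # F1..F6 as quoted in the text
s2_beta, lower, upper = mls_limits(1 / (n * a), 13.5, -1 / (n * a), 2.1, F)

print("s2_beta =", round(s2_beta, 4))
print("limits for the variance:", round(lower, 2), round(upper, 2))
print("limits for the std. dev.:", round(math.sqrt(lower), 1), round(math.sqrt(upper), 1))
```

Taking the percentiles as inputs keeps the sketch free of any statistical library; a table or a statistics package would supply them in practice.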


Estimation of Fixed Effects in Mixed Model 


Point Estimators. We now consider point and interval estimation of fixed effect param- 
eters for balanced mixed model (25.42), where factor A is fixed and factor B is random. 
The situation is more complicated than for fixed ANOVA model I because certain pairs 
of observations are correlated for the mixed model, as we have seen in (25.46). When the 
responses Y are correlated, the method of generalized least squares must be used to obtain 
minimum variance unbiased estimators. Weighted least squares, discussed in Chapter 11, is 
a special case of generalized least squares. It turns out, however, that the generalized least 
squares estimators of the fixed effects $\alpha_i$ for the balanced case are the same as the ones
obtained by the method of ordinary least squares:

$$ \hat{\alpha}_i = \bar{Y}_{i..} - \bar{Y}_{...} \tag{25.52} $$

Frequently, the marginal mean $\mu_{i.}$ is also of interest. Since $\mu_{i.} = \mu_{..} + \alpha_i$, it follows
from (25.52) that a best linear unbiased estimator of $\mu_{i.}$ for balanced studies is:

$$ \hat{\mu}_{i.} = \bar{Y}_{...} + (\bar{Y}_{i..} - \bar{Y}_{...}) = \bar{Y}_{i..} \tag{25.53} $$


Chapter 25 Random and Mixed Effects Models 1057 


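Often a contrast of the fixed effects $\alpha_i$ is also of interest:

$$ L = \sum c_i \alpha_i \quad \text{where } \sum c_i = 0 \tag{25.54} $$

An unbiased estimator of $L$ is:

$$ \hat{L} = \sum c_i \hat{\alpha}_i = \sum c_i (\bar{Y}_{i..} - \bar{Y}_{...}) = \sum c_i \bar{Y}_{i..} \tag{25.55} $$

Variances of Estimators. For mixed ANOVA model (25.42) for balanced studies, it can
be shown that the variance of $\hat{\alpha}_i$ is as follows:

$$ \sigma^2\{\hat{\alpha}_i\} = \frac{\sigma^2 + n\sigma_{\alpha\beta}^2}{bn} = \frac{E\{MSAB\}}{bn} \tag{25.56} $$

It can also be shown that the variance of a contrast $\hat{L}$ of the estimated fixed factor A effects
$\hat{\alpha}_i$, defined in (25.55), is as follows:

$$ \sigma^2\{\hat{L}\} = \sum c_i^2 \sigma^2\{\hat{\alpha}_i\} \tag{25.57} $$

where $\sigma^2\{\hat{\alpha}_i\}$ is given in (25.56).

Since $\sigma^2\{\hat{\alpha}_i\}$ is a constant multiple of an expected mean square, it can be estimated
unbiasedly and exact confidence intervals for $\alpha_i$ and for contrasts of the $\alpha_i$ can be obtained.
An unbiased estimator of the variance of $\hat{\alpha}_i$ is:

$$ s^2\{\hat{\alpha}_i\} = \frac{MSAB}{bn} \tag{25.58} $$

and of a contrast of the $\hat{\alpha}_i$ is:

$$ s^2\{\hat{L}\} = \frac{MSAB}{bn} \sum c_i^2 \tag{25.59} $$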
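
The contrast variance in (25.57) can be checked numerically by writing the estimated contrast as a linear combination of the individual responses and evaluating the quadratic form of its coefficients with the covariance matrix of the responses. The sketch below does this for a = b = n = 2 and the contrast $\alpha_1 - \alpha_2$. It rests on two assumptions that are not stated in this section: the variance component values are made up for illustration, and the covariances of the responses (the text's (25.45) and (25.46)) are taken to have the restricted-model form, that is, $\sigma_\beta^2 + \frac{a-1}{a}\sigma_{\alpha\beta}^2$ for two responses in the same cell and $\sigma_\beta^2 - \frac{1}{a}\sigma_{\alpha\beta}^2$ for two responses sharing only the level of factor B.

```python
from itertools import product

def cov_y(obs1, obs2, a, var_e, var_b, var_ab):
    """Covariance of Y_ijk and Y_i'j'k' under the assumed restricted mixed model."""
    (i1, j1, k1), (i2, j2, k2) = obs1, obs2
    if j1 != j2:                       # different levels of random factor B
        return 0.0
    if i1 != i2:                       # same B level, different A levels
        return var_b - var_ab / a
    shared = var_b + (a - 1) / a * var_ab
    return shared + var_e if k1 == k2 else shared   # add error variance on the diagonal

a = b = n = 2
var_e, var_b, var_ab = 1.0, 2.0, 3.0   # hypothetical variance components
cells = list(product(range(a), range(b), range(n)))
contrast = [1.0, -1.0]                 # L = alpha_1 - alpha_2

# L-hat = sum_i c_i * Ybar_i.., so each Y_ijk carries coefficient c_i / (bn)
coef = [contrast[i] / (b * n) for (i, j, k) in cells]

var_direct = sum(
    coef[p] * coef[q] * cov_y(cells[p], cells[q], a, var_e, var_b, var_ab)
    for p in range(len(cells)) for q in range(len(cells))
)
var_formula = sum(c * c for c in contrast) * (var_e + n * var_ab) / (b * n)

print(var_direct, var_formula)
```

Under these assumptions both routes give the same value, which is the sense in which the variance of the contrast is a constant multiple of E{MSAB}.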


Comment 


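The variances in (25.56) and (25.57) are obtained by recognizing that $\hat{\alpha}_i$ and $\hat{L}$ are linear combinations
of the responses $Y_{ijk}$. For instance, consider an experiment with a = b = n = 2. Then $\hat{\alpha}_1$ in (25.52)
is as follows:

$$ \begin{aligned}
\hat{\alpha}_1 &= \tfrac{1}{4}(Y_{111} + Y_{112} + Y_{121} + Y_{122}) - \tfrac{1}{8}(Y_{111} + \cdots + Y_{222}) \\
&= \tfrac{1}{8}Y_{111} + \tfrac{1}{8}Y_{112} + \tfrac{1}{8}Y_{121} + \tfrac{1}{8}Y_{122}
 - \tfrac{1}{8}Y_{211} - \tfrac{1}{8}Y_{212} - \tfrac{1}{8}Y_{221} - \tfrac{1}{8}Y_{222}
\end{aligned} $$

Let the coefficients of the responses $Y_{ijk}$ be denoted by $c$ and define the row vector of the coefficients
as follows:

$$ c' = (c_1 \;\; c_2 \;\; \cdots \;\; c_{abn}) $$

and let Y as always denote the vector of the responses. We can then represent the estimator ($\hat{\alpha}_i$ or $\hat{L}$)
as $c'Y$.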

We know the variances and covariances of the responses $Y_{ijk}$ from (25.45) and (25.46). Let $\sigma^2\{Y\}$,
as usual, denote the variance-covariance matrix containing these variances and covariances. We then
utilize (5.46) to obtain the variance of the estimator, namely $c'\sigma^2\{Y\}c$. The resulting variance will
be expressed in terms of the variance components $\sigma^2$, $\sigma_\beta^2$, and $\sigma_{\alpha\beta}^2$. We then use the expected mean
squares in Table 25.5 for the mixed model to express the variance, if possible, in terms of expected
mean squares.

Confidence Intervals for Fixed Effects Contrasts. It is not always possible to obtain
exact confidence intervals for the fixed effects in mixed models. Exact confidence intervals
are available only when the variance of the estimated parameter or contrast of interest is
proportional to an expected mean square from the analysis of variance table. In cases where
the variance is not directly proportional to an expected mean square, Satterthwaite's method
can sometimes be used to construct approximate confidence intervals. For mixed ANOVA
model (25.42), it is possible to obtain exact confidence intervals for contrasts of the fixed
effects $\alpha_i$ because $\sigma^2\{\hat{\alpha}_i\}$ in (25.56) is a constant multiple of E{MSAB}. It can be shown
that:


$$ \frac{\hat{L} - L}{s\{\hat{L}\}} \text{ is distributed as } t[(a - 1)(b - 1)] \tag{25.60} $$

As a result, the $1 - \alpha$ confidence limits for $L$ are:

$$ \hat{L} \pm t[1 - \alpha/2; (a - 1)(b - 1)]\, s\{\hat{L}\} \tag{25.61} $$

where $\hat{L}$ is given by (25.55) and $s^2\{\hat{L}\}$ is given by (25.59).
Notice that confidence limits (25.61) are identical to those in (19.65) for the fixed ANOVA
model, except that:


1. MSAB replaces MSE in the estimated variance of the contrast.
2. The degrees of freedom now are $(a - 1)(b - 1)$ instead of $(n - 1)ab$ since a different
mean square is utilized.


Example

In the training example of Table 25.7, no interaction effects were found to be present.
We now wish to estimate the difference $L = \alpha_1 - \alpha_2$ in the mean improvements between
training methods 1 and 2, using a 95 percent confidence interval. The relevant sample results
are:

$$ \bar{Y}_{1..} = 43.1 \qquad \bar{Y}_{2..} = 40.8 $$

Hence, our point estimate of $L = \alpha_1 - \alpha_2 = \mu_{1.} - \mu_{2.}$ is:

$$ \hat{L} = \bar{Y}_{1..} - \bar{Y}_{2..} = 43.1 - 40.8 = 2.3 $$

From (25.59), the estimated variance is:

$$ s^2\{\hat{L}\} = \frac{MSAB}{bn} \sum c_i^2 = \frac{2(3.9)}{20} = .39 $$

or $s\{\hat{L}\} = .62$. There are 12 degrees of freedom associated with MSAB; hence, we require
t(.975; 12) = 2.179. The confidence limits (25.61) therefore are 2.3 ± 2.179(.62) and the
desired confidence interval is:

$$ .9 \le \mu_{1.} - \mu_{2.} \le 3.7 $$

Thus, we conclude with confidence coefficient .95 that training method 1 is more effective
than training method 2, its mean improvement being somewhere between .9 and 3.7 units
larger.
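
The interval just obtained follows mechanically from (25.55), (25.59), and (25.61), as the following sketch shows. The function name `contrast_limits` is hypothetical, and the t percentile is passed in as quoted in the text rather than computed.

```python
import math

def contrast_limits(means, c, msab, b, n, t_mult):
    """Point estimate, estimated variance, and limits for a contrast of fixed effects."""
    if abs(sum(c)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(ci * m for ci, m in zip(c, means))          # (25.55)
    variance = msab / (b * n) * sum(ci ** 2 for ci in c)       # (25.59)
    half_width = t_mult * math.sqrt(variance)                  # (25.61)
    return estimate, variance, estimate - half_width, estimate + half_width

# Training example: b = 5 instructors, n = 4, MSAB = 3.9, t(.975; 12) = 2.179
L_hat, s2_L, lower, upper = contrast_limits(
    means=[43.1, 40.8], c=[1, -1], msab=3.9, b=5, n=4, t_mult=2.179
)
print(round(L_hat, 1), round(s2_L, 2), round(lower, 1), round(upper, 1))
```

The only departure from the fixed effects calculation is that MSAB, with its (a - 1)(b - 1) degrees of freedom, takes the place of MSE.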


Multiple Comparison Procedures. Multiple comparison procedures can be utilized for
the main effects of the fixed factor in a mixed two-factor ANOVA model in the same
way as for the fixed ANOVA model. For example, suppose we wish to obtain all pairwise
comparisons between the different training methods in the training example in Table 25.7
by means of the Tukey procedure. We would calculate $s^2\{\hat{L}\}$ as in the previous example.
The t multiple in (25.61) now would be replaced by:

$$ T = \frac{1}{\sqrt{2}}\, q[1 - \alpha; a, (a - 1)(b - 1)] \tag{25.62} $$

With specific reference to the training example in Table 25.7, we would require for con-
structing 95 percent family confidence coefficient intervals for all pairwise comparisons
between training methods:

$$ q(.95; 4, 12) = 4.20 \qquad T = \frac{1}{\sqrt{2}}(4.20) = 2.97 $$


Confidence Intervals for Marginal Means. An exact confidence interval for a marginal
mean $\mu_{i.}$ in mixed ANOVA model (25.42) cannot be obtained because the variance of the
estimated marginal mean $\hat{\mu}_{i.}$ in (25.53) is not a multiple of a single expected mean square. Rather,
the variance is a linear combination of two expected mean squares, as follows:

$$ \sigma^2\{\hat{\mu}_{i.}\} = c_1 E\{MSAB\} + c_2 E\{MSB\} \tag{25.63} $$
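where:

$$ c_1 = \frac{a - 1}{nab} \tag{25.63a} $$

$$ c_2 = \frac{1}{nab} \tag{25.63b} $$

An unbiased estimator of $\sigma^2\{\hat{\mu}_{i.}\}$ is:

$$ s^2\{\hat{\mu}_{i.}\} = c_1 MSAB + c_2 MSB \tag{25.64} $$

Since the form of the variance of estimated marginal mean $\hat{\mu}_{i.}$ is that in (25.25), the
Satterthwaite approximation can be employed, where the degrees of freedom associated
with the estimated variance $s^2\{\hat{\mu}_{i.}\}$ are according to (25.28):

$$ df = \frac{\left[\dfrac{a - 1}{nab} MSAB + \dfrac{1}{nab} MSB\right]^2}
{\dfrac{\left[\dfrac{a - 1}{nab} MSAB\right]^2}{(a - 1)(b - 1)} + \dfrac{\left[\dfrac{1}{nab} MSB\right]^2}{b - 1}} \tag{25.65} $$

Approximate $1 - \alpha$ confidence limits for $\mu_{i.}$ therefore are:

$$ \hat{\mu}_{i.} \pm t(1 - \alpha/2; df)\, s\{\hat{\mu}_{i.}\} \tag{25.66} $$

where $s^2\{\hat{\mu}_{i.}\}$ is given in (25.64) and $df$ is given in (25.65).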


Example

Referring again to the training example of Table 25.7, a 95 percent confidence interval for $\mu_{1.}$
is desired. As noted previously, the estimated mean improvement for training
method 1 is:

$$ \hat{\mu}_{1.} = \bar{Y}_{1..} = 43.1 $$

Using (25.64) and noting that nab = 4(4)5 = 80, we obtain:

$$ s^2\{\hat{\mu}_{1.}\} = \frac{3}{80}(3.9) + \frac{1}{80}(13.5) = .315 $$

or $s\{\hat{\mu}_{1.}\} = .561$. From (25.65) we find:

$$ df = \frac{\left[\dfrac{3}{80}(3.9) + \dfrac{1}{80}(13.5)\right]^2}
{\dfrac{\left[\dfrac{3}{80}(3.9)\right]^2}{3(4)} + \dfrac{\left[\dfrac{1}{80}(13.5)\right]^2}{4}} \approx 11.1 $$

Using df = 11, the required t percentile is t(.975; 11) = 2.201. The confidence limits are
therefore 43.1 ± 2.201(.561) and the desired confidence interval is:

$$ 41.9 \le \mu_{1.} \le 44.3 $$

We conclude with approximate confidence coefficient .95 that the mean improvement for
training method 1 averaged over all instructors is between 41.9 and 44.3.
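
The Satterthwaite calculation for the marginal mean can be reproduced as follows. This is a sketch under stated assumptions: the helper name `satterthwaite` is hypothetical, and the t percentile for 11 degrees of freedom is entered as quoted in the text.

```python
import math

def satterthwaite(terms):
    """terms: (coefficient, mean square, df) triples.

    Returns the linear combination of mean squares and its approximate
    degrees of freedom, in the form used by (25.64) and (25.65).
    """
    estimate = sum(c * ms for c, ms, _ in terms)
    df = estimate ** 2 / sum((c * ms) ** 2 / d for c, ms, d in terms)
    return estimate, df

# Training example: n = 4, a = 4, b = 5, MSAB = 3.9, MSB = 13.5
n, a, b = 4, 4, 5
terms = [
    ((a - 1) / (n * a * b), 3.9, (a - 1) * (b - 1)),   # c1, MSAB, 12 df
    (1 / (n * a * b), 13.5, b - 1),                    # c2, MSB, 4 df
]
s2_mu, df = satterthwaite(terms)

t_mult = 2.201                                         # t(.975; 11), from the text
half_width = t_mult * math.sqrt(s2_mu)
lower, upper = 43.1 - half_width, 43.1 + half_width
print(round(s2_mu, 3), round(df, 1), round(lower, 1), round(upper, 1))
```

The approximate degrees of freedom fall between the 4 degrees of freedom of MSB and the 12 of MSAB, as they must for a positive combination of the two.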


25.5 Randomized Complete Block Design: Random Block Effects 


In our discussion of randomized complete block designs in Chapter 21, we assumed that
block effects were fixed. However, when blocks are a random sample from a population,
the block effects in the randomized complete block design model should be considered to
be random variables, as in the following two examples.


1. A researcher investigated the improvement in learning in third-grade classes by aug- 
menting the teacher with one or two teaching assistants. Ten schools were selected at random, 
and three third-grade classes in each school were utilized in the study. In each school, one 
class was randomly chosen to have no teaching assistant, one class was randomly chosen 
to have one teaching assistant, and the third class was assigned two teaching assistants. The 
amount of learning by the class at the end of the school year, suitably measured, was the 
response variable. Here the blocks are schools, which may be viewed as a random sample 
from the population of all schools eligible for the study. 

2. In a study of the effectiveness of four different dosages of a drug, 20 litters of mice,
each consisting of four mice, were utilized. The 20 litters (blocks) here may be viewed as
a random sample from the population of all litters that could have been used for the study.


When blocks are considered to be a random sample from a population of blocks, either 
an additive (i.e., no-interaction) or a nonadditive (i.e., interaction) model can be employed. 
The choice can be assisted by the diagnostics discussed in Section 21.4. In particular, plots 
of the responses $Y_{ij}$ for each block, such as in Figure 21.2, can be helpful in examining
whether blocks and treatments interact. A severe lack of parallelism in such a plot would
be a clear indication that the interaction model may be preferable. The Tukey test statistic 
for interactions in (20.11) may also be utilized, with the interpretation here that the test 
applies to the given blocks that have been selected. Finally, the nature of the correlations 
between the experimental units within a block may be examined because the two models 
make different assumptions about these correlations. 

When the primary emphasis of the analysis is on testing and estimating treatment effects, 
which is the usual case, the choice between the two models actually is not critical because 
the inference procedures for fixed treatment effects, as we shall see, are exactly the same 
for the two models. 

We first explain the additive, no-interaction model for randomized block designs with 
fixed treatment effects and random block effects, and then we will take up the interaction 
model. Both of these models are special cases of two-factor mixed model (25.42). We 
shall repeat the principal results here because the notation for randomized block designs is 
slightly different. 


Comment 


A special case of random blocks occurs when the blocks are experimental units such as persons,
stores, or cities, where each receives all of the treatments over time or where the effect of a given
treatment (e.g., advertising) is evaluated at different points of time. These repeated measures designs
are discussed in Chapter 27.


Additive Model


The additive model for random block effects and fixed treatment effects is a special case
of mixed two-factor model (25.42), with n = 1, the interaction term dropped, and fixed
factor A effects now being the treatment effects denoted by $\tau_j$ and random factor B effects
now being the block effects denoted by $\rho_i$:

$$ Y_{ij} = \mu_{..} + \rho_i + \tau_j + \varepsilon_{ij} \tag{25.67} $$

where:
    $\mu_{..}$ is a constant
    $\rho_i$ are independent $N(0, \sigma_\rho^2)$
    $\tau_j$ are constants subject to the restriction $\sum \tau_j = 0$
    $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$, and independent of the $\rho_i$
    $i = 1, \ldots, n_b$; $j = 1, \ldots, r$


Properties of Model. The important properties of mixed two-factor model (25.42) were
given in (25.44)-(25.46). These properties for randomized complete block design model
(25.67) are:

$$ E\{Y_{ij}\} = \mu_{..} + \tau_j \tag{25.68a} $$

$$ \sigma^2\{Y_{ij}\} = \sigma_Y^2 = \sigma_\rho^2 + \sigma^2 \tag{25.68b} $$

$$ \sigma\{Y_{ij}, Y_{ij'}\} = \sigma_\rho^2 \quad j \ne j' \tag{25.68c} $$

$$ \sigma\{Y_{ij}, Y_{i'j'}\} = 0 \quad i \ne i' \tag{25.68d} $$


Thus, the variance of $Y_{ij}$, again denoted by $\sigma_Y^2$, is a constant for all observations; any two
observations from different blocks are independent; and any two observations from the
same block are correlated for this model. Note that the covariance for any two observations
from the same block must be positive in advance of the random trials and that the covariance
is the same for all blocks. A positive covariance is reasonable for many applications. For
example, class learning in different classes in the same school will tend to be more similar
than for classes in different schools because of similar facilities, similar quality of teachers,
and the like.

The coefficient of correlation between any two observations from the same block for
model (25.67) is constant for all blocks and will be denoted by $\omega$:

$$ \omega = \frac{\sigma_\rho^2}{\sigma_Y^2} \tag{25.69} $$

This follows from the definition of a coefficient of correlation in (2.76) and the fact that
$\sigma\{Y_{ij}\} = \sigma\{Y_{ij'}\} = \sigma_Y$. Note also that the covariance in (25.68c) can be expressed as follows
using (25.69):

$$ \sigma\{Y_{ij}, Y_{ij'}\} = \omega\sigma_Y^2 \quad j \ne j' \tag{25.70} $$

Covariance Structure of Observations. Since any two $Y_{ij}$ observations within a given
block in advance of the random trials are correlated in the same fashion, the variance-
covariance matrix of the observations in a given block is of a particular form. We illustrate
this variance-covariance matrix for the observations in a block for a randomized block study
with r = 3 treatments, using the covariance expression in (25.70):

$$ \sigma^2\{Y\} = \begin{bmatrix}
\sigma_Y^2 & \omega\sigma_Y^2 & \omega\sigma_Y^2 \\
\omega\sigma_Y^2 & \sigma_Y^2 & \omega\sigma_Y^2 \\
\omega\sigma_Y^2 & \omega\sigma_Y^2 & \sigma_Y^2
\end{bmatrix}
= \sigma_Y^2 \begin{bmatrix}
1 & \omega & \omega \\
\omega & 1 & \omega \\
\omega & \omega & 1
\end{bmatrix} \tag{25.71} $$

where:

$$ Y = \begin{bmatrix} Y_{i1} \\ Y_{i2} \\ Y_{i3} \end{bmatrix} $$

Note that the main diagonal of the matrix contains the constant variance of the $Y_{ij}$, $\sigma_Y^2$, and
the entries off the main diagonal are the constant covariances, $\omega\sigma_Y^2$. The particular pattern
of the variance-covariance matrix in (25.71) is called compound symmetry.

While any two observations in a given block are correlated in advance of the random
trials, once a block has been selected, additive model (25.67) assumes that the observations
in that block are independent. The only remaining random variation in an observation $Y_{ij}$
then is the error term $\varepsilon_{ij}$, and additive model (25.67) assumes that these are independent.
Thus, in the teacher assistant study, model (25.67) assumes that once the schools have been
selected, any one class performance is independent of that of another class in each selected
school, given all of the common conditions for the classes in that school as reflected in the
block effect $\rho_i$.
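
The compound symmetry pattern can be made concrete with a few lines of code. The sketch below builds the within-block variance-covariance matrix implied by the additive model from (25.68b) and (25.68c) and recovers the correlation of (25.69). The function name `block_covariance` and the two variance component values are hypothetical, chosen only for illustration.

```python
def block_covariance(r, var_rho, var_err):
    """Within-block covariance matrix of (Y_i1, ..., Y_ir) for the additive model."""
    var_y = var_rho + var_err                      # (25.68b)
    return [[var_y if j == k else var_rho          # (25.68c) off the diagonal
             for k in range(r)] for j in range(r)]

var_rho, var_err = 2.0, 6.0                        # hypothetical variance components
cov = block_covariance(3, var_rho, var_err)
var_y = cov[0][0]
omega = cov[0][1] / var_y                          # (25.69)

for row in cov:
    print(row)
print("omega =", omega)
print("sigma^2 / (1 - omega) =", var_err / (1 - omega))
```

The last printed quantity equals the common variance on the diagonal, since the error variance is the share of the total variance not attributable to blocks.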


Comments
1. The variance of $Y_{ij}$ in (25.68b) can be expressed as follows using (25.69):

$$ \sigma_Y^2 = \omega\sigma_Y^2 + \sigma^2 $$

Hence, we obtain:

$$ \sigma_Y^2 = \frac{\sigma^2}{1 - \omega} \tag{25.72} $$

2. The assumption of compound symmetry in additive model (25.67) is restrictive. While this
assumption is sufficient so that the F* statistic for testing treatment effects will follow the F distribution
when $H_0$ holds (i.e., when no treatment effects are present), the assumption is not necessary. For this
purpose, it suffices that the condition of sphericity be met. This condition requires that the variance
of the difference between any two estimated treatment means be constant; that is:

$$ \sigma^2\{\bar{Y}_{.j} - \bar{Y}_{.j'}\} = \text{constant} \quad j \ne j' \tag{25.73} $$

This condition can be met without the compound symmetry requirement. For example, consider
the following variance-covariance matrix for the $Y_{ij}$ observations in any block for a randomized
complete block study with r = 3 treatments:

$$ \sigma^2\{Y\} = \begin{bmatrix} 2 & 2 & 4 \\ 2 & 4 & 5 \\ 4 & 5 & 8 \end{bmatrix} $$

This matrix does not exhibit compound symmetry. Yet the requirement for sphericity in (25.73) is
met because $\sigma^2\{\bar{Y}_{.j} - \bar{Y}_{.j'}\} = 2/n_b$, always. For example, we have:

$$ \sigma^2\{\bar{Y}_{.1} - \bar{Y}_{.2}\} = \frac{1}{n_b}\,[2 + 4 - 2(2)] = \frac{2}{n_b} $$

Analysis of Variance. Table 25.8 contains the analysis of variance for additive model 
(25.67). The sums of squares are the same as in (21.6) for the fixed effects model. 
Table 25.8 also contains the expected mean squares for model (25.67). The expected mean
squares correspond to those in Table 25.5 for the mixed two-factor model, with n = 1,
no interaction effects, and change of notation associated with fixed factor A being treat-
ments and random factor B being blocks. The statistic for testing for treatment effects is
F* = MSTR/MSBL.TR, as may be seen from the E{MS} column in Table 25.8. Thus, the
test statistic is the same as when block effects are fixed. Confidence intervals for treatment
contrasts also present no new issues. Again, MSBL.TR will be used as the mean square in
the estimated variance of the contrast.


TABLE 25.8 ANOVA for Randomized Complete Block Design—Block Effects Random,
Treatment Effects Fixed.

                                                          E{MS}
Source of                                       Additive Model                              Interaction Model
Variation    SS        df                  MS        (25.67)                                     (25.74)
Blocks       SSBL      $n_b - 1$           MSBL      $\sigma^2 + r\sigma_\rho^2$                 $\sigma^2 + r\sigma_\rho^2$
Treatments   SSTR      $r - 1$             MSTR      $\sigma^2 + n_b\frac{\sum\tau_j^2}{r-1}$    $\sigma^2 + \sigma_{\rho\tau}^2 + n_b\frac{\sum\tau_j^2}{r-1}$
Error        SSBL.TR   $(n_b - 1)(r - 1)$  MSBL.TR   $\sigma^2$                                  $\sigma^2 + \sigma_{\rho\tau}^2$
Total        SSTO      $n_b r - 1$
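
To illustrate how the test statistic F* = MSTR/MSBL.TR is obtained from data, the sketch below computes the sums of squares for a small randomized complete block layout. Both the function name `rcbd_anova` and the data (4 blocks by 3 treatments) are invented for illustration, and the sums of squares are written in their standard form, assumed to match (21.6), which is not reproduced in this section.

```python
def rcbd_anova(y):
    """ANOVA for a randomized complete block design; y[i][j] = block i, treatment j."""
    nb, r = len(y), len(y[0])
    grand = sum(map(sum, y)) / (nb * r)
    block_means = [sum(row) / r for row in y]
    trt_means = [sum(row[j] for row in y) / nb for j in range(r)]

    ssbl = r * sum((m - grand) ** 2 for m in block_means)
    sstr = nb * sum((m - grand) ** 2 for m in trt_means)
    ssbltr = sum((y[i][j] - block_means[i] - trt_means[j] + grand) ** 2
                 for i in range(nb) for j in range(r))
    ssto = sum((v - grand) ** 2 for row in y for v in row)

    mstr = sstr / (r - 1)
    msbltr = ssbltr / ((nb - 1) * (r - 1))
    return {"SSBL": ssbl, "SSTR": sstr, "SSBL.TR": ssbltr, "SSTO": ssto,
            "MSTR": mstr, "MSBL.TR": msbltr, "F*": mstr / msbltr}

# Hypothetical responses: rows are blocks, columns are treatments
y = [[10, 12, 15],
     [8, 11, 13],
     [12, 13, 17],
     [9, 12, 14]]

result = rcbd_anova(y)
for name, value in result.items():
    print(name, round(value, 4))
```

The same F* is used whether the block effects are treated as fixed or random; what changes is the interpretation of the mean squares through Table 25.8.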


Interaction Model 


When blocks are a random sample from a population of blocks, the presence of interactions 
between blocks and treatments can be accommodated by a model including these interaction
effects:

$$ Y_{ij} = \mu_{..} + \rho_i + \tau_j + (\rho\tau)_{ij} + \varepsilon_{ij} \tag{25.74} $$

where:
    $\mu_{..}$ is a constant
    $\rho_i$ are independent $N(0, \sigma_\rho^2)$
    $\tau_j$ are constants subject to the restriction $\sum \tau_j = 0$
    $(\rho\tau)_{ij}$ are $N\left(0, \frac{r-1}{r}\sigma_{\rho\tau}^2\right)$, subject to the restrictions:
        $\sum_j (\rho\tau)_{ij} = 0$ for all $i$
        $\sigma\{(\rho\tau)_{ij}, (\rho\tau)_{ij'}\} = -\frac{1}{r}\sigma_{\rho\tau}^2$ for $j \ne j'$
    $(\rho\tau)_{ij}$ are independent of the $\rho_i$
    $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$ and independent of the $\rho_i$ and of the $(\rho\tau)_{ij}$
    $i = 1, \ldots, n_b$; $j = 1, \ldots, r$


This model is a special case of mixed two-factor model (25.42), with n = 1 and with some
changes in notation to recognize that fixed factor A now is treatments and random factor B
now is blocks.


Properties of Model. The properties of interaction model (25.74) are obtained directly 
from those in (25.44)-(25.46) for the mixed two-factor model: 


$$ E\{Y_{ij}\} = \mu_{..} + \tau_j \tag{25.75a} $$

$$ \sigma^2\{Y_{ij}\} = \sigma_Y^2 = \sigma_\rho^2 + \frac{r - 1}{r}\sigma_{\rho\tau}^2 + \sigma^2 \tag{25.75b} $$

$$ \sigma\{Y_{ij}, Y_{ij'}\} = \sigma_\rho^2 - \frac{1}{r}\sigma_{\rho\tau}^2 \quad j \ne j' \tag{25.75c} $$

$$ \sigma\{Y_{ij}, Y_{i'j'}\} = 0 \quad i \ne i' \tag{25.75d} $$

Note again that the $Y_{ij}$ have constant variance, that observations from different blocks are
assumed to be independent, and that any two observations $Y_{ij}$ and $Y_{ij'}$ from the same block
are correlated, the covariance being the same for all blocks. Unlike for additive model
(25.67), the covariance between any two observations from the same block can be negative
or positive for interaction model (25.74).

The coefficient of correlation between any two observations in the same block, denoted
by $\omega^*$, is:

$$ \omega^* = \frac{\sigma_\rho^2 - \frac{1}{r}\sigma_{\rho\tau}^2}{\sigma_Y^2} \tag{25.76} $$

Interaction model (25.74) assumes, just like additive model (25.67), that, once the blocks
have been selected, any two observations from a given block are uncorrelated.


Analysis of Variance. The sums of squares and degrees of freedom for interaction model 
(25.74) are the same as those for additive model (25.67). The principal difference in the 
use of the two models occurs because of the difference in the expected mean squares, as 
shown in Table 25.8. No exact test for block effects is possible with the interaction model, 
whereas an exact test is possible with the additive model. This distinction is unimportant 
whenever blocks are used primarily to reduce the experimental error variability and are not 
of intrinsic interest themselves. 

The F* test statistic for treatment effects is the same for the two models, namely
F* = MSTR/MSBL.TR, which is exactly the same as test statistic (21.7b) for randomized
block model (21.1) with fixed block effects. Similarly, estimation of fixed treatment
effects for both models with random block effects is carried out in the manner described in 
Section 21.3 for fixed block effects. 


Comments 


1. Table 25.8 indicates that when the block effects are random, MSBL.TR estimates $\sigma^2$ for additive
model (25.67). For interaction model (25.74), however, MSBL.TR estimates the sum of the error term
variance $\sigma^2$ and the interaction variance $\sigma_{\rho\tau}^2$. Separate estimation of these two components is not
possible for this latter model, and the two components are said to be confounded. This problem can
be avoided by utilizing replication within blocks described in Section 21.7.


2. When the assumption of compound symmetry, which underlies both additive model (25.67) 
and interaction model (25.74), and the less restrictive requirement of sphericity are not met, the usual 
F test becomes biased. Some computer packages provide the user with the option of formally testing 
for compound symmetry or sphericity. 

When these conditions are violated, an approximate conservative test procedure is as follows: 

a. Conduct the usual F test; if it leads to conclusion $H_0$, accept this conclusion.

b. If the usual F test leads to conclusion $H_a$, replace $F[1 - \alpha; r - 1, (n_b - 1)(r - 1)]$ in
decision rule (21.7c) by $F(1 - \alpha; 1, n_b - 1)$. If this modified decision rule leads to $H_a$, accept this
conclusion.

c. If the modified decision rule leads to $H_0$, revise the degrees of freedom in the modified decision
rule by one of the epsilon adjustment procedures, as described in References 25.8 and 25.9.
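
Steps a to c amount to a three-stage decision rule, which the sketch below spells out. It is only an illustration: the function name and argument names are hypothetical, all critical values are supplied by the caller rather than computed, and the epsilon-adjusted critical value of step c is left as an optional input because its calculation is described in the cited references, not here. The decision rule is assumed to conclude $H_a$ when F* exceeds the critical value.

```python
def conservative_f_test(f_star, f_usual, f_conservative, f_epsilon=None):
    """Three-stage conservative test for treatment effects.

    f_usual        : F[1 - alpha; r - 1, (n_b - 1)(r - 1)]
    f_conservative : F(1 - alpha; 1, n_b - 1)
    f_epsilon      : critical value with epsilon-adjusted df (optional)
    """
    if f_star <= f_usual:              # step a: usual test concludes H0
        return "H0"
    if f_star > f_conservative:        # step b: modified rule concludes Ha
        return "Ha"
    if f_epsilon is None:              # step c requires an adjusted critical value
        return "undetermined"
    return "Ha" if f_star > f_epsilon else "H0"

# r = 3 treatments, n_b = 10 blocks, alpha = .05:
# F(.95; 2, 18) = 3.55 and F(.95; 1, 9) = 5.12
print(conservative_f_test(2.0, 3.55, 5.12))
print(conservative_f_test(6.0, 3.55, 5.12))
print(conservative_f_test(4.0, 3.55, 5.12))
```

Only values of F* that fall between the two critical values require the epsilon adjustment, which is why the procedure is inexpensive to apply in most cases.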

Alternatively, multivariate analysis of variance techniques may be employed provided that $n_b > r$.
See Reference 25.10 for further discussions of these issues.

3. Mixed models based on less restrictive assumptions regarding the variance-covariance matrix
and the parameters in the ANOVA model have also been proposed. See Reference 25.7 for a discussion
of these models.


25.6 Three-Factor Studies—ANOVA Models II and III

Just as for single-factor and two-factor studies, the analysis of variance sums of squares
and degrees of freedom for random and mixed multi-factor models are the same as those
for the corresponding fixed ANOVA model. The principal issue with random and mixed
multi-factor models, as we saw for two-factor models, is the determination of the expected
mean squares. Once these are known, the proper test statistics and confidence intervals can
be constructed. Rules for finding expected mean squares for random and mixed models
are given in Appendix D for balanced studies with any number of factors. We now present
model II (random factor levels) and model III (mixed factor levels) for three-factor studies
and show how appropriate tests are conducted. We consider again the balanced case where
all treatment sample sizes are equal.


ANOVA Model II—Random Factor Effects 


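In a study of the effects of operators, machines, and batches of raw material on daily output,
all three factors may be considered to have random factor levels. The random ANOVA
model for such a three-factor study is:

$$ Y_{ijkm} = \mu_{...} + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkm} \tag{25.77} $$

where:
    $\mu_{...}$ is a constant
    $\alpha_i$, $\beta_j$, $\gamma_k$, $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, $(\alpha\beta\gamma)_{ijk}$, $\varepsilon_{ijkm}$ are independent normal random
    variables with expectations zero and respective variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$, $\sigma_{\alpha\beta}^2$, $\sigma_{\alpha\gamma}^2$,
    $\sigma_{\beta\gamma}^2$, $\sigma_{\alpha\beta\gamma}^2$, $\sigma^2$
    $i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, c$; $m = 1, \ldots, n$


Just as for two-factor random ANOVA model (25.39), the responses $Y_{ijkm}$ for three-
factor random ANOVA model (25.77) are normally distributed with constant variance. The
expected value and variance of response $Y_{ijkm}$ are:

$$ E\{Y_{ijkm}\} = \mu_{...} \tag{25.78a} $$

$$ \sigma^2\{Y_{ijkm}\} = \sigma_Y^2 = \sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma_{\alpha\beta}^2 + \sigma_{\alpha\gamma}^2 + \sigma_{\beta\gamma}^2 + \sigma_{\alpha\beta\gamma}^2 + \sigma^2 \tag{25.78b} $$

Any two responses are independent except when they have one or more common factor
levels; these latter are correlated because they contain some common random terms.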

Table 25.9 contains the degrees of freedom and the expected mean squares for all com-
ponents of the ANOVA table for random ANOVA model (25.77).


TABLE 25.9 Expected Mean Squares for Balanced Random Three-Factor ANOVA Model (25.77).

Mean Square   df                    Expected Mean Square
MSA           $a - 1$               $\sigma^2 + nbc\sigma_\alpha^2 + nc\sigma_{\alpha\beta}^2 + nb\sigma_{\alpha\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSB           $b - 1$               $\sigma^2 + nac\sigma_\beta^2 + nc\sigma_{\alpha\beta}^2 + na\sigma_{\beta\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSC           $c - 1$               $\sigma^2 + nab\sigma_\gamma^2 + nb\sigma_{\alpha\gamma}^2 + na\sigma_{\beta\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSAB          $(a - 1)(b - 1)$      $\sigma^2 + nc\sigma_{\alpha\beta}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSAC          $(a - 1)(c - 1)$      $\sigma^2 + nb\sigma_{\alpha\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSBC          $(b - 1)(c - 1)$      $\sigma^2 + na\sigma_{\beta\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSABC         $(a - 1)(b - 1)(c - 1)$   $\sigma^2 + n\sigma_{\alpha\beta\gamma}^2$
MSE           $(n - 1)abc$          $\sigma^2$


ANOVA Model III—Mixed Factor Effects


Consider a three-factor study where factors B and C have random factor levels while factor A
has fixed factor levels. The restricted mixed ANOVA model for such a three-factor balanced
study is:

$$ Y_{ijkm} = \mu_{...} + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkm} \tag{25.79} $$

where:
    $\mu_{...}$ is a constant
    $\alpha_i$ are constants
    $\beta_j$, $\gamma_k$, $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, $(\alpha\beta\gamma)_{ijk}$ are pairwise independent normal random
    variables with expectations zero and constant variances
    $\varepsilon_{ijkm}$ are independent $N(0, \sigma^2)$, and are independent of the other random components
    $\sum_i \alpha_i = \sum_i (\alpha\beta)_{ij} = \sum_i (\alpha\gamma)_{ik} = \sum_i (\alpha\beta\gamma)_{ijk} = 0$
    $i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, c$; $m = 1, \ldots, n$


Note that all interaction terms in this model are random, since at least one of the factors
contained in each has random factor levels. Note also that all sums of effects over the fixed
factor levels are zero. Various correlations exist between the random effects terms, which
we shall not detail.

The responses $Y_{ijkm}$ for three-factor mixed ANOVA model (25.79) are normally dis-
tributed with constant variance. The expected value of observation $Y_{ijkm}$ is:

$$ E\{Y_{ijkm}\} = \mu_{...} + \alpha_i \tag{25.80} $$

In advance of the random trials, any two responses are independent except for those that
contain common and/or correlated random effects terms; these observations are correlated.
Table 25.10 contains all the expected mean squares for mixed ANOVA model (25.79).
Other mixed ANOVA models can be developed in similar fashion. The expected mean
squares for these mixed models can be found by employing the rules presented in Appendix D.

Appropriate Test Statistics 


From the expected mean squares, we seek to determine the appropriate F* statistic for a 
given test. An exact test statistic can often be found for random and mixed multi-factor 
models, but not always. 


Exact F Test. Suppose we wish to determine whether or not BC interactions are present in
random ANOVA model (25.77). We see from Table 25.9 that the appropriate test statistic is
MSBC/MSABC. If we wish to study the same question for mixed ANOVA model (25.79),
we see from Table 25.10 that an appropriate test statistic is available, but this time it is
MSBC/MSE. We thus note that the two test statistics are not the same, even though the
same factor effects are being studied, because of the differences between the two models.


TABLE 25.10 Expected Mean Squares for Balanced Mixed Three-Factor ANOVA Model
(25.79) (A fixed, B and C random).

Mean Square   df                    Expected Mean Square
MSA           $a - 1$               $\sigma^2 + nbc\frac{\sum\alpha_i^2}{a-1} + nc\sigma_{\alpha\beta}^2 + nb\sigma_{\alpha\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSB           $b - 1$               $\sigma^2 + nac\sigma_\beta^2 + na\sigma_{\beta\gamma}^2$
MSC           $c - 1$               $\sigma^2 + nab\sigma_\gamma^2 + na\sigma_{\beta\gamma}^2$
MSAB          $(a - 1)(b - 1)$      $\sigma^2 + nc\sigma_{\alpha\beta}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSAC          $(a - 1)(c - 1)$      $\sigma^2 + nb\sigma_{\alpha\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2$
MSBC          $(b - 1)(c - 1)$      $\sigma^2 + na\sigma_{\beta\gamma}^2$
MSABC         $(a - 1)(b - 1)(c - 1)$   $\sigma^2 + n\sigma_{\alpha\beta\gamma}^2$
MSE           $(n - 1)abc$          $\sigma^2$


It is not always possible to find an exact F test for mixed and random multi-factor
ANOVA models. For instance, we cannot directly test for the presence of factor A main
effects in random ANOVA model (25.77). Note from Table 25.9 that there is no expected
mean square that consists of the components of E{MSA} except for the $nbc\sigma_\alpha^2$ term.

Sometimes it is possible to assume that certain interactions are zero, and then proceed in
the usual way with an exact F test. For example, to test for factor A main effects in random
ANOVA model (25.77) (see Table 25.9), it may be possible to assume that $\sigma_{\alpha\gamma}^2 = 0$ (indeed,
this can be tested with MSAC/MSABC). If this assumption is appropriate, we can use the
test statistic MSA/MSAB to test for factor A main effects.
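
The search for an exact test can be mechanized by recording, for each mean square, which variance components appear in its expectation. The sketch below encodes Tables 25.9 and 25.10 in that way and looks for a mean square whose expectation equals that of the numerator with the tested component removed. The dictionary names, the component labels ("e" for the error variance, "A" for the factor A term, and so on), and the function name are hypothetical. Matching on the set of components alone is adequate here only because, in these balanced tables, a given component carries the same coefficient in every expected mean square that contains it.

```python
RANDOM_EMS = {   # Table 25.9: all three factors random
    "MSA": {"e", "A", "AB", "AC", "ABC"},
    "MSB": {"e", "B", "AB", "BC", "ABC"},
    "MSC": {"e", "C", "AC", "BC", "ABC"},
    "MSAB": {"e", "AB", "ABC"},
    "MSAC": {"e", "AC", "ABC"},
    "MSBC": {"e", "BC", "ABC"},
    "MSABC": {"e", "ABC"},
    "MSE": {"e"},
}

MIXED_EMS = {    # Table 25.10: A fixed, B and C random
    "MSA": {"e", "A", "AB", "AC", "ABC"},
    "MSB": {"e", "B", "BC"},
    "MSC": {"e", "C", "BC"},
    "MSAB": {"e", "AB", "ABC"},
    "MSAC": {"e", "AC", "ABC"},
    "MSBC": {"e", "BC"},
    "MSABC": {"e", "ABC"},
    "MSE": {"e"},
}

def exact_denominator(ems, numerator, effect, assumed_zero=()):
    """Return the mean square giving an exact F test, or None if there is none."""
    zero = set(assumed_zero)
    target = ems[numerator] - {effect} - zero
    for name, components in ems.items():
        if name != numerator and components - zero == target:
            return name
    return None

print(exact_denominator(RANDOM_EMS, "MSBC", "BC"))
print(exact_denominator(MIXED_EMS, "MSBC", "BC"))
print(exact_denominator(RANDOM_EMS, "MSA", "A"))
print(exact_denominator(RANDOM_EMS, "MSA", "A", assumed_zero={"AC"}))
```

The four calls correspond to the four cases discussed in the text: MSBC/MSABC for the random model, MSBC/MSE for the mixed model, no exact test for factor A main effects in the random model, and MSA/MSAB once the AC interaction variance is assumed to be zero.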


Satterthwaite Approximate F Test. Often, it is not known whether certain interactions
are zero. In that case, an approximate F test may be employed that utilizes a pseudo
F or quasi F test statistic. This approximate test, called the Satterthwaite test, involves
developing a linear combination of mean squares that has the same expectation when $H_0$
holds as the factor effects mean square. As noted in our discussion of the Satterthwaite
procedure for constructing approximate confidence limits for variance components, this
linear combination is expressed in the form:

$$ \hat{L} = c_1 MS_1 + \cdots + c_h MS_h $$

where the $c_i$ are constants. The approximate number of degrees of freedom associated with
this linear combination of mean squares is given by (25.28). The test statistic is then set up
in the usual way and follows approximately the F distribution when $H_0$ holds.
We illustrate this procedure for testing factor A main effects in random ANOVA model
(25.77):

$$ \begin{aligned} H_0&: \sigma_\alpha^2 = 0 \\ H_a&: \sigma_\alpha^2 > 0 \end{aligned} \tag{25.81} $$

Note from Table 25.9 that:

$$ E\{MSAB\} + E\{MSAC\} - E\{MSABC\} = \sigma^2 + nc\sigma_{\alpha\beta}^2 + nb\sigma_{\alpha\gamma}^2 + n\sigma_{\alpha\beta\gamma}^2 \tag{25.82} $$

This equals precisely E{MSA} when $\sigma_\alpha^2 = 0$. Hence, the suggested test statistic is:

$$ F^{**} = \frac{MSA}{MSAB + MSAC - MSABC} \tag{25.83} $$

where we denote the test statistic as F** as a reminder that a pseudo F test is involved.


TABLE 25.11 Analysis of Variance for Study of Effects of Operators, Machines, and
Batches on Daily Output.

Source of
Variation               SS       df     MS
Factor A (operators)    17.3      2    8.65
Factor B (machines)      4.2      1    4.20
Factor C (batches)      24.8      4    6.20
AB interactions          4.8      2    2.40
AC interactions         31.7      8    3.96
BC interactions         12.5      4    3.13
ABC interactions        11.9      8    1.49
Error                  137.7     60    2.30
Total                  244.9     89


Table 25.11 contains the analysis of variance for a study of the effects of operators, machines,
and batches on the daily output of a highly automated process. Each factor is assumed to
be a random factor. To test whether operators (factor A) have a main effect on output, we
use test statistic (25.83):

$$ F^{**} = \frac{8.65}{2.40 + 3.96 - 1.49} = \frac{8.65}{4.87} = 1.78 $$

The approximate number of degrees of freedom associated with the denominator is, from
(25.28):

$$ df = \frac{(4.87)^2}{\dfrac{(2.40)^2}{2} + \dfrac{(3.96)^2}{8} + \dfrac{(-1.49)^2}{8}} = 4.6 $$

which we round to 5. For level of significance $\alpha = .05$, we require F(.95; 2, 5) = 5.79.
Since F** = 1.78 ≤ 5.79, we conclude $H_0$, that operators do not have a main effect on daily
output.
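
The pseudo F calculation can be reproduced from the mean squares and degrees of freedom of Table 25.11. In this sketch the function name `pseudo_f` is hypothetical, and the critical value F(.95; 2, 5) = 5.79 is entered as quoted in the text rather than computed.

```python
def pseudo_f(ms_numerator, denominator_terms):
    """Satterthwaite pseudo F statistic and approximate denominator df.

    denominator_terms: (coefficient, mean square, df) triples whose linear
    combination forms the denominator, as in (25.83).
    """
    denom = sum(c * ms for c, ms, _ in denominator_terms)
    df = denom ** 2 / sum((c * ms) ** 2 / d for c, ms, d in denominator_terms)
    return ms_numerator / denom, df

# MSA = 8.65; denominator MSAB + MSAC - MSABC with df 2, 8, and 8
f_pseudo, df_denom = pseudo_f(8.65, [(1, 2.40, 2), (1, 3.96, 8), (-1, 1.49, 8)])

critical = 5.79   # F(.95; 2, 5), from the text
print(round(f_pseudo, 2), round(df_denom, 1))
print("conclude Ha" if f_pseudo > critical else "conclude H0")
```

Because one coefficient is negative, the denominator is not guaranteed to be positive in general, which is one reason the approximation has to be used with care.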


Comment 
Since the Satterthwaite pseudo F test is an approximate one, it must be employed with caution. Some
alternative procedures are provided in References 25.2 and 25.11.


Estimation of Effects 


No new problems arise in the estimation of variance components for random factors or in 
the estimation of contrasts for fixed factors in mixed models, when three or more factors 


1070 Part Five 


Multi-Factor Studies 


are studied at one time. Confidence limits for contrasts of the factor level Means of a fixe; 
factor are obtained by using the mean square utilized in the denominator of the test oo 
for examining the presence of main effects for that factor. The degrees of freedom, P us 
associated with the mean square utilized. se 


25.7 ANOVA Models H and Ш with Unequal Sample Sizes 


Example 


We noted in Chapter 23 for the fixed two-factor ANOVA model that unequal treatment 
sample sizes make the analysis of variance more complicated because the sums of squares 
no longer are orthogonal. Tests of hypotheses must then be based on the general linear test 
approach. When sample sizes are unequal for studies involving random effects, the level
of complexity increases in a similar fashion. Most of the methods described thus far for 
two-factor and multi-factor ANOVA models II and III do not apply to unbalanced studies. 
For example, in unbalanced studies typically neither exact nor Satterthwaite approximate 
F tests exist. 

A number of alternative approaches have been developed for making inferences for 
ANOVA models II and III in the presence of unequal sample sizes. We shall discuss an 
approach based on the method of maximum likelihood. This approach has the advantage 
of conceptual simplicity and is a general procedure that possesses a number of optimality 
properties. Detailed discussions of this and alternative approaches can be found in Refer- 
ences 25.2, 25.7, and 25.12. 

We shall illustrate the maximum likelihood approach using an example involving a two- 
factor experiment where one factor has fixed factor levels and the second factor has random 
factor levels. 


The Sheffield Foods Company markets a variety of dairy products, including milk, ice cream, 
and yogurt. Recently, the company received a complaint from a government agency that the 
actual levels of milkfat in its yogurt exceeded the labeled amount. Company personnel were 
concerned that the government's laboratory method for measuring fat content in yogurt 
might be unreliable because it is primarily designed for use with milk and ice cream.
To study the reliability of Sheffield's and the government's laboratory methods, a small
interlaboratory study was carried out. Four testing laboratories were randomly selected from
the population of laboratories in the United States. Each laboratory was sent 12 samples 
of yogurt, with instructions to evaluate six of the samples using the government's method 
and six by the company's method. The yogurt had been mixed under carefully controlled 
conditions and the fat content of each sample was known to be 3.0 percent. 

In this study, measurement method is a fixed factor with a = 2 levels (i = 1: Govern- 
ment method; i = 2: Sheffield method) and laboratories is a random factor with b = 4 
levels. Because of technical difficulties with the Government method, none of the labo- 
ratories was able to obtain fat content determinations for all of the six samples assigned 
to that method in the time available. The results of the study are given in Table 25.12. 
Figure 25.3 contains dot plots of the data. The variability of the sample fat determinations
appears to be reasonably constant for all measurement method-laboratory combinations.
Figure 25.4 contains a MINITAB estimated treatment means plot. For the four
laboratories included in the study, no major interaction effects between laboratory and
measurement method on fat content determination appear to be present. The plot suggests


TABLE 25.12 Fat Content Determinations: Sheffield Foods Company Example.

Measurement                     Laboratory (j)
Method             k     j=1     j=2     j=3     j=4
i=1 Government     1     5.19    4.09    4.62    3.71
                   2     5.09    3.99    4.32    3.86
                   3             3.75    4.35    3.79
                   4             4.04    4.59    3.63
                   5             4.06
i=2 Sheffield      1     3.26    3.02    3.08    2.98
                   2     3.48    3.32    2.95    2.89
                   3     3.24    2.83    2.98    2.75
                   4     3.41    2.96    2.74    3.04
                   5     3.35    3.23    3.07    2.88
                   6     3.04    3.07    2.70    3.20
FIGURE 25.3 Dot Plots of Fat Content Determinations by Laboratory and Measurement Method: Sheffield Foods Company Example. [Panels: Method 1 (Government) and Method 2 (Sheffield); horizontal axis: Percent Fat, 2.5 to 5.5.]

FIGURE 25.4 Estimated Treatment Means Plot: Sheffield Foods Company Example. [MINITAB interaction plot of means for PCTFAT against LAB, by measurement method.]


a definite measurement method effect and possibly also some differences between
laboratories. We shall now analyze the data formally by means of the maximum likelihood
approach.
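The estimated treatment means behind such a plot are simply the cell means of Table 25.12. The following sketch computes them with the Python standard library. It assumes that the fifth Government determination (4.06) belongs to laboratory 2, which is a reading of the table layout rather than something stated in the text; the variable names are illustrative.

```python
from statistics import mean

# Fat determinations from Table 25.12, keyed by (method i, laboratory j).
# Assumption: the fifth Government value (4.06) belongs to laboratory 2.
data = {
    (1, 1): [5.19, 5.09],
    (1, 2): [4.09, 3.99, 3.75, 4.04, 4.06],
    (1, 3): [4.62, 4.32, 4.35, 4.59],
    (1, 4): [3.71, 3.86, 3.79, 3.63],
    (2, 1): [3.26, 3.48, 3.24, 3.41, 3.35, 3.04],
    (2, 2): [3.02, 3.32, 2.83, 2.96, 3.23, 3.07],
    (2, 3): [3.08, 2.95, 2.98, 2.74, 3.07, 2.70],
    (2, 4): [2.98, 2.89, 2.75, 3.04, 2.88, 3.20],
}

# Estimated treatment mean for each method-laboratory cell.
cell_means = {cell: mean(values) for cell, values in data.items()}

# Government minus Sheffield mean within each laboratory.
gaps = {j: cell_means[(1, j)] - cell_means[(2, j)] for j in range(1, 5)}

for j in range(1, 5):
    print(f"lab {j}: Government {cell_means[(1, j)]:.3f}, "
          f"Sheffield {cell_means[(2, j)]:.3f}, difference {gaps[j]:.3f}")
```

In every laboratory the Government cell mean exceeds the Sheffield cell mean, which is the measurement method effect visible in the plot.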


Maximum Likelihood Approach 
The maximum likelihood approach that we will utilize for the Sheffield Foods Company
example makes somewhat stronger assumptions than mixed ANOVA model (25.42), which 
we would use if the study were balanced. We first review mixed ANOVA model (25.42) as 
it applies to the Sheffield Foods Company example. 


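Mixed ANOVA Model (25.42). This model for the Sheffield Foods Company example
is as follows:

$$Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \tag{25.84}$$

where:

$$\sigma^2\{\beta_j\} = \sigma^2_\beta$$

$$\sigma^2\{(\alpha\beta)_{ij}\} = \frac{2 - 1}{2}\,\sigma^2_{\alpha\beta} = \frac{\sigma^2_{\alpha\beta}}{2}$$

$$\sigma^2\{\varepsilon_{ijk}\} = \sigma^2$$

$$i = 1, 2; \quad j = 1, \ldots, 4; \quad k = 1, \ldots, n_{ij}$$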


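For this model, the expected value and variance of $Y_{ijk}$ are according to (25.44) and (25.45):

$$E\{Y_{ijk}\} = \mu_{..} + \alpha_i \tag{25.85}$$

$$\sigma^2\{Y_{ijk}\} = \sigma^2_Y = \sigma^2_\beta + \frac{\sigma^2_{\alpha\beta}}{2} + \sigma^2 \tag{25.86}$$

Also, the responses $Y_{ijk}$ are correlated as follows according to (25.46):

$$\sigma\{Y_{ijk}, Y_{ijk'}\} = \sigma^2_\beta + \frac{\sigma^2_{\alpha\beta}}{2} \qquad k \ne k' \tag{25.87a}$$

$$\sigma\{Y_{ijk}, Y_{i'jk'}\} = \sigma^2_\beta - \frac{\sigma^2_{\alpha\beta}}{2} \qquad i \ne i' \tag{25.87b}$$

$$\sigma\{Y_{ijk}, Y_{i'j'k'}\} = 0 \qquad j \ne j' \tag{25.87c}$$

We also know that the responses $Y_{ijk}$ for mixed ANOVA model (25.42) are normally
distributed.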

Since the expected value of $Y_{ijk}$ depends only on the fixed effects $\mu_{..}$ and $\alpha_i$ (the random
effects have expectations zero), we can represent the vector of expected values, $E\{\mathbf{Y}\}$, in the
matrix form $\mathbf{X}\boldsymbol{\beta}$. We illustrate this in Table 25.13 for the Sheffield Foods Company example.
This table contains the vector of responses $\mathbf{Y}$, the vector of parameters $\boldsymbol{\beta}$, and the $\mathbf{X}$ matrix
containing the usual column of 1s associated with $\mu_{..}$ and an indicator variable taking on
the values 1 and $-1$ associated with $\alpha_1$. Recall that $\alpha_2 = -\alpha_1$ since $\sum \alpha_i = 0$.

The variance-covariance matrix of the responses $Y_{ijk}$, $\boldsymbol{\sigma}^2\{\mathbf{Y}\}$, has on the main diagonal
the constant variance from (25.86) and off the main diagonal the covariances from
(25.87). We illustrated such a variance-covariance matrix in Table 25.4 for a study in which
$a = b = n = 2$.
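To make the structure of $\mathbf{X}$ and of the variance-covariance matrix concrete, the sketch below builds both for a small balanced layout with a = b = n = 2, using the rules in (25.86) and (25.87). The function names are illustrative, and the variance component values are merely plugged in for illustration (they are the estimates reported later in Table 25.14a).

```python
def covariance(obs1, obs2, var_error, var_beta, var_ab):
    """Covariance of two responses, each labeled (i, j, k), for a = 2 methods."""
    (i1, j1, k1), (i2, j2, k2) = obs1, obs2
    if j1 != j2:
        return 0.0                             # (25.87c): different laboratories
    if i1 != i2:
        return var_beta - var_ab / 2           # (25.87b): same lab, other method
    if k1 != k2:
        return var_beta + var_ab / 2           # (25.87a): same cell, other sample
    return var_beta + var_ab / 2 + var_error   # (25.86): variance of a response


def design_row(i):
    """Row of X: column of 1s for mu.., and +1 / -1 indicator for alpha_1."""
    return [1, 1 if i == 1 else -1]


# Responses ordered as Y111, Y112, Y121, Y122, Y211, ..., Y222.
labels = [(i, j, k) for i in (1, 2) for j in (1, 2) for k in (1, 2)]
X = [design_row(i) for i, _, _ in labels]

# Illustrative variance components: sigma^2, sigma^2_beta, sigma^2_alphabeta.
Sigma = [[covariance(r, c, 0.023, 0.097, 0.086) for c in labels] for r in labels]

for label, x_row, s_row in zip(labels, X, Sigma):
    print(label, x_row, [round(v, 3) for v in s_row])
```

Responses from different laboratories have zero covariance, so the matrix is block diagonal with one block per laboratory.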


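TABLE 25.13 Response Vector Y, X Matrix, and Parameter Vector β: Sheffield Foods Company Example.

$$
\mathbf{Y} = \begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{121} \\ Y_{122} \\ \vdots \\ Y_{211} \\ Y_{212} \\ \vdots \\ Y_{246} \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & -1 \\ 1 & -1 \\ \vdots & \vdots \\ 1 & -1 \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \mu_{..} \\ \alpha_1 \end{bmatrix}
$$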


Density Function. To employ the method of maximum likelihood, we make a somewhat
stronger assumption than with ANOVA model (25.42). We assume all of the properties
of model (25.42) and in addition assume that the $Y_{ijk}$ are jointly normally distributed.
The density function of the multivariate normal distribution is given in (5.50). The mean
vector $\boldsymbol{\mu}$ in (5.50) corresponds here to $\mathbf{X}\boldsymbol{\beta}$, and the variance-covariance matrix $\boldsymbol{\Sigma}$ in (5.50)
corresponds to $\boldsymbol{\sigma}^2\{\mathbf{Y}\}$. We shall continue to use $\boldsymbol{\Sigma}$ to represent the variance-covariance
matrix of the responses $Y_{ijk}$. The number of Y variables p in (5.50) corresponds here to $n_T$.
We can then express the joint density function of the responses $Y_{ijk}$ as follows:

$$f(\mathbf{Y}) = \frac{1}{(2\pi)^{n_T/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})\right] \tag{25.88}$$

Viewing the joint density as a function of the unknown parameters (for the Sheffield Foods
Company example, $\mu_{..}$ and $\alpha_1$ in $\boldsymbol{\beta}$ and $\sigma^2$, $\sigma^2_\beta$, and $\sigma^2_{\alpha\beta}$ in $\boldsymbol{\Sigma}$), given the observations $Y_{ijk}$,
the function in (25.88) is called the likelihood function and denoted by L.


Maximum Likelihood Estimates. To obtain the maximum likelihood estimates of the
unknown parameters, it is easiest to work with the logarithm of the likelihood function:

$$\log_e L = -\frac{n_T}{2}\log_e(2\pi) - \frac{1}{2}\log_e|\boldsymbol{\Sigma}| - \frac{1}{2}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \tag{25.89}$$

The maximum likelihood estimates of $\mu_{..}$, $\alpha_1$, $\sigma^2$, $\sigma^2_\beta$, and $\sigma^2_{\alpha\beta}$ for the Sheffield Foods
Company example are those values of these parameters that maximize the log-likelihood
function in (25.89), subject to the constraints that the variance components are nonnegative.
For unbalanced studies, numerical search procedures are generally required to obtain the
maximum likelihood estimates. We shall rely on standard statistical software programs to
carry out the numerical search procedures.
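The log-likelihood (25.89) can be evaluated directly for given parameter values. The sketch below does so for the data of Table 25.12 with only the Python standard library, using a hand-written Cholesky factorization in place of a matrix library. It evaluates the function; it does not perform the numerical search. Two assumptions are made: the fifth Government determination (4.06) is placed in laboratory 2, and the function name `log_likelihood` is purely illustrative.

```python
import math

# Table 25.12, keyed by (method i, laboratory j).
# Assumption: the fifth Government value (4.06) belongs to laboratory 2.
data = {
    (1, 1): [5.19, 5.09],
    (1, 2): [4.09, 3.99, 3.75, 4.04, 4.06],
    (1, 3): [4.62, 4.32, 4.35, 4.59],
    (1, 4): [3.71, 3.86, 3.79, 3.63],
    (2, 1): [3.26, 3.48, 3.24, 3.41, 3.35, 3.04],
    (2, 2): [3.02, 3.32, 2.83, 2.96, 3.23, 3.07],
    (2, 3): [3.08, 2.95, 2.98, 2.74, 3.07, 2.70],
    (2, 4): [2.98, 2.89, 2.75, 3.04, 2.88, 3.20],
}


def log_likelihood(mu, alpha1, var_error, var_beta, var_ab, data):
    """Evaluate (25.89) for the a = 2 mixed model; requires var_error > 0."""
    obs = [(i, j, y) for (i, j), ys in sorted(data.items()) for y in ys]
    n = len(obs)
    # Residuals Y - X beta, with alpha_2 = -alpha_1.
    resid = [y - (mu + (alpha1 if i == 1 else -alpha1)) for i, _, y in obs]
    # Variance-covariance matrix from (25.86) and (25.87).
    sigma = [[0.0] * n for _ in range(n)]
    for r, (i1, j1, _) in enumerate(obs):
        for c, (i2, j2, _) in enumerate(obs):
            if j1 == j2:
                cov = var_beta + (var_ab / 2 if i1 == i2 else -var_ab / 2)
                sigma[r][c] = cov + (var_error if r == c else 0.0)
    # Cholesky factor: sigma = L L'.
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sigma[r][c] - sum(L[r][m] * L[c][m] for m in range(c))
            L[r][c] = math.sqrt(s) if r == c else s / L[c][c]
    # Solve L z = resid, so the quadratic form equals z'z.
    z = []
    for r in range(n):
        z.append((resid[r] - sum(L[r][m] * z[m] for m in range(r))) / L[r][r])
    log_det = 2 * sum(math.log(L[r][r]) for r in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * log_det - 0.5 * quad


# Compare the reported estimates (Table 25.14a) with a "no method effect" alternative.
ll_ml = log_likelihood(3.694, 0.633, 0.023, 0.097, 0.086, data)
ll_no_method = log_likelihood(3.694, 0.0, 0.023, 0.097, 0.086, data)
print(f"log-likelihood at reported estimates: {ll_ml:.3f}")
print(f"log-likelihood with alpha_1 = 0:      {ll_no_method:.3f}")
```

A numerical search routine would call such a function repeatedly, moving the five parameters toward the values that maximize it while keeping the variance components nonnegative.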


Inference Procedures. Inference procedures are analogous to those explained in Chapter 14
for maximum likelihood estimation of the regression parameters in logistic regression.
The estimated approximate variance-covariance matrix of the estimated parameters
is obtained through the Hessian matrix in (14.50), which contains the second-order partial
derivatives of the logarithm of the likelihood function with respect to the parameters.
This estimated variance-covariance matrix is usually provided by a statistical package in
conjunction with the numerical search for maximum likelihood estimates.

Large-sample inference procedures are described in Chapter 14. In the Sheffield Foods
Company example, for instance, the following approximate result for estimating the fixed
laboratory method effect $\alpha_1$ is obtained from (14.52):

$$\frac{\hat{\alpha}_1 - \alpha_1}{s\{\hat{\alpha}_1\}} \sim z \tag{25.90}$$

An approximate confidence interval for $\alpha_1$ or a test concerning $\alpha_1$ can then be developed
readily. Simultaneous estimation of several parameters can be done as usual by means of the
Bonferroni procedure. Tests whether several parameters equal zero (e.g., $\sigma^2_\beta = \sigma^2_{\alpha\beta} = 0$)
are carried out by fitting the full and reduced models and obtaining the likelihood ratio test
statistic (14.60). This test should not be used if any of the estimated variance components
equals zero.

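Often, there is interest in a linear combination of the parameters. For instance, the
marginal mean $\mu_{1.}$ may be of interest in the Sheffield Foods Company example. Since
$\mu_{1.} = \mu_{..} + \alpha_1$, the maximum likelihood estimator of this quantity is the following linear
combination of the estimated parameters:

$$\hat{\mu}_{1.} = \hat{\mu}_{..} + \hat{\alpha}_1 = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \hat{\mu}_{..} \\ \hat{\alpha}_1 \\ \hat{\sigma}^2 \\ \hat{\sigma}^2_\beta \\ \hat{\sigma}^2_{\alpha\beta} \end{bmatrix} \tag{25.91}$$

Denoting the row vector of coefficients by $\mathbf{c}'$, we use (5.46) to obtain the estimated variance
of $\hat{\mu}_{1.}$:

$$s^2\{\hat{\mu}_{1.}\} = \mathbf{c}'\,\mathbf{s}^2\{\mathbf{b}\}\,\mathbf{c} \tag{25.92}$$

where $\mathbf{s}^2\{\mathbf{b}\}$ is the estimated approximate variance-covariance matrix of the parameter
estimates. Large-sample inferences are then conducted in the usual manner, utilizing the
standard normal distribution.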
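As a numerical illustration of (25.91) and (25.92), the sketch below evaluates the linear combination and its estimated variance in plain Python. The estimates and the variance-covariance matrix are the rounded values reported in Table 25.14 for the Sheffield Foods Company example; the parameter ordering and the variable names are assumptions made for the illustration.

```python
import math

# Parameter order assumed: mu.., alpha_1, sigma^2, sigma^2_beta, sigma^2_alphabeta.
estimates = [3.694, 0.633, 0.023, 0.097, 0.086]

# Estimated approximate variance-covariance matrix (rounded values, Table 25.14b).
cov = [
    [0.0250, 0.0002, 0.0000, 0.0000, 0.0000],
    [0.0002, 0.0114, 0.0000, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.0000, 0.0050, -0.0001],
    [0.0000, 0.0000, 0.0000, -0.0001, 0.0041],
]

# Coefficient vector c for mu_1. = mu.. + alpha_1.
c = [1, 1, 0, 0, 0]

# (25.91): point estimate c'b.
mu1_hat = sum(ci * bi for ci, bi in zip(c, estimates))

# (25.92): estimated variance c' s^2{b} c.
variance = sum(c[r] * cov[r][s] * c[s] for r in range(5) for s in range(5))
std_dev = math.sqrt(variance)

print(f"estimate of mu_1. = {mu1_hat:.3f}")
print(f"estimated variance = {variance:.4f}, standard deviation = {std_dev:.3f}")
```

Because only the first two coefficients are nonzero, the variance reduces to the two variances of the fixed effect estimates plus twice their covariance.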

We caution again that the inference procedures discussed here require large sample sizes. 
In studies with random factor levels, the number of factor levels frequently is not large. For 
instance, in the Sheffield Foods Company example only four laboratories were employed 
in the study. Use of a much larger number of laboratories would have been much too costly.
An estimate of interlaboratory variability based on four randomly selected laboratories is 
likely not to be precise and use of a large-sample approximation for obtaining an interval 
estimate may not be appropriate. 

Bootstrapping, as explained in Chapter 11, may be used to examine the appropriateness of
large-sample inference procedures for maximum likelihood estimates in unbalanced studies. 
However, in some cases bootstrapping for variance components may not perform properly, 
which could be an indication that large-sample inference procedures are not appropriate. 


Example

TABLE 25.14 Maximum Likelihood Estimates and Estimated Variance-Covariance Matrix: Sheffield Foods Company Example.


In the Sheffield Foods Company example, the investigators were primarily interested in 
determining whether the two different measurement methods yield systematic differences 
in the determination of fat content. The BMDP3V computer package was used, together 
with transformations (25.43) to go from the unrestricted to the restricted model, to obtain 
the maximum likelihood estimates of the parameters in the log-likelihood function (25.89) 
for the mixed ANOVA model. Table 25.14a contains the maximum likelihood estimates 
of the parameters and the estimated approximate standard deviations of these estimates. 
Table 25.14b contains the estimated approximate variance-covariance matrix of the maxi- 
mum likelihood estimates obtained through the Hessian matrix in (14.50). 

Since the sample sizes are not large here, bootstrapping was employed to examine 
whether the large-sample inference procedures for maximum likelihood estimates described 
in Chapter 14 are appropriate. Five hundred bootstrap samples were generated, the maxi- 
mum likelihood estimates were obtained for each using SAS PROC MIXED, and a boot- 
strap distribution of the parameter estimates was created for each parameter. Table 25.15 
contains the means and standard deviations of these bootstrap distributions, together with 
the maximum likelihood estimates and the approximate standard deviations repeated from
Table 25.14a.

Before examining whether the two measurement methods differ in their fat content 
determinations, we need to consider whether measurement method-laboratory interactions 
are present. The large-sample test statistic (14.52) for testing $H_0$: $\sigma^2_{\alpha\beta} = 0$ is, using the
results in Table 25.14a, $z^* = .086/.064 = 1.34$. This small value of the test statistic supports
$H_0$, that there are no interaction effects. However, the bootstrap distribution of $\hat{\sigma}^2_{\alpha\beta}$ is highly

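(a) Estimated Parameters and Standard Deviations

                                            Estimated
                        Estimated           Standard
Parameter               Parameter           Deviation
$\mu_{..}$                3.694               .158
$\alpha_1$                .633                .107
$\sigma^2$                .023                .006
$\sigma^2_\beta$          .097                .071
$\sigma^2_{\alpha\beta}$  .086                .064

(b) Estimated Approximate Variance-Covariance Matrix $\mathbf{s}^2\{\mathbf{b}\}$

                                $\hat{\mu}_{..}$   $\hat{\alpha}_1$   $\hat{\sigma}^2$   $\hat{\sigma}^2_\beta$   $\hat{\sigma}^2_{\alpha\beta}$
$\hat{\mu}_{..}$                  .0250     .0002     .0000     .0000     .0000
$\hat{\alpha}_1$                  .0002     .0114     .0000     .0000     .0000
$\hat{\sigma}^2$                  .0000     .0000     .0000     .0000    -.0000
$\hat{\sigma}^2_\beta$            .0000     .0000     .0000     .0050    -.0001
$\hat{\sigma}^2_{\alpha\beta}$    .0000     .0000    -.0000    -.0001     .0041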


TABLE 25.15 Means and Standard Deviations of Bootstrap Distributions and
Maximum Likelihood Estimates: Sheffield Foods Company Example.

                                                          Standard Deviation
                                        Maximum
                          Bootstrap     Likelihood                    Maximum
Parameter                 Mean          Estimate       Bootstrap      Likelihood
$\mu_{..}$                  3.69          3.69           .157           .158
$\alpha_1$                  .637          .633           .110           .107
$\sigma^2_\beta$            .092          .097           .128           .071
$\sigma^2_{\alpha\beta}$    .078          .086           .190           .064
$\sigma^2$                  .023          .023           .006           .006
FIGURE 25.5 Bootstrap Distribution for $\hat{\alpha}_1$: Sheffield Foods Company Example. [Histogram; horizontal axis: Bootstrap $\hat{\alpha}_1$, 0.30 to 1.20.]


skewed, with a large concentration at zero. Furthermore, the bootstrap standard deviation
according to Table 25.15, $s^*\{\hat{\sigma}^2_{\alpha\beta}\} = .190$, is much larger than the large-sample estimate.
Thus, use of large-sample inference procedures may not be appropriate here. Nevertheless,
the bootstrap results are consistent with the large-sample results, suggesting even more
strongly that there are no interaction effects between measurement methods and laboratories.
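The large-sample test statistic for the interaction variance component and the comparison of the two standard deviations can be reproduced as follows. The numbers are those reported in Tables 25.14a and 25.15. Treating the test as one-sided (upper tail) is an assumption made here, since a variance component cannot be negative; the text itself does not state the form of the alternative.

```python
from statistics import NormalDist

# Interaction variance component: ML estimate and the two standard deviations.
estimate = 0.086
sd_large_sample = 0.064   # Table 25.14a
sd_bootstrap = 0.190      # Table 25.15

# Large-sample test statistic for H0: sigma^2_alphabeta = 0.
z_star = estimate / sd_large_sample

# Upper-tail P-value (assumption: one-sided alternative sigma^2_alphabeta > 0).
p_upper = 1 - NormalDist().cdf(z_star)

# How much larger the bootstrap standard deviation is than the large-sample one.
sd_ratio = sd_bootstrap / sd_large_sample

print(f"z* = {z_star:.2f}")
print(f"upper-tail P-value = {p_upper:.3f}")
print(f"bootstrap SD / large-sample SD = {sd_ratio:.2f}")
```

The bootstrap standard deviation is roughly three times the large-sample value, which is the discrepancy the text points to when it questions the large-sample procedure for this parameter.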

We therefore examine next the measurement method main effects. The bootstrap distribution
for $\hat{\alpha}_1$ is shown in Figure 25.5. It is approximately normal. Also, Table 25.15 shows
that the bootstrap standard deviation for $\hat{\alpha}_1$ and the large-sample standard deviation are very
similar. These findings support the use of large-sample inference procedures for $\alpha_1$. Hence,
we use the large-sample confidence interval in (14.54) to estimate $\alpha_1 - \alpha_2 = 2\alpha_1$. For a
95 percent confidence interval, we require:

$$z(.975) = 1.960 \qquad 2\hat{\alpha}_1 = 2(.633) = 1.266 \qquad 2s\{\hat{\alpha}_1\} = 2(.107) = .214$$

The confidence limits therefore are $1.266 \pm 1.960(.214)$ and the approximate 95 percent
confidence interval is:

$$.85 \le \alpha_1 - \alpha_2 \le 1.69$$


We conclude, with approximate confidence coefficient .95, that the mean government
method fat determination is between .85 and 1.69 percentage points higher than that for the
Sheffield method. Since the true fat content in the samples was 3 percent, Figure 25.4 indicates
that the government method is biased upward and that the Sheffield method is more
accurate.
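The confidence limits for $\alpha_1 - \alpha_2 = 2\alpha_1$ can be reproduced with the Python standard library. This is a sketch using the estimate and standard deviation reported in Table 25.14a; the variable names are illustrative.

```python
from statistics import NormalDist

alpha1_hat = 0.633     # ML estimate of alpha_1 (Table 25.14a)
sd_alpha1 = 0.107      # estimated approximate standard deviation

# alpha_1 - alpha_2 = 2 * alpha_1, so both the estimate and its SD are doubled.
difference = 2 * alpha1_hat
sd_difference = 2 * sd_alpha1

# z(.975) for a 95 percent confidence interval.
z = NormalDist().inv_cdf(0.975)

lower = difference - z * sd_difference
upper = difference + z * sd_difference

print(f"z(.975) = {z:.3f}")
print(f"{lower:.2f} <= alpha_1 - alpha_2 <= {upper:.2f}")
```

Since the whole interval lies above zero, the large-sample analysis indicates a higher mean determination for the Government method, in line with the conclusion in the text.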


Comment 


Mixed effects models are sometimes estimated by means of restricted maximum likelihood (REML).
Using this approach, the variance-covariance components are estimated via maximum likelihood
(ML) averaging over all possible values of the fixed effects. The fixed effects are estimated using
generalized least squares given their variance-covariance estimates. Under full maximum likelihood,
the variance-covariance parameters and the fixed effects are estimated by maximizing their joint
likelihood. The variance component estimates using REML generally exhibit less bias than the ML
estimates, whereas the fixed effect estimates from REML and ML are often very similar. See
Reference 25.7 for further details of these estimation methods.


Cited References


25.1. Searle, S. R., G. Casella, and C. E. McCulloch. Variance Components. New York: John Wiley & 
Sons, 1992. 

25.2. Burdick, R. K., and F. A. Graybill. Confidence Intervals on Variance Components. New York:
Marcel Dekker, Inc., 1992. 

25.3. Satterthwaite, F. E. “An Approximate Distribution of Estimates of Variance Components,” 
Biometrics Bulletin 2 (1946), pp. 110-14. 

25.4. Gaylor, D. W., and F. N. Hopper. “Estimating the Degrees of Freedom for Linear Combinations 
of Mean Squares by Satterthwaite’s Formula,” Technometrics 11 (1969), pp. 691—706. 

25.5. Ting, N., R. K. Burdick, F. A. Graybill, S. Jeyaratnam, and T. F. C. Lu. “Confidence Intervals 
on Linear Combinations of Variance Components That Are Unrestricted in Sign,” Journal of 
Statistical Computation and Simulation 35 (1990), pp. 135-43. 

25.6. Schwarz, C. J. “The Mixed-Model ANOVA: The Truth, the Computer Packages, the Books. 
Part I: Balanced Data.” The American Statistician 47 (1993), pp. 48-59. 

25.7. Hocking, R. R. Methods and Applications of Linear Models: Regression and the Analysis of 
Variance. 2nd ed. New York: John Wiley & Sons, 2003. 

25.8. Greenhouse, S. W., and S. Geisser. “On Methods in the Analysis of Profile Data,” Psycho- 
metrika 24 (1959), pp. 95-112. 

25.9. Huynh, H., and L. Feldt. "Estimation of the Box Correction for Degrees of Freedom from 
Sample Data in the Randomized Block and Split-Plot Designs,” Journal of Educational Statis- 
tics 1 (1976), pp. 69-82. 

25.10. Winer, B. J., D. R. Brown, and K. M. Michels. Statistical Principles in Experimental Design.
3rd ed. New York: McGraw-Hill, 1991. 

25.11. Burdick, R. K. “Using Confidence Intervals to Test Variance Components.” Journal of Quality
Technology 26 (1994), pp. 30—38. 

25.12. Searle, S. R. Linear Models for Unbalanced Data. New York: John Wiley & Sons, 1987.


Problems 


25.1. A student asks why $\varepsilon_{ij}$ is shown as a separate term in random cell means model (25.1) in view
of $\mu_i$ being a random variable in this model. Respond.

25.2. Refer to Figure 25.1. Here, the situation portrayed is one where the variance $\sigma^2$ is larger than
the variance $\sigma^2_\mu$. Is this always the case? Explain.


25.3. In each of the following cases, indicate whether ANOVA model I or model II is more
appropriate and state your reasons:

a. In a study of absenteeism at a plant, the treatments are the three 8-hour shifts.

b. In a study of employee productivity, the treatments are 10 production employees selected
at random from all production employees in a large company.

c. In a study of anticipated annual income at retirement, the treatments are the four types of
retirement plans available to employees.

d. In a study of tire wear in 18-wheel trucks, the treatments are four tire locations selected at
random.

25.4. Refer to the Apex Enterprises personnel officers example on page 1036. Explain with reference
to this example over what the expectation in (25.2a) is taken. Over what is the variance in
(25.2b) taken? Over what is the covariance in (25.2c) taken?

25.5. Refer to Filling machines Problem 16.11. Suppose that the company uses a large number of
filling machines and the six machines studied were selected randomly. Assume that ANOVA
model (25.1) is applicable.

a. Interpret the following with reference to this example: (1) $\mu_.$, (2) $\sigma^2_\mu$, (3) $\sigma^2$, (4) $\sigma^2\{Y_{ij}\}$.

b. Test whether or not all machines in the population have the same mean fill; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

c. Estimate the mean fill for all machines in the population with a 95 percent confidence
interval.

*25.6. Refer to Filling machines Problems 16.11 and 25.5.

a. Estimate the proportion of the total variability in carton fills that reflects the differences in
mean fills between machines; use a 95 percent confidence interval.

b. Estimate $\sigma^2$ with a 95 percent confidence interval. Interpret your interval estimate.

c. Obtain a point estimate of $\sigma^2_\mu$.

d. Obtain separate approximate 95 percent confidence intervals for $\sigma^2_\mu$ using the Satterthwaite
procedure and the MLS procedure. Are these intervals similar? Comment.

25.7. Sodium content. A researcher studied the sodium content in lager beer by selecting at
random six brands from the large number of brands of U.S. and Canadian beers sold in a
metropolitan area. The researcher then chose eight 12-ounce cans or bottles of each selected
brand at random from retail outlets in the area and measured the sodium content (in milligrams)
of each can or bottle. The observations follow.

  i      1      2      3      4      5      6      7      8
  1    24.4   22.6   23.8   22.0   24.5   22.3   25.0   24.5
  2    10.2   12.1   10.3   10.2    9.9   11.2   12.0    9.5
 ...
  6    21.3   20.2   20.7   20.8   20.1   18.8   21.1   20.3

Assume that ANOVA model (25.1) is applicable.

a. Test whether or not the mean sodium content is the same in all brands sold in the
metropolitan area; use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the
P-value of the test?

b. Estimate the mean sodium content for all brands; use a 99 percent confidence interval.


25.8. Refer to Sodium content Problem 25.7.

a. Estimate $\sigma^2_\mu/(\sigma^2_\mu + \sigma^2)$ with a 99 percent confidence interval. Interpret your interval
estimate.

b. Obtain point estimates of $\sigma^2$ and $\sigma^2_\mu$.

c. Estimate $\sigma^2$ with a 99 percent confidence interval.

d. It has been conjectured that the variance of sodium content between brands is more than
twice as great as that within brands. Conduct an appropriate test using $\alpha = .01$. State the
alternatives, decision rule, and conclusion.

e. Obtain an approximate 99 percent confidence interval for $\sigma^2_\mu$ using the MLS procedure.
Interpret your confidence interval.

25.9. Coil winding machines. A plant contains a large number of coil winding machines. A
production analyst studied a certain characteristic of the wound coils produced by these machines
by selecting four machines at random and then choosing 10 coils at random from the day's
output of each selected machine. The results follow.

  i     1     2     3     4     5     6     7     8     9    10
  1    205   204   207   202   208   206   209   205   207   206
  2    201   204   198   203   209   207   199   206   205   204
  3    198   204   196   201   199   203   202   198   202   197
  4    210   209   214   215   211   208   210   209   211   210

Assume that ANOVA model (25.1) is appropriate.

a. Test whether or not the mean coil characteristic is the same for all machines in the plant;
use $\alpha = .10$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

b. Estimate the mean coil characteristic for all coil winding machines in the plant; use a
90 percent confidence interval.

25.10. Refer to Coil winding machines Problem 25.9.

a. Estimate $\sigma^2_\mu/(\sigma^2_\mu + \sigma^2)$ with a 90 percent confidence interval. Interpret your interval
estimate.

b. Test whether or not $\sigma^2_\mu$ and $\sigma^2$ are equal; use $\alpha = .10$. State the alternatives, decision rule,
and conclusion.

c. Estimate $\sigma^2$ with a 90 percent confidence interval. Interpret your interval estimate.

d. Obtain a point estimate of $\sigma^2_\mu$.

e. Obtain separate approximate 90 percent confidence intervals for $\sigma^2_\mu$ using the Satterthwaite
procedure and the MLS procedure. Are these intervals similar? Comment.

25.11. For mixed effects model (25.42), why is $\sum_i (\alpha\beta)_{ij} = 0$ while usually $\sum_j (\alpha\beta)_{ij} \ne 0$?

25.12. A marketing consultant is designing several experiments involving a newly developed
low-cost food processor. The initial experiment has the objectives (1) to compare the effects on
unit sales of three possible prices recommended by the sales department ($23.99, $25.49,
$25.95) and (2) to determine whether the color scheme used for the appliance affects unit
sales. A great many color schemes are feasible; three (white, green, pink) have been selected
for the initial experiment to represent the range of possible colors. If the experiment suggests
that color scheme does have an effect, this aspect of the product design will be investigated in
detail in a follow-up study. Which ANOVA model would you employ for analyzing the initial
experiment? Discuss.

25.13. In a two-factor ANOVA study with a = 3, b = 2, and n = 5, the two factor effects are both
random with $\sigma^2 = 5.0$, $\sigma^2_\alpha = 8.0$, $\sigma^2_\beta = 10.0$, and $\sigma^2_{\alpha\beta} = 6.0$. Assume that ANOVA model
(25.39) is applicable.

a. Obtain $E\{MSA\}$, $E\{MSB\}$, and $E\{MSAB\}$.

b. What would be the expected mean squares if $\sigma^2_\alpha = 0$, all other parameters remaining the
same?

25.14. A survey statistician has commented: "I am rather suspicious of uses of random effects and
mixed effects ANOVA models. Seldom are the factor levels chosen by a random mechanism
from a known population." Discuss.

25.15. Miles per gallon. An automobile manufacturer wished to study the effects of differences
between drivers (factor A) and differences between cars (factor B) on gasoline consumption.
Four drivers were selected at random; also five cars of the same model with manual transmission
were randomly selected from the assembly line. Each driver drove each car twice over a
40-mile test course and the miles per gallon were recorded. The data follow.

Factor A                    Factor B (car)
(driver)     j=1     j=2     j=3     j=4     j=5
i=1          25.3    28.9    24.8    28.4    27.1
             25.2    30.0    25.1    27.9    26.6
i=2          33.6    36.7    31.7    35.6    33.7
             32.9    36.5    31.9    35.0    33.9
i=3          27.7    30.7    26.9    29.7    29.2
             28.5    30.4    26.3    30.2    28.9
i=4          29.2    32.4    27.7    31.8    30.3
             29.3    32.4    28.9    30.7    29.9

Assume that random ANOVA model (25.39) is applicable.

a. Test whether or not the two factors interact; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

b. Test separately whether or not factor A and factor B main effects are present. For each test,
use $\alpha = .05$ and state the alternatives, decision rule, and conclusion. What is the P-value
for each test?

c. Obtain point estimates of $\sigma^2_\alpha$ and $\sigma^2_\beta$. Which factor appears to have the greater effect on
gasoline consumption?

d. Use the MLS procedure to obtain an approximate 95 percent confidence interval for $\sigma^2_\alpha$.
Interpret your interval estimate.

e. Use the Satterthwaite procedure to obtain an approximate 95 percent confidence interval
for $\sigma^2_\beta$. Is your interval estimate reasonably precise? Comment.


*25.16. Refer to Disk drive service Problem 19.16. Suppose that the service center employs a large
number of technicians and that the three included in the study were selected at random. Assume
that the conditions of mixed ANOVA model (25.42) are applicable, except that here factor A
effects are random and factor B effects are fixed. Under current conditions, all technicians
service each of the three makes with approximately equal frequency.

a. Test whether or not the two factors interact; use $\alpha = .01$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

b. Obtain a point estimate of $\sigma^2_{\alpha\beta}$. Does $\sigma^2_{\alpha\beta}$ appear to be large relative to $\sigma^2$? Explain.

c. Test whether or not factor A main effects are present; use $\alpha = .01$. State the alternatives,
decision rule, and conclusion. Why is it meaningful here to test for factor A main effects?

d. Test whether or not factor B main effects are present; use $\alpha = .01$. State the alternatives,
decision rule, and conclusion. Why is it meaningful here to test for factor B main effects?

e. It is desired to obtain all pairwise comparisons between the means for the three disk drive
makes. Use the Tukey procedure and a 95 percent family confidence coefficient to make
these comparisons. State your findings.

f. Use the Satterthwaite procedure to obtain an approximate 99 percent confidence interval
for $\mu_{.1}$. Interpret your interval estimate.

g. Obtain an approximate 99 percent confidence interval for $\sigma^2_\alpha$ using the MLS procedure.
Does the variability between technicians appear to be large? Explain.


25.17. Imitation pearls. Preliminary research on the production of imitation pearls entailed studying
the effect of the number of coats of a special lacquer (factor A) applied to an opalescent
plastic bead used as the base of the pearl on the market value of the pearl. Four batches of
12 beads (factor B) were used in the study, and it is desired to also consider their effect on the
market value. The three levels of factor A (6, 8, and 10 coats) were fixed in advance, while
the four batches can be regarded as a random sample of batches from the bead production
process. The market value of each pearl was determined by a panel of experts. The market
value data (coded) follow.

Factor A                        Factor B (batch)
(number of coats)      j=1     j=2     j=3     j=4
i=1   6                72.0    72.1    75.2    70.4
                       72.8    73.3    77.8    72.4
i=2   8                76.9    80.3    80.2    74.3
                       74.2    77.2    79.9    72.9
i=3   10               76.3    80.9    79.2    71.6
                       75.0    80.2    81.2    74.4

Assume that mixed ANOVA model (25.42) is applicable.

a. Test for interaction effects; use $\alpha = .05$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

b. Test for factor A and factor B main effects. For each test, use $\alpha = .05$ and state the
alternatives, decision rule, and conclusion. What is the P-value for each test?

c. Estimate $D_1 = \mu_{2.} - \mu_{1.}$ and $D_2 = \mu_{3.} - \mu_{2.}$ by means of the Bonferroni procedure with
a 90 percent family confidence coefficient. State your findings.

d. Use the Satterthwaite procedure to obtain an approximate 95 percent confidence interval
for $\mu_{2.}$. Interpret your confidence interval.

e. Use the MLS procedure to obtain an approximate 90 percent confidence interval for $\sigma^2_\beta$.
Does $\sigma^2_\beta$ appear to be large compared to $\sigma^2$?


25.18. Refer to Coin-operated terminals Problem 20.2. Suppose that the weeks (factor B) had been
selected intentionally but the locations (factor A) had been selected at random from a large
number of possible locations. Assume that the conditions for additive random block effects
ANOVA model (25.67) are appropriate, except that here factor A effects (blocks) are random
and factor B effects are fixed.

a. Test for factor B main effects; use $\alpha = .05$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

b. Why can you not test for factor A main effects here?


*25.19. Road paint wear. A state highway department studied the wear characteristics of five
different paints at eight locations in the state. The standard, currently used paint (paint 1) and
four experimental paints (paints 2, 3, 4, 5) were included in the study. The eight locations were
randomly selected, thus reflecting variations in traffic densities throughout the state. At each
location, a random ordering of the paints to the chosen road surface was employed. After a
suitable period of exposure to weather and traffic, a combined measure of wear, considering
both durability and visibility, was obtained. The data on wear follow (the higher the score, the
better the wearing characteristics).

Location              Paint (j)
   i        1     2     3     4     5
   1       11    13    10    18    15
   2       20    28    15    30    18
   3        8    10     8    16    12
   4       30    35    27    41    28
   5       14    16    13    22    16
   6       25    27    26    33    25
   7       43    46    41    55    42
   8       13    14    12    20    13

a. Obtain the residuals for additive randomized block model (25.67) and plot them against
the fitted values. Also prepare a normal probability plot of the residuals. Summarize your
findings about the appropriateness of model (25.67).

b. Plot the responses by location in the format of Figure 21.2 on page 896. What does this
plot suggest about the appropriateness of the no-interaction assumption here?

c. Conduct the Tukey test for additivity of location and treatment effects, conditional on the
locations selected; use $\alpha = .005$. State the alternatives, decision rule, and conclusion.


25.20. Refer to Road paint wear Problem 25.19. Assume that additive randomized block model
(25.67) is appropriate.

a. Obtain the analysis of variance table. 

b. Test whether or not the mean wear differs for the five paints; use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

c. Compare the mean wear of each experimental paint against that of the standard paint;
use the most efficient multiple comparison procedure with a 90 percent family confidence
coefficient. Summarize your findings.

d. Paints 1, 3, and 5 are white, whereas paints 2 and 4 are yellow. Estimate the difference in
the mean wear for the two groups of paints with a 95 percent confidence interval. Interpret
your findings.

25.21. Muscle tissue. A physiologist studied the effects of three reagents on muscle tissue in dogs.
Ten litters of three dogs each were randomly selected and the three reagents were randomly
assigned to the three dogs in each litter. The data on the effects of the reagents follow (the
higher the value, the higher the activity level):


Litter      Reagent (j)        Litter      Reagent (j)
   i       1    2    3            i       1    2    3
   1      10   15   14            6       7    9   10
   2       8   12   13            7      24   30   27
   3      21   27   25            8      16   18   20
   4      14   17   17            9      23   29   32
   5      12   18   16           10      18   22   21


a. Obtain the residuals for additive randomized block model (25.67) and plot them against
the fitted values. Also prepare a normal probability plot of the residuals. Summarize your 
findings. 

b. Plot the responses by litter in the format of Figure 21.2 on page 896. What does this plot 
suggest about the appropriateness of the no-interaction assumption here? 

c. Conduct the Tukey test for additivity of litter and reagent effects, conditional on the litters
selected; use $\alpha = .025$. State the alternatives, decision rule, and conclusion.

d. Based on parts (b) and (c), would interaction randomized block model (25.74) be more 
appropriate here? What practical differences exist in using models (25.67) and (25.74)? 

25.22. Refer to Muscle tissue Problem 25.21. Assume that additive randomized block model (25.67) 
is applicable. 

a. Obtain the analysis of variance table. 

b. Test whether or not the mean activity level differs for the three reagents; use significance
level $\alpha = .025$. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?

c. Reagents 2 and 3 were expected to be similar to each other but to differ from reagent 1.
Use the most efficient multiple comparison procedure with a 95 percent family confidence
coefficient to estimate:

$$L_1 = \mu_{\cdot 2} - \mu_{\cdot 3} \qquad L_2 = \frac{\mu_{\cdot 2} + \mu_{\cdot 3}}{2} - \mu_{\cdot 1}$$

Summarize your findings.
*25.23. Refer to Table 25.11 on page 1069. All three factors in this study have random effects. 

a. Test whether or not $\sigma^2_{\alpha\beta\gamma}$ equals zero; use $\alpha = .025$. State the alternatives, decision rule,
and conclusion. What is the P-value of the test?

b. Test whether or not AB interactions are present. Use significance level $\alpha = .01$. State the
alternatives, decision rule, and conclusion.

c. Test whether machines (factor B) have main effects. Use significance level $\alpha = .01$. State
the alternatives, decision rule, and conclusion.

d. Use the Satterthwaite procedure to obtain an approximate 95 percent confidence interval 
for c2. Interpret your interval estimate. 

25.24. Refer to Electronics assembly Problem 24.12. Suppose that the number of feasible sequences
in which the components can be attached to the board is very large and that the three sequences
studied were selected randomly from the set of operationally feasible sequences. Assume that
a normal error ANOVA model is applicable where factors A and C have fixed effects and
factor B has random effects. Some relevant expected mean squares for this model are:

$$E\{MSA\} = \sigma^2 + bcn\,\frac{\sum \alpha_i^2}{a-1} + cn\,\sigma^2_{\alpha\beta}$$

$$E\{MSB\} = \sigma^2 + acn\,\sigma^2_{\beta}$$

$$E\{MSAC\} = \sigma^2 + bn\,\frac{\sum\sum (\alpha\gamma)_{ik}^2}{(a-1)(c-1)} + n\,\sigma^2_{\alpha\beta\gamma}$$

$$E\{MSABC\} = \sigma^2 + n\,\sigma^2_{\alpha\beta\gamma}$$

$$E\{MSE\} = \sigma^2$$

a. What is the appropriate test statistic for testing for AC interactions? For testing for factor B
main effects?

b. Test whether or not AC interactions are present; use $\alpha = .05$. State the alternatives,
decision rule, and conclusion.

c. Test whether or not factor B main effects are present; use $\alpha = .05$. State the alternatives,
decision rule, and conclusion.

d. Estimate $\sigma^2_{\beta}$ using the MLS procedure with a 95 percent confidence coefficient. Interpret
your interval estimate.

25.25. Consider mixed ANOVA model (25.79) where factor A has fixed effects and the other two
factors have random effects. Find the Satterthwaite test statistic F** for testing for factor A
main effects. What is the approximate number of degrees of freedom associated with the
denominator of this test statistic?

*25.26. Refer to Disk drive service Problems 19.16 and 25.16. Suppose that observations ү; = 57,
Yoo, = 61, and Уэ; = 66 are missing because the time recording instrument malfunctioned.
Assume that the conditions of mixed ANOVA model (25.42) are applicable (except that here
factor A effects are random, factor B effects are fixed, and unequal sample sizes exist) and that
the observations $Y_{ijk}$ are jointly normally distributed. Use the maximum likelihood approach
to answer the following.


a. Obtain maximum likelihood estimates of all unknown parameters. Are any of the estimated 
variances of the random effects equal to zero? If so, what would this imply about the 
applicability of the likelihood ratio statistic (14.60)? 

b. Revise the model by dropping the main factor A effect and obtain maximum likelihood 
estimates of the unknown parameters in the revised model. Do these estimates differ from 
the ones obtained in part (a)? 

c. Use the z* test statistic to test whether or not the two factors interact; use $\alpha = .01$. State
the alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Use the likelihood ratio test statistic (14.60) to test whether or not factor B main effects
are present; control the risk of Type I error at $\alpha = .01$. State the alternatives, decision rule,
and conclusion.

e. Obtain an approximate 99 percent confidence interval for $\sigma^2_{\alpha\beta}$. Interpret your confidence
interval.


25.27. Refer to Imitation pearls Problem 25.17. Suppose that observations Үз = 67.4 and Y32 =
73.1 are missing because of flaws in the beads. Assume that the conditions of mixed ANOVA
model (25.42) are applicable (except that unequal sample sizes are present here) and that the
observations $Y_{ijk}$ are jointly normally distributed. Use the maximum likelihood approach to
answer the following.

a. Obtain maximum likelihood estimates of all unknown parameters. Are any of the estimated
variances of the random effects equal to zero? If so, what would this imply about the
applicability of the likelihood ratio statistic (14.60)?

b. Revise the model by dropping the interaction term and obtain maximum likelihood
estimates of the unknown parameters in the revised model. Do these estimates differ from the
ones obtained in part (a)?

c. Use the likelihood ratio test statistic (14.60) to test for factor B main effects; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Use the likelihood ratio test statistic (14.60) to test whether factor A main effects are
present; control the risk of Type I error at $\alpha = .05$. State the alternatives, decision rule,
and conclusion. What is the P-value of the test?

e. Obtain an approximate 95 percent confidence interval for $\sigma^2_{\beta}$. Interpret your interval
estimate.


Exercises


25.28. Show that $n'$ defined in (25.102) equals n when $n_i = n$.

25.29. What are the values r and n that minimize $\sigma^2\{\bar{Y}_{\cdot\cdot}\}$ in (25.12) for a given total sample
size $n_T$?

25.30. Derive the confidence limits in (25.19) from those in (25.18).

25.31. For random ANOVA model (25.39), derive $\sigma^2\{\bar{Y}_{i\cdot\cdot}\}$.

25.32. Consider randomized block model (21.1), but with random treatment effects. Derive $\sigma^2\{Y_{ij}\}$
and $\sigma^2\{\bar{Y}_{\cdot j}\}$.

25.33. Refer to Dental pain Problem 21.9. Suppose that the subjects in the study had been randomly
selected from eight towns (blocks), and that the towns were randomly selected from a
population of towns. Assume that additive randomized block model (25.67) is applicable, except
that the factorial structure of the fixed treatment effects needs to be recognized.

a. State the randomized block model for this case.

b. What is the appropriate test statistic for testing whether or not the two factors interact?
What are the appropriate test statistics for testing for main effects? [Hint: Consider the test
for treatment effects in model (25.67).]

25.34. Derive (25.68c).

25.35. For random ANOVA model (25.77), find the variance of the estimated mean $\bar{Y}_{\cdot\cdot\cdot\cdot}$.


Projects

25.36. Consider a two-factor study with a = 3, b = 2, and n = 5. Random ANOVA model (25.39)
is applicable with u.. = 92, o2 = 24, oj 11, 075 Л, and o? = 8. 


a. Using a normal random number generator, obtain a value for each of the main effects
$\alpha_i$ (i = 1, 2, 3) and $\beta_j$ (j = 1, 2) and for each interaction effect $(\alpha\beta)_{ij}$.

b. Generate five error terms for each treatment.

c. Combine the parameter values obtained in part (a), the error terms obtained in part (b), and
$\mu_{\cdot\cdot} = 92$ to yield five observations $Y_{ijk}$ for each treatment.

d. For the observations obtained in part (c), calculate the F* test statistic for testing whether
or not factor A main effects are present. What is your conclusion using $\alpha = .05$?

e. Repeat the steps in parts (a)-(d) 100 times. Calculate the mean of the 100 numerator mean
squares and the mean of the 100 denominator mean squares. Are these means close to
theoretical expectations?

f. In what proportion of the 100 trials did the test lead to the conclusion of the presence of 
factor A main effects? Does the test have good power for the case considered here? 
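
The steps of Project 25.36 can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions: the overall mean 92 and the variances 24, 11, and 8 are taken from the problem statement, but the interaction variance is not legible in this copy, so `var_ab = 5.0` below is a placeholder value, and the function name `simulate_once` is hypothetical. The sketch uses F* = MSA/MSAB, the usual statistic for factor A main effects when both factors are random, and the critical value F(.95; 2, 2) = 19.0.

```python
import random

def simulate_once(rng, a=3, b=2, n=5, mu=92.0,
                  var_a=24.0, var_b=11.0, var_ab=5.0, var_e=8.0):
    """Generate one data set from the random two-factor model; return (MSA, MSAB).

    var_ab is a PLACEHOLDER: the interaction variance is illegible in the source.
    """
    alpha = [rng.gauss(0.0, var_a ** 0.5) for _ in range(a)]
    beta = [rng.gauss(0.0, var_b ** 0.5) for _ in range(b)]
    ab = [[rng.gauss(0.0, var_ab ** 0.5) for _ in range(b)] for _ in range(a)]
    y = [[[mu + alpha[i] + beta[j] + ab[i][j] + rng.gauss(0.0, var_e ** 0.5)
           for _ in range(n)] for j in range(b)] for i in range(a)]
    cell = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    msa = b * n * sum((r - grand) ** 2 for r in row) / (a - 1)
    msab = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                   for i in range(a) for j in range(b)) / ((a - 1) * (b - 1))
    return msa, msab

rng = random.Random(2005)                      # fixed seed for repeatability
trials = [simulate_once(rng) for _ in range(100)]
f_stats = [msa / msab for msa, msab in trials]

mean_msa = sum(msa for msa, _ in trials) / len(trials)
mean_msab = sum(msab for _, msab in trials) / len(trials)

# Theoretical expectations under the assumed variances (n = 5, b = 2).
expected_msab = 8.0 + 5 * 5.0                  # sigma^2 + n * sigma^2_ab
expected_msa = expected_msab + 2 * 5 * 24.0    # ... + b * n * sigma^2_a

# Proportion of trials concluding that factor A main effects are present.
reject_rate = sum(f > 19.0 for f in f_stats) / len(f_stats)
```

Comparing `mean_msa` with `expected_msa` and `mean_msab` with `expected_msab` addresses part (e), and `reject_rate` is the proportion asked for in part (f).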


25.37. Refer to Road paint wear Problem 25.19.

a. Estimate the variance-covariance matrix of the treatment observations in a block; use (27.8)
on page 1135 to obtain the entries in the matrix.

b. Does the compound symmetry property of (25.71) appear to be reasonable here? Explain.

c. Does the sphericity property of (25.73) appear to be reasonable here? Explain.

25.38. Refer to Muscle tissue Problem 25.21.

a. Estimate the variance-covariance matrix of the treatment observations in a block; use (27.8)
on page 1135 to obtain the entries in the matrix.

b. Does the compound symmetry property of (25.71) appear to be reasonable here? Explain.

c. Does the sphericity property of (25.73) appear to be reasonable here? Explain.

25.39. Refer to Miles per gallon Problem 25.15. Suppose that observation $Y_{232} = 31.9$ is missing
because the record was lost for this experimental trial. Assume that random ANOVA model
(25.39) is applicable (except that the sample sizes are unequal here) and that the observations
$Y_{ijk}$ are jointly normally distributed.

a. Use the method of maximum likelihood to estimate $\mu_{\cdot\cdot}$ and the variance components $\sigma^2_{\alpha}$,
$\sigma^2_{\beta}$, $\sigma^2_{\alpha\beta}$, and $\sigma^2$. Which variance component appears to be largest? Also obtain the estimated
standard deviation for each of the estimated variance components.

b. Obtain a bootstrap sample by using a normal random number generator to provide normal
values with means zero and variances equal to the estimates of the variance components in
part (a) for (1) the $\alpha_i$ (i = 1, ..., 4), (2) the $\beta_j$ (j = 1, ..., 5), (3) the $(\alpha\beta)_{ij}$, and (4) the
$n_{ij}$ error terms $\varepsilon_{ijk}$ for each treatment. Combine these with $\hat{\mu}_{\cdot\cdot}$ obtained in part (a) to create
the $n_{ij}$ bootstrap outcomes $Y_{ijk}$ for each treatment.

c. Use the method of maximum likelihood to estimate $\sigma^2_{\alpha}$, $\sigma^2_{\beta}$, and $\sigma^2_{\alpha\beta}$ for the bootstrap sample
obtained in part (b).

d. Repeat parts (b) and (c) 250 times.

e. Obtain histograms of the bootstrap distributions for the 250 bootstrap estimates of $\sigma^2_{\alpha}$, $\sigma^2_{\beta}$,
and $\sigma^2_{\alpha\beta}$. Also obtain the mean and standard deviation for each of the bootstrap distributions.
Based on these results and the results in part (a), does it appear that large-sample inference
procedures are appropriate here? Explain.


Part Six


Specialized Study Designs


Chapter 26


Nested Designs,
Subsampling, and
Partially Nested Designs


In this chapter, we take up the basic elements of nested designs, including the use of 
subsampling. We begin by considering the general concept of nested designs and describe 
how these designs differ from crossed designs. We then take up in detail two-factor nested 
designs and their analysis. We conclude by considering subsampling designs and partially 
nested designs. 


26.1 Distinction between Nested and Crossed Factors 


In the factorial studies considered so far, where every level of one factor appears with each
level of every other factor, the factors are said to be crossed. A different situation occurs
when factors are nested. The distinction between nested and crossed factors will now be
illustrated by some examples involving two-factor studies.


Example 1


A large manufacturing company operates three regional training schools for mechanics, one
in each of its operating districts. The schools have two instructors each, who teach classes
of about 15 mechanics in three-week sessions. The company was concerned about the effect
of school (factor A) and instructor (factor B) on the learning achieved. To investigate these
effects, classes in each district were formed in the usual way and then randomly assigned
to one of the two instructors in the school. This was done for two sessions, and at the end
of each session a suitable summary measure of learning for the class was obtained. The
results are presented in Table 26.1.

The layout of Table 26.1 appears identical to an ordinary two-factor investigation, with 
two observations per cell (see, e.g., Table 19.7). In fact, however, the study is not an 
ordinary two-factor study. The reason is that the instructors in the Atlanta school did not 
also teach in the other two schools, and similarly for the other instructors. Thus, six different 
instructors were involved. An ordinary two-factor investigation with six different instructors 
would have consisted of 18 treatments, as shown in Figure 26.1a. In the training school
example, however, only six treatments were included, as shown in Figure 26.1b, where
the crossed-out cells represent treatments not studied. Figure 26.2 contains an alternative
graphic representation of the nested design for the training school example, including the
two replications of the study.


TABLE 26.1 Sample Data for Nested Two-Factor Study: Training School Example (class learning scores).

                          Factor B (instructor)
Factor A (school)
        i                  1               2             Average
Atlanta                   25              14
                          29              11
  Average            Ȳ11. = 27       Ȳ12. = 12.5     Ȳ1.. = 19.75
Chicago
  Average            Ȳ21. = 8.5      Ȳ22. = 20       Ȳ2.. = 14.25
San Francisco
  Average            Ȳ31. = 18.5     Ȳ32. = 3.5      Ȳ3.. = 11.00
Average                                               Ȳ... = 15


FIGURE 26.1 Illustration of Crossed and Nested Factors: Training School Example.
(a) Crossed Factors. (b) Nested Factors. [Grids of School (factor A) by Instructor (factor B); crossed-out cells mark treatments not studied.]

It is clear from Figure 26.1b that the experimental design for the training school example
involves an incomplete factorial arrangement of a special type, where each level of factor B
(instructor) occurs with only one level of factor A (school). Specifically here, each instructor
teaches in only one school. Factor B is therefore said to be nested within factor A. As noted
earlier, in an ordinary factorial study where every factor level of A appears with every factor
level of B, factors A and B are said to be crossed.


FIGURE 26.2 Graphic Representation of Two-Factor Nested Design: Training School Example.
[Tree diagram: each school (i = 1, 2, 3) branches to its two instructors (j = 1, 2), and each instructor to two classes (k = 1, 2).]


There is another way to look at the distinction between nested and crossed designs. Let
$\mu_{ij}$ denote the mean response when factor A is at the ith level and factor B is at the jth level.
If the factors are crossed, the jth level of B is the same for all levels of A. If, on the other
hand, factor B is nested within factor A, the jth level of B when A is at level 1 has nothing
in common with the jth level of B when A is at level 2, and so on. For instance, in a crossed
factorial study of the effects of price ($1.99, $2.49) and advertising level (high, low), a
particular advertising level is the same no matter with which price it appears, and similarly
for the price levels. On the other hand, in the nested design for the training school example,
the first instructor in school 1 is not the same as the first instructor in school 2, and so on.


Example 2


An analyst was interested in the effects of community (factor A) and neighborhood (factor B)
on the spread of information about new products. Information was obtained from samples
of families in various neighborhoods within selected communities. Since the neighborhood
designated 1 in a given community is not the same as the neighborhoods designated 1 in
the other communities, and similarly for the other neighborhoods, neighborhoods here are
nested within communities.
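
The defining property of nesting, that each level of factor B occurs with only one level of factor A, can be checked mechanically once the B levels carry globally unique labels. The sketch below is a minimal illustration; the function name `is_nested` and the instructor labels are hypothetical and are not part of the original study.

```python
from collections import defaultdict

def is_nested(pairs):
    """Return True if every level of B occurs with exactly one level of A.

    `pairs` is an iterable of (a_level, b_level) tuples, one per treatment
    studied. B levels must be globally unique labels (e.g. instructor names),
    not within-school positions such as 1 and 2.
    """
    a_levels_seen = defaultdict(set)
    for a_level, b_level in pairs:
        a_levels_seen[b_level].add(a_level)
    return all(len(a_set) == 1 for a_set in a_levels_seen.values())

# Hypothetical labels: six different instructors, two per school.
training = [("Atlanta", "inst_A1"), ("Atlanta", "inst_A2"),
            ("Chicago", "inst_C1"), ("Chicago", "inst_C2"),
            ("San Francisco", "inst_S1"), ("San Francisco", "inst_S2")]

# Crossed price-by-advertising study: every price appears with every level.
pricing = [(price, ad) for price in (1.99, 2.49) for ad in ("high", "low")]

nested_training = is_nested(training)
nested_pricing = is_nested(pricing)
```

The check only reflects how the levels are labeled. As the comments that follow point out, relabeling neighborhoods by income bracket would turn the same layout into a crossed one.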


Comments 


1. The distinction between crossed and nested factors is often a fine one. In Example 2, if the
neighborhoods of each community represented specified average income levels so that, say, the first
neighborhoods in each community had an average income of $5,000 to $9,999, the second
neighborhoods an average income of $10,000 to $19,999, and so on for the other neighborhoods, one could view
the design as a crossed one. The factors would be community and economic level of neighborhood,
and these would be crossed since a given economic level is the same for all communities, and vice
versa.

2. Nested factors are frequently encountered in observational studies where the researcher cannot
manipulate the factors under study, or in experiments where only some factors can be manipulated.
Factors that cannot be manipulated, it will be recalled, are designated observational factors, in
distinction to experimental factors that can be assigned at will to the experimental units. Example 2 is an
observational study where both community and neighborhood are observational factors since families
(the study units) were not randomly assigned to either community or neighborhood. In Example 1,
school is an observational factor because the classes of a school (the experimental units) are made
up of mechanics from the district in which the school is located. Instructors in this example are an
experimental factor since they are assigned randomly to a class, but a nested design results because
the randomization of instructors is restricted to within a school.


26.2 Two-Factor Nested Designs

We now consider nested designs involving two factors, one of which is nested inside the
other. For consistency, we always consider the case where factor B is nested within factor A.
We initially assume that both factor effects are fixed, but later we also consider the case of
random effects. We assume throughout that all treatment means are of equal importance.


Development of Model Elements

We shall use the customary notation for a two-factor study, and let $\mu_{ij}$ denote the mean
response when factor A is at the ith level (i = 1, ..., a) and factor B is at the jth level
(j = 1, ..., b). As usual, when all mean responses are of equal importance we define:

$$\mu_{i\cdot} = \frac{\sum_j \mu_{ij}}{b} \qquad (26.1)$$

For the training school example of Table 26.1, $\mu_{1\cdot}$ represents the mean learning score for
the Atlanta school, averaged over the instructors of that school, and $\mu_{2\cdot}$ and $\mu_{3\cdot}$ are
interpreted similarly. Note once more that the $\mu_{i\cdot}$ here represent mean learning scores that have
been averaged over different instructors.

We define the main effect of the ith level of factor A as usual:

$$\alpha_i = \mu_{i\cdot} - \mu_{\cdot\cdot} \qquad (26.2)$$

where:

$$\mu_{\cdot\cdot} = \frac{\sum_i \sum_j \mu_{ij}}{ab} = \frac{\sum_i \mu_{i\cdot}}{a} \qquad (26.2a)$$

is the overall mean response. It follows from (26.2a) that:

$$\sum_i \alpha_i = 0 \qquad (26.3)$$


In a nested design, it is not meaningful to employ a model component for the main effect 
of the jth level of factor B. To see why, consider again the training school example. Since 
each school employs different instructors and the jth instructors in the various schools are 
not the same, it would be meaningless to consider the effect of the jth instructor, averaged 
over all schools. Instead, the individual effects of each instructor in each school need to be 
considered. We denote these individual effects by $\beta_{j(i)}$, where the subscript j(i) indicates
that the jth factor level of B is nested within the ith factor level of A. $\beta_{j(i)}$ is defined as
follows:

$$\beta_{j(i)} = \mu_{ij} - \mu_{i\cdot} \qquad (26.4)$$

which can be rewritten, utilizing (26.2):

$$\beta_{j(i)} = \mu_{ij} - \alpha_i - \mu_{\cdot\cdot} \qquad (26.4a)$$


It follows from (26.4) and (26.1) that:

$$\sum_j \beta_{j(i)} = 0 \qquad i = 1, \ldots, a \qquad (26.5)$$

The meaning of $\beta_{j(i)}$ can be seen most clearly from (26.4). With reference to the training
school example, $\beta_{j(i)}$ is simply the difference between the mean learning score for the jth instructor
of school i and the average of the mean learning scores for all instructors in that school.
Thus, the effect of the jth instructor in the ith school is measured with respect to the overall
mean learning score for the school in which the instructor teaches. We shall call $\beta_{j(i)}$ the
specific effect of the jth level of factor B nested within the ith level of factor A.

We have now expressed the mean response $\mu_{ij}$ in terms of the overall mean, the main
effect of the ith level of factor A, and the specific effect of the jth level of factor B nested
within the ith level of factor A, as can be seen from (26.4a):

$$\mu_{ij} = \mu_{\cdot\cdot} + \alpha_i + \beta_{j(i)} = \mu_{\cdot\cdot} + (\mu_{i\cdot} - \mu_{\cdot\cdot}) + (\mu_{ij} - \mu_{i\cdot}) \qquad (26.6)$$


For the training school example, the mean learning score for the jth instructor in school i
has been expressed in terms of the overall mean, the main effect of school i, and the specific
effect of instructor j within school i.
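
The decomposition in (26.1) through (26.6) can be checked numerically. The sketch below is only an illustration: it borrows the cell averages printed in Table 26.1 as example values for the $\mu_{ij}$, although those are sample averages rather than true treatment means, and the variable names are chosen here for convenience.

```python
# Illustrative treatment means mu_ij (rows: levels of A; columns: levels of B
# nested within that level of A). ASSUMPTION: the Table 26.1 cell averages are
# reused purely as example numbers.
mu = [[27.0, 12.5],
      [8.5, 20.0],
      [18.5, 3.5]]

a = len(mu)
b = len(mu[0])

mu_i = [sum(row) / b for row in mu]                  # (26.1)
mu_all = sum(mu_i) / a                               # (26.2a)
alpha = [m - mu_all for m in mu_i]                   # (26.2)
beta = [[mu[i][j] - mu_i[i] for j in range(b)]       # (26.4)
        for i in range(a)]

# Rebuild each mean from its three components, as in (26.6).
rebuilt = [[mu_all + alpha[i] + beta[i][j] for j in range(b)]
           for i in range(a)]
```

Here `alpha` sums to zero, as (26.3) requires, and each row of `beta` sums to zero, as (26.5) requires, so `rebuilt` reproduces `mu`.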

To complete the model, we need only add a random error term $\varepsilon_{ijk}$.


Nested Design Model 


Let $Y_{ijk}$ denote the response for the kth trial when factor A is at the ith level and factor B is at
the jth level. We assume that there are n replications for each factor level combination, i.e.,
k = 1, ..., n, and that i = 1, ..., a and j = 1, ..., b. Such a study is said to be balanced
because the same number of factor B levels is nested within each factor A level and the
number of replications is the same throughout.


When both factors A and B have fixed effects, an appropriate nested design model is: 


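$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk} \qquad (26.7)$$


where:


$\mu_{\cdot\cdot}$ is a constant

$\alpha_i$ are constants subject to the restriction $\sum_i \alpha_i = 0$

$\beta_{j(i)}$ are constants subject to the restrictions $\sum_j \beta_{j(i)} = 0$ for all i

$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$

i = 1, ..., a; j = 1, ..., b; k = 1, ..., n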


The expected value and variance of observation $Y_{ijk}$ for nested design model (26.7) with
fixed factor effects are:

$$E\{Y_{ijk}\} = \mu_{\cdot\cdot} + \alpha_i + \beta_{j(i)} \qquad (26.8a)$$

$$\sigma^2\{Y_{ijk}\} = \sigma^2 \qquad (26.8b)$$

Thus, all observations have a constant variance. Further, the observations $Y_{ijk}$ are
independent and normally distributed for this model.
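
To make model (26.7) concrete, observations can be generated from it directly. The following is a minimal sketch: the parameter values are hypothetical (chosen to satisfy the sum-to-zero restrictions), the error standard deviation 2.5 is arbitrary, and the function name `simulate_nested` is introduced here for illustration only.

```python
import random

def simulate_nested(mu_all, alpha, beta, sigma, n, seed=0):
    """Draw y[i][j][k] = mu_all + alpha[i] + beta[i][j] + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[[mu_all + alpha[i] + beta[i][j] + rng.gauss(0.0, sigma)
              for _ in range(n)]
             for j in range(len(beta[i]))]
            for i in range(len(alpha))]

# Hypothetical parameter values satisfying the restrictions of model (26.7).
alpha = [4.75, -0.75, -4.0]                            # sums to zero
beta = [[7.25, -7.25], [-5.75, 5.75], [7.5, -7.5]]     # each row sums to zero

y = simulate_nested(mu_all=15.0, alpha=alpha, beta=beta, sigma=2.5, n=2)

# With sigma = 0 every observation equals its expected value (26.8a).
y_noiseless = simulate_nested(15.0, alpha, beta, sigma=0.0, n=2)
```

Every simulated observation has the same error variance, in line with (26.8b); only the mean changes from one treatment to the next.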


Comments


1. It is not necessary, as in model (26.7), that the study be balanced, that is, that the number of
replications be equal for all factor combinations and that the number of levels of nested factor B
(number of instructors in the training school example) be the same for each level of factor A (school
in this example). We shall discuss the removal of some of these restrictions in Section 26.6. We only
point out now that the computations become more complex when the study is unbalanced.

2. There is no interaction term in nested design model (26.7). There is no need for it since factor B
is nested within factor A, not crossed with it. To put this somewhat differently, with reference to the
training school example, it is not possible to estimate a school-instructor interaction when each
instructor teaches in only one school. The teacher effect $\beta_{j(i)}$, since it is specific to a given school i,
in a sense incorporates the interaction effect between the particular teacher j (in the ith school) and
the ith school, but it is not possible in a nested design to disentangle this interaction effect.

3. The factor level means $\mu_{i\cdot}$ in a nested design are not generally the same as the corresponding
means in a crossed design. Remember that in a nested design, the $\mu_{i\cdot}$ are obtained by averaging over
only some of the distinctive levels of factor B. With reference to the training school example, the $\mu_{i\cdot}$
are obtained by averaging over only those teachers who instruct in the ith school. In a crossed design,
on the other hand, the $\mu_{i\cdot}$ would be obtained by averaging over all instructors included in the study.


Random Factor Effects


If both factors A and B have random factor levels, nested design model (26.7) is modified
with $\alpha_i$, $\beta_{j(i)}$, and $\varepsilon_{ijk}$ being independent normal random variables with expectations 0
and variances $\sigma^2_{\alpha}$, $\sigma^2_{\beta}$, and $\sigma^2$, respectively. Thus, it is assumed that all $\beta_{j(i)}$ have the same
variance $\sigma^2_{\beta}$. The assumption that all $\beta_{j(i)}$ have the same variance also is made if only
factor B is random. It is important to check whether this assumption is appropriate, since it
may well be that the mean responses $\mu_{i1}, \mu_{i2}, \ldots$ in one factor A level (plant, school, city,
etc.) differ in variability from those in other factor A levels (other plants, schools, cities,
etc.). Tests for equality of variances are discussed in Section 18.2.


26.3 Analysis of Variance for Two-Factor Nested Designs 


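Fitting of Model


The least squares and maximum likelihood estimators of the parameters in nested design
model (26.7) are obtained in the usual fashion. Employing our customary notation for
sample data in factorial studies, the estimators are:


Parameter              Estimator
$\mu_{\cdot\cdot}$     $\hat{\mu}_{\cdot\cdot} = \bar{Y}_{\cdot\cdot\cdot}$                        (26.9a)
$\alpha_i$             $\hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}$        (26.9b)
$\beta_{j(i)}$         $\hat{\beta}_{j(i)} = \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot}$            (26.9c)


The fitted values therefore are:

$$\hat{Y}_{ijk} = \bar{Y}_{\cdot\cdot\cdot} + (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}) + (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot}) = \bar{Y}_{ij\cdot} \qquad (26.10)$$

and the residuals are:

$$e_{ijk} = Y_{ijk} - \hat{Y}_{ijk} = Y_{ijk} - \bar{Y}_{ij\cdot} \qquad (26.11)$$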

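Sums of Squares


The analysis of variance for nested design model (26.7) is obtained by decomposing the
total deviation $Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot}$ as follows:

$$\underbrace{Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot}}_{\text{Total deviation}} = \underbrace{\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}}_{\text{A main effect}} + \underbrace{\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot}}_{\text{Specific B effect when A at ith level}} + \underbrace{Y_{ijk} - \bar{Y}_{ij\cdot}}_{\text{Residual}} \qquad (26.12)$$

When we square (26.12) and sum over all cases, all cross-product terms drop out and we
obtain:

$$SSTO = SSA + SSB(A) + SSE \qquad (26.13)$$

where:

$$SSTO = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 \qquad (26.13a)$$

$$SSA = bn \sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \qquad (26.13b)$$

$$SSB(A) = n \sum_i \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot})^2 \qquad (26.13c)$$

$$SSE = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2 = \sum_i \sum_j \sum_k e_{ijk}^2 \qquad (26.13d)$$

SSTO is the usual total sum of squares, and SSA is the ordinary factor A sum of squares,
reflecting the variability of the estimated factor level means $\bar{Y}_{i\cdot\cdot}$.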

SSB(A) is the factor B sum of squares, with the notation reflecting that factor B is nested 
within factor A. SSB(A) is made up of terms such as: 


$$n \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot})^2 \qquad (26.14)$$

The term in (26.14) is simply the ordinary factor B sum of squares when factor A is at
level i. These terms are then summed over all levels of factor A.

Finally, the error sum of squares SSE is, as usual, the sum of the squared residuals
and reflects the variability of each observation $Y_{ijk}$ around the corresponding estimated
treatment mean $\bar{Y}_{ij\cdot}$. Alternatively, we can view SSE as being made up of terms such as:

$$\sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2 \qquad (26.15)$$

The term in (26.15) is simply the ordinary error sum of squares within the ith level of
factor A. These terms are then summed over all levels of factor A.

Thus, a nested two-factor design can be viewed as a series of single-factor investigations
at the successive levels of the other factor. In terms of the training school example, a study
of the effects of instructors (B) within any given school ($A_i$) leads to the usual sums of
squares for instructors and errors in a single-factor analysis of variance within school $A_i$,
denoted by $SSB(A_i)$ and $SSE(A_i)$:

$$SSB(A_i) = n \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot})^2 \qquad SSE(A_i) = \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2$$

These are then aggregated to yield SSB(A) and SSE, respectively. It is only the
between-schools sum of squares SSA that introduces explicitly the other factor. Table 26.2
demonstrates this relation between the single-factor analyses of variance for each school and the
two-factor analysis of variance for the nested design.


TABLE 26.2 Relation between Nested Two-Factor ANOVA and Single-Factor ANOVAs: Training School Example.

                         Single-Factor ANOVAs                                Nested Two-Factor
     School 1                School 2                School 3                     ANOVA
SS          df          SS          df          SS          df          SS          df
SSB(A1)     2 - 1       SSB(A2)     2 - 1       SSB(A3)     2 - 1       SSB(A)      3(2 - 1)
SSE(A1)     2(2 - 1)    SSE(A2)     2(2 - 1)    SSE(A3)     2(2 - 1)    SSE         3(2)(2 - 1)
SSTO(A1)    2(2) - 1    SSTO(A2)    2(2) - 1    SSTO(A3)    2(2) - 1
                                                                        SSA         3 - 1
                                                                        SSTO        3(2)(2) - 1


Degrees of Freedom 


The degrees of freedom associated with the various sums of squares can be deduced directly 
from the known relationships already studied. Since there is a total of abn cases, the degrees 
of freedom associated with SSTO are abn — 1. For any level of factor A, there are b(n — 1) 
degrees of freedom associated with the error sum of squares. Aggregating over all levels of 
factor A, there are ab(n — 1) degrees of freedom associated with SSE. Similarly, for any 
level of factor A, there are b — 1 degrees of freedom associated with the factor B sum of 
squares. Hence, by aggregating over all levels of factor A, we find that there are a(b — 1) 
degrees of freedom associated with SSB(A). Finally, since there are a levels of factor A, 
there are a — 1 degrees of freedom associated with SSA. 

Table 26.2 shows this aggregation of the degrees of freedom for the training school
example, and Table 26.3 presents the general analysis of variance table for two-factor
nested design model (26.7) where factor B is nested within factor A.
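
The degrees-of-freedom bookkeeping described above is easy to verify in code. The helper below is a small sketch for the balanced case only; the function name `nested_df` is hypothetical.

```python
def nested_df(a, b, n):
    """Degrees of freedom for a balanced two-factor nested design (B within A)."""
    df = {
        "A": a - 1,                # a levels of factor A
        "B(A)": a * (b - 1),       # b - 1 per level of A, aggregated over A
        "Error": a * b * (n - 1),  # n - 1 per treatment, aggregated
    }
    df["Total"] = a * b * n - 1
    # The component degrees of freedom must add up to the total.
    assert df["A"] + df["B(A)"] + df["Error"] == df["Total"]
    return df

# Training school example: 3 schools, 2 instructors per school, 2 classes each.
training_df = nested_df(a=3, b=2, n=2)
```

For the training school example this gives 2, 3, 6, and 11 degrees of freedom for schools, instructors within schools, error, and total.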


Example

In the training school example of Table 26.1, both schools and instructors were regarded as
fixed factors; hence, model (26.7) was deemed appropriate. Figure 26.3 presents aligned dot
plots of the class learning scores $Y_{ijk}$ for each school. Note that different symbols are used
for the two instructors within each school. Figure 26.3 suggests strongly that differences
between instructors within a school are present and that there may be differences in the
mean learning for the three schools. Note also from the dot plots that the variability of the
class learning scores for the two classes taught by each of the six instructors appears to be
reasonably constant, as required by model (26.7).


TABLE 26.3 ANOVA Table for Nested Balanced Two-Factor Fixed Effects Model (26.7) (B nested within A)

Source of Variation    SS                                 df           MS        E{MS}
Factor A               SSA = bn Σ (Ȳi.. - Ȳ...)²          a - 1        MSA       σ² + bn Σ αi² / (a - 1)
Factor B (within A)    SSB(A) = n ΣΣ (Ȳij. - Ȳi..)²       a(b - 1)     MSB(A)    σ² + n ΣΣ β²j(i) / [a(b - 1)]
Error                  SSE = ΣΣΣ (Yijk - Ȳij.)²           ab(n - 1)    MSE       σ²
Total                  SSTO = ΣΣΣ (Yijk - Ȳ...)²          abn - 1


FIGURE 26.3 Dot Plots of Class Learning Scores: Training School Example.
[Aligned dot plots for Atlanta (i = 1), Chicago (i = 2), and San Francisco (i = 3); horizontal axis: Learning Score, 0 to 30.]

To analyze the instructor and school effects formally, we begin by obtaining the analysis 
of variance. The sums of squares were obtained as follows using formulas (26.13): 


$$SSTO = (25 - 15)^2 + (29 - 15)^2 + \cdots + (2 - 15)^2 = 766$$
$$SSA = 2(2)[(19.75 - 15)^2 + (14.25 - 15)^2 + (11.00 - 15)^2] = 156.5$$
$$SSB(A) = 2[(27 - 19.75)^2 + (12.5 - 19.75)^2 + \cdots + (3.5 - 11.00)^2] = 567.5$$
$$SSE = (25 - 27)^2 + (29 - 27)^2 + \cdots + (2 - 3.5)^2 = 42$$
Table 26.4a contains the analysis of variance. 
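The sums of squares above can be checked numerically. The following is a minimal sketch that assumes only the cell means and SSTO quoted in this example (Table 26.1 itself is not reproduced here); the variable names are illustrative, and SSE is obtained by subtraction rather than from the raw scores.

```python
# Cell means quoted in the text: one row per school (Atlanta, Chicago,
# San Francisco), one column per instructor nested within that school.
cell_means = [[27.0, 12.5], [8.5, 20.0], [18.5, 3.5]]
n = 2           # classes per instructor
ssto = 766.0    # total sum of squares, taken from the text

a = len(cell_means)       # number of schools
b = len(cell_means[0])    # instructors per school

school_means = [sum(row) / b for row in cell_means]
grand_mean = sum(school_means) / a

# SSA = bn * sum_i (Ybar_i.. - Ybar...)^2
ssa = b * n * sum((m - grand_mean) ** 2 for m in school_means)

# SSB(A) = n * sum_i sum_j (Ybar_ij. - Ybar_i..)^2, kept per school so the
# components SSB(A_i) are available as well.
ssb_by_school = [n * sum((y - m) ** 2 for y in row)
                 for row, m in zip(cell_means, school_means)]
ssba = sum(ssb_by_school)

# In a balanced nested design SSTO = SSA + SSB(A) + SSE.
sse = ssto - ssa - ssba

msa = ssa / (a - 1)
msba = ssba / (a * (b - 1))
mse = sse / (a * b * (n - 1))

print(ssa, ssba, sse)        # sums of squares
print(ssb_by_school)         # per-school components of SSB(A)
print(msa, round(msba, 2), mse)
```

Keeping the per-school components separate is a convenience: they are the quantities needed later when instructor effects are tested school by school.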


Comment 


Most analysis of variance computer packages provide an option for obtaining the ANOVA for nested
designs. Should this option be unavailable, the ordinary ANOVA for crossed factors can be used
with only slight inconvenience when the nested study is balanced. SSTO, SSA, and SSE with the
crossed-factor analysis will be the same, and SSB(A) is obtained from the relation:

$$\underbrace{SSB(A)}_{\text{Nested}} = \underbrace{SSB + SSAB}_{\text{Crossed}} \tag{26.16}$$

The same relation holds for the associated degrees of freedom.
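Relation (26.16) can be verified on the training school cell means. This is only an illustrative sketch: it treats the instructor index j as if it were a crossed factor, computes the crossed-factor SSB and SSAB from the cell means quoted in the example, and compares their sum with the nested SSB(A).

```python
# Cell means from the training school example (rows: schools, cols: instructor j).
cell_means = [[27.0, 12.5], [8.5, 20.0], [18.5, 3.5]]
n = 2
a, b = len(cell_means), len(cell_means[0])

row_means = [sum(row) / b for row in cell_means]                       # Ybar_i..
col_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]  # Ybar_.j.
grand = sum(row_means) / a

# Crossed-factor sums of squares (as if instructor label j were crossed with school).
ssb = a * n * sum((c - grand) ** 2 for c in col_means)
ssab = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(a) for j in range(b))

# Nested sum of squares SSB(A).
ssb_nested = n * sum((cell_means[i][j] - row_means[i]) ** 2
                     for i in range(a) for j in range(b))

print(ssb, ssab, ssb + ssab, ssb_nested)
```

The crossed degrees of freedom combine the same way: (b - 1) + (a - 1)(b - 1) = a(b - 1).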


TABLE 26.4

(a) ANOVA Table

Source of Variation                      SS                df    MS
Schools (A)                              SSA = 156.5        2    78.25
Instructors, within schools [B(A)]       SSB(A) = 567.5     3    189.17
Error (E)                                SSE = 42.0         6    7.00
Total                                    SSTO = 766.0      11

(b) Decomposition of SSB(A)

Source of Variation            SSB(A_i)    df    MSB(A_i)
Instructors, Atlanta           210.25       1    210.25
Instructors, Chicago           132.25       1    132.25
Instructors, San Francisco     225.00       1    225.00
Total                          567.5        3


Tests for Factor Effects

Tests for factor effects in a nested two-factor study are straightforward. The appropriate test
statistics are determined, as for a crossed two-factor study, by comparing the expected values
of the ANOVA mean squares. The expected mean squares for nested fixed effects model
(26.7) are shown in Table 26.3. They can be obtained by somewhat tedious derivations. We
do not illustrate these derivations because Appendix D describes a relatively simple method
of finding expected mean squares for any balanced nested design. Also, many computer
packages provide the expected mean squares for nested models.




The E{MS} column in Table 26.3 indicates that for fixed effects model (26.7), the test
for factor A main effects:

$$H_0\colon \text{all } \alpha_i = 0 \qquad H_a\colon \text{not all } \alpha_i \text{ equal zero} \tag{26.17a}$$

is based on the test statistic:

$$F^* = \frac{MSA}{MSE} \tag{26.17b}$$

and the decision rule to control the level of significance at $\alpha$ is:

$$\text{If } F^* \le F[1 - \alpha; a - 1, (n - 1)ab], \text{ conclude } H_0$$
$$\text{If } F^* > F[1 - \alpha; a - 1, (n - 1)ab], \text{ conclude } H_a \tag{26.17c}$$

Similarly, to test for factor B specific effects:

$$H_0\colon \text{all } \beta_{j(i)} = 0 \qquad H_a\colon \text{not all } \beta_{j(i)} \text{ equal zero} \tag{26.18a}$$

the appropriate test statistic is:

$$F^* = \frac{MSB(A)}{MSE} \tag{26.18b}$$

and the appropriate decision rule is:

$$\text{If } F^* \le F[1 - \alpha; a(b - 1), (n - 1)ab], \text{ conclude } H_0$$
$$\text{If } F^* > F[1 - \alpha; a(b - 1), (n - 1)ab], \text{ conclude } H_a \tag{26.18c}$$

Example


For the analysis of variance in Table 26.4a for the training school example, we conduct the
first test to determine whether or not main school effects exist. The alternatives are given
in (26.17a), and test statistic (26.17b) here is:

$$F^* = \frac{78.25}{7.00} = 11.2$$

For level of significance $\alpha = .05$, we require F(.95; 2, 6) = 5.14. Since F* = 11.2 > 5.14,
we conclude that the three schools differ in mean learning effects. The P-value of the test
is .0094.

Next is a test for differences in mean learning effects between instructors within each
school. The alternatives are given in (26.18a), and test statistic (26.18b) here is:

$$F^* = \frac{189.17}{7.00} = 27.0$$

For $\alpha = .05$, we require F(.95; 3, 6) = 4.76. Since F* = 27.0 > 4.76, we conclude that
instructors within at least one school differ in terms of mean learning effects. The P-value
of this test is .0007.
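The two F tests can be reproduced with the standard library alone. The sketch below is not the text's computational method: the helper `f_upper_tail` is a hypothetical name, and it approximates the P-value by Simpson's rule applied to the incomplete beta integral, assuming a positive F value and at least 2 denominator degrees of freedom. The critical values 5.14 and 4.76 are copied from the text rather than computed.

```python
import math

def f_upper_tail(f_value, df1, df2, steps=20000):
    """Approximate P(F > f_value) for an F(df1, df2) random variable.

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f), where
    I_x is the regularized incomplete beta function, integrated numerically
    with Simpson's rule. Assumes f_value > 0 and df2 >= 2.
    """
    p, q = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)
    h = x / steps  # steps must be even for Simpson's rule

    def integrand(t):
        return t ** (p - 1.0) * (1.0 - t) ** (q - 1.0)

    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return (total * h / 3.0) / math.exp(log_beta)

# Mean squares from Table 26.4a.
msa, msba, mse = 78.25, 189.17, 7.00

f_a = msa / mse     # test statistic (26.17b)
f_b = msba / mse    # test statistic (26.18b)

p_a = f_upper_tail(f_a, 2, 6)
p_b = f_upper_tail(f_b, 3, 6)

# Critical values quoted in the text for alpha = .05.
reject_a = f_a > 5.14
reject_b = f_b > 4.76

print(round(f_a, 1), round(p_a, 4), reject_a)
print(round(f_b, 1), round(p_b, 4), reject_b)
```

The P-value for schools computed from the unrounded statistic is about .0095; the .0094 in the text appears to correspond to the rounded F* = 11.2.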


Comments 


1. The alternative $H_0$ in (26.18a) can also be expressed in terms of the treatment means $\mu_{ij}$:

$$H_0\colon \mu_{11} = \mu_{12} = \cdots = \mu_{1b},\quad \mu_{21} = \mu_{22} = \cdots = \mu_{2b},\quad \ldots \tag{26.19}$$

In terms of the training school example, $H_0$ states that the mean learning scores for all instructors
in Atlanta are the same, and similarly for the other schools. It does not state that the mean learning
scores for all instructors in the different schools are the same.


2. If it is concluded that factor B effects are present, it is often desired to ascertain whether they
are present in all levels of factor A or only in some. (In some cases, indeed, one may wish to proceed
immediately to this analysis.) With reference to the training school example, the question would be
whether the instructor effects differ in all schools or only in some schools. As noted earlier, SSB(A)
in Table 26.4a is made up of the instructor sums of squares within the individual schools. These
component sums of squares can be used for testing instructor effects within each school. Table 26.4b
contains the relevant component sums of squares. To test for instructor differences within the Atlanta
school, for instance, we use test statistic $F^* = MSB(A_1)/MSE = 210.25/7.00 = 30.0$. For level of
significance $\alpha = .05$, we need F(.95; 1, 6) = 5.99. Since F* = 30.0 > 5.99, we conclude that the two
instructors in Atlanta have different mean learning effects. Using the same level of significance each
time, similar conclusions are reached for the other two schools. The family level of significance for
the three tests according to the Bonferroni inequality is at most .15.

3. If the assumption of constant error variance were violated in the training school example through
unequal variances for the different schools, it would still be possible to study instructor effects within
each school by separate analyses of variance for each school.

4. The power of the tests for fixed factor A and factor B effects can be ascertained by using (24.49)
together with the expected mean squares in Table 26.3.
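The school-by-school tests described in Comment 2 can be sketched as follows. The component sums of squares come from Table 26.4b and the critical value F(.95; 1, 6) = 5.99 from the text; the dictionary and variable names are illustrative only.

```python
# Component sums of squares SSB(A_i) from Table 26.4b; each has 1 degree of
# freedom because b = 2 instructors are nested within every school.
ssb_components = {"Atlanta": 210.25, "Chicago": 132.25, "San Francisco": 225.00}
df_component = 1
mse = 7.00
alpha = 0.05
f_critical = 5.99   # F(.95; 1, 6), quoted in the text

f_stats = {school: (ss / df_component) / mse
           for school, ss in ssb_components.items()}
conclusions = {school: f > f_critical for school, f in f_stats.items()}

# Bonferroni inequality: the family significance level for g tests, each run
# at level alpha, is at most g * alpha.
family_level_bound = len(ssb_components) * alpha

for school, f in f_stats.items():
    print(school, round(f, 1), "instructors differ" if conclusions[school] else "no difference shown")
print("family level at most", round(family_level_bound, 2))
```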


TABLE 26.5

                 Expected Mean Square
Mean Square      A Fixed, B Random                                                      A Random, B Random
MSA              $\sigma^2 + bn \frac{\sum \alpha_i^2}{a - 1} + n\sigma_\beta^2$         $\sigma^2 + bn\sigma_\alpha^2 + n\sigma_\beta^2$
MSB(A)           $\sigma^2 + n\sigma_\beta^2$                                            $\sigma^2 + n\sigma_\beta^2$
MSE              $\sigma^2$                                                              $\sigma^2$

                 Appropriate Test Statistic
Test for         A Fixed, B Random      A Random, B Random
Factor A         MSA/MSB(A)             MSA/MSB(A)
Factor B(A)      MSB(A)/MSE             MSB(A)/MSE


Random Factor Effects 


Test statistic (26.17b) for factor A main effects is not appropriate if either or both factor 
effects are random. Table 26.5 gives the expected mean squares for these cases and also the 
appropriate test statistics. 
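How the choice of denominator changes the factor A test can be sketched in a few lines. The function name and its `b_random` flag are hypothetical, and the training school mean squares are reused purely for illustration: in the text both factors were treated as fixed, so the random-instructor result below is a what-if, not a finding of the study.

```python
def nested_f_statistics(msa, msba, mse, b_random):
    """Return (F* for factor A, F* for factor B within A).

    Following Table 26.5, MSA is compared with MSB(A) whenever the nested
    factor B is random (whether A is fixed or random); with both factors
    fixed, MSA is compared with MSE as in (26.17b). MSB(A) is always
    compared with MSE.
    """
    denominator_a = msba if b_random else mse
    return msa / denominator_a, msba / mse

# Mean squares from Table 26.4a.
msa, msba, mse = 78.25, 189.17, 7.00

f_a_fixed, f_b_fixed = nested_f_statistics(msa, msba, mse, b_random=False)
f_a_random, f_b_random = nested_f_statistics(msa, msba, mse, b_random=True)

print(round(f_a_fixed, 2), round(f_b_fixed, 2))    # both factors fixed
print(round(f_a_random, 2), round(f_b_random, 2))  # instructors treated as random
```

With a random nested factor the denominator degrees of freedom for the factor A test also change, from ab(n - 1) to a(b - 1).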


26.4 Evaluation of Appropriateness of Nested Design Model 


The diagnostic procedures described earlier are entirely applicable for examining whether
nested design model (26.7) is appropriate. The residuals in (26.11):

$$e_{ijk} = Y_{ijk} - \bar{Y}_{ij.} \tag{26.20}$$

may be examined as usual for normality, constancy of the error variance, and independence
of the error terms. In particular, aligned dot plots of the residuals for each factor A level may
be helpful in examining whether the variance of the error terms is constant for the different
factor A levels within which factor B is nested.

Example

Figure 26.4a contains MINITAB aligned dot plots of the residuals for each school for the
training school example. These plots are affected by the rounded nature of the data, but
they support the appropriateness of the assumption of constancy of the error variance.
Figure 26.4b presents a normal probability plot of the residuals. This plot is also affected by
the rounded nature of the observations, but does not indicate any gross departure from
normality. This conclusion is supported by the coefficient of correlation between the ordered
residuals and their expected values under normality, which is .927. These and other
diagnostics (not shown here) support the appropriateness of nested design model (26.7) for the
training school example.


Comment 


Since there are numerous ties among the residuals in the training school example, the normal
probability plot in Figure 26.4b is obtained by plotting each of the tied residuals against the
expected value for the mean of the tied order positions and showing the number of tied residuals at that
position.




FIGURE 26.4 MINITAB Diagnostic Residual Plots: Training School Example. [(a) Residual Dot Plots, one row per school (Atlanta, Chicago, San Francisco), residual axis from -2.0 to 3.0. (b) Normal Probability Plot of Residual against Exp value.]


26.5 Analysis of Factor Effects in Two-Factor Nested Designs 


When factor effects are present in a nested design, estimates and/or comparisons of these 
effects are usually desired. 


Estimation of Factor Level Means $\mu_{i.}$

When factor A (fixed effects factor) has significant main effects, there is frequent interest
in estimating the factor level means $\mu_{i.}$. The estimated factor level mean $\bar{Y}_{i..}$ is an unbiased
estimator of $\mu_{i.}$. As usual for a fixed effects factor, the estimated variance of $\bar{Y}_{i..}$ is based
on the mean square in the denominator of the statistic used for testing for factor A main
effects, and on the number of cases on which $\bar{Y}_{i..}$ is based. Confidence limits for $\mu_{i.}$ are of
the customary form:

$$\bar{Y}_{i..} \pm t(1 - \alpha/2; df)\, s\{\bar{Y}_{i..}\} \tag{26.21}$$

where:

$$s^2\{\bar{Y}_{i..}\} = \frac{MSE}{bn} \qquad df = ab(n - 1) \qquad \text{A and B fixed} \tag{26.21a}$$
$$s^2\{\bar{Y}_{i..}\} = \frac{MSB(A)}{bn} \qquad df = a(b - 1) \qquad \text{A fixed, B random} \tag{26.21b}$$

Confidence limits for contrasts $L = \sum c_i \mu_{i.}$, where $\sum c_i = 0$, are set up in the usual
way, utilizing the estimator $\hat{L} = \sum c_i \bar{Y}_{i..}$ and the t distribution with degrees of freedom
those associated with the appropriate mean square:

$$\hat{L} \pm t(1 - \alpha/2; df)\, s\{\hat{L}\} \tag{26.22}$$

where:

$$s^2\{\hat{L}\} = \sum c_i^2\, s^2\{\bar{Y}_{i..}\} \qquad s^2\{\bar{Y}_{i..}\} \text{ as given by (26.21a) or (26.21b)} \tag{26.22a}$$

The Tukey and Bonferroni simultaneous comparison procedures can be utilized in the usual
way for making pairwise comparisons with family confidence coefficient $1 - \alpha$, and the
Scheffé and Bonferroni simultaneous comparison procedures can be employed for a family
of contrasts.

Example


For the training school example in Table 26.1, it was desired to estimate the mean learning
score for the Atlanta school with a 95 percent confidence coefficient. Using our earlier
results in Tables 26.1 and 26.4a, we obtain for the fixed effects model:

$$\bar{Y}_{1..} = 19.75$$
$$s^2\{\bar{Y}_{1..}\} = \frac{MSE}{bn} = \frac{7.00}{4} = 1.75$$
$$s\{\bar{Y}_{1..}\} = 1.32$$
$$t(.975; 6) = 2.447$$
$$16.5 = 19.75 - 2.447(1.32) \le \mu_{1.} \le 19.75 + 2.447(1.32) = 23.0$$
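The interval for the Atlanta mean follows directly from (26.21) and (26.21a). A minimal sketch, with the t percentile copied from the text rather than computed and with illustrative variable names:

```python
import math

mse = 7.00          # error mean square from Table 26.4a
b, n = 2, 2         # instructors per school, classes per instructor
ybar_atlanta = 19.75
t_value = 2.447     # t(.975; 6), quoted in the text; df = ab(n - 1) = 6

variance = mse / (b * n)            # (26.21a): s^2{Ybar_i..} = MSE / bn
std_error = math.sqrt(variance)
half_width = t_value * std_error

lower = ybar_atlanta - half_width
upper = ybar_atlanta + half_width
print(round(variance, 2), round(std_error, 2), round(lower, 1), round(upper, 1))
```

If instructors were instead regarded as random, (26.21b) would call for MSB(A) in place of MSE and a t percentile with a(b - 1) degrees of freedom.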


In addition, pairwise comparisons of the three schools were to be made with family
confidence coefficient .90. We shall utilize the Tukey procedure and require:

$$T = \frac{1}{\sqrt{2}}\, q[1 - \alpha; a, ab(n - 1)] = \frac{1}{\sqrt{2}}\, q(.90; 3, 6) = \frac{1}{\sqrt{2}}(3.56) = 2.52$$

The estimated variance is the same for all pairwise comparisons:

$$s^2\{\hat{L}\} = \frac{MSE}{bn} + \frac{MSE}{bn} = \frac{2(7.00)}{4} = 3.5$$

so that the estimated standard deviation is $s\{\hat{L}\} = 1.87$ and $T s\{\hat{L}\} = 2.52(1.87) = 4.71$.
Using the results in Table 26.1, we have:

$$\bar{Y}_{1..} = 19.75 \qquad \bar{Y}_{2..} = 14.25 \qquad \bar{Y}_{3..} = 11.00$$

Hence, the 90 percent family of confidence intervals is:

$$.8 = (19.75 - 14.25) - 4.71 \le \mu_{1.} - \mu_{2.} \le (19.75 - 14.25) + 4.71 = 10.2$$
$$4.0 = (19.75 - 11.00) - 4.71 \le \mu_{1.} - \mu_{3.} \le (19.75 - 11.00) + 4.71 = 13.5$$
$$-1.5 = (14.25 - 11.00) - 4.71 \le \mu_{2.} - \mu_{3.} \le (14.25 - 11.00) + 4.71 = 8.0$$

We conclude with 90 percent family confidence coefficient that the mean learning score
is highest in Atlanta and that the difference in the observed mean scores for Chicago and
San Francisco is not statistically significant. We summarize these results by the following
line plot:

[Line plot: San Francisco, Chicago, and Atlanta placed at their estimated means on a Learning Score axis running from 10 to 20.]
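The Tukey intervals can be recomputed from the school means. This sketch takes the studentized range percentile q(.90; 3, 6) = 3.56 from the text as given; the names used are illustrative.

```python
import math
from itertools import combinations

mse = 7.00
b, n = 2, 2
q_value = 3.56   # q(.90; 3, 6), quoted in the text
school_means = {"Atlanta": 19.75, "Chicago": 14.25, "San Francisco": 11.00}

tukey_multiple = q_value / math.sqrt(2.0)       # T = q / sqrt(2)
std_error = math.sqrt(2.0 * mse / (b * n))      # s{L-hat} for any pairwise difference
half_width = tukey_multiple * std_error

intervals = {}
for first, second in combinations(school_means, 2):
    diff = school_means[first] - school_means[second]
    intervals[(first, second)] = (diff - half_width, diff + half_width)

for pair, (lower, upper) in intervals.items():
    verdict = "differ" if lower > 0 or upper < 0 else "not significantly different"
    print(pair, round(lower, 1), round(upper, 1), verdict)
```

An interval that covers zero, as for Chicago versus San Francisco, corresponds to a pair whose difference is not statistically significant at the family level.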


Estimation of Treatment Means $\mu_{ij}$

Confidence limits for $\mu_{ij}$ are set up in the usual fashion using the t distribution when both
factors A and B have fixed effects:

$$\bar{Y}_{ij.} \pm t[1 - \alpha/2; (n - 1)ab]\, s\{\bar{Y}_{ij.}\} \tag{26.23}$$

where:

$$s^2\{\bar{Y}_{ij.}\} = \frac{MSE}{n} \tag{26.23a}$$

To make a comparison within any factor A level, we estimate the contrast $L = \sum_j c_j \mu_{ij}$,
where $\sum c_j = 0$, with the estimator $\hat{L} = \sum_j c_j \bar{Y}_{ij.}$ and employ the confidence limits:

$$\hat{L} \pm t[1 - \alpha/2; (n - 1)ab]\, s\{\hat{L}\} \tag{26.24}$$

where:

$$s^2\{\hat{L}\} = \frac{MSE}{n} \sum c_j^2 \tag{26.24a}$$

The Bonferroni procedure may be used when several comparisons are to be made and
the family confidence level is to be controlled. The Tukey procedure is also applicable for
paired comparisons and the Scheffé procedure for contrasts, but these procedures often will
not be efficient since ordinarily only comparisons within each factor level are of interest,
whereas the Tukey and Scheffé families are based on comparisons among all ab treatments.

Example


In the training school example, we are to compare the mean scores for the two instructors in
each school, using the Bonferroni procedure with a 90 percent family confidence coefficient.
For g = 3 comparisons, we require $B = t[1 - .10/2(3); 6] = t(.983; 6) = 2.748$. The
estimated variance in each case is:

$$s^2\{\hat{L}\} = \frac{7.00}{2}\left[(1)^2 + (-1)^2\right] = 7.0$$

Hence, $B s\{\hat{L}\} = 2.748\sqrt{7.0} = 7.27$. Obtaining the estimated treatment means $\bar{Y}_{ij.}$ from
Table 26.1, we find:

$$7.2 = (27 - 12.5) - 7.27 \le \mu_{11} - \mu_{12} \le (27 - 12.5) + 7.27 = 21.8$$
$$-18.8 = (8.5 - 20) - 7.27 \le \mu_{21} - \mu_{22} \le (8.5 - 20) + 7.27 = -4.2$$
$$7.7 = (18.5 - 3.5) - 7.27 \le \mu_{31} - \mu_{32} \le (18.5 - 3.5) + 7.27 = 22.3$$

It is evident that substantial differences between the two instructors exist at each school.
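The within-school Bonferroni comparisons follow the same pattern, with contrast coefficients (1, -1) applied to the two instructor means in each school. A short sketch, taking B = 2.748 from the text and using illustrative names:

```python
import math

mse, n = 7.00, 2
bonferroni_multiple = 2.748          # B = t(.983; 6), quoted in the text
coefficients = (1.0, -1.0)           # instructor 1 minus instructor 2

# Estimated treatment means Ybar_ij. for the two instructors in each school.
cell_means = {
    "Atlanta": (27.0, 12.5),
    "Chicago": (8.5, 20.0),
    "San Francisco": (18.5, 3.5),
}

# (26.24a): s^2{L-hat} = (MSE / n) * sum of squared contrast coefficients.
variance = (mse / n) * sum(c ** 2 for c in coefficients)
half_width = bonferroni_multiple * math.sqrt(variance)

intervals = {}
for school, means in cell_means.items():
    estimate = sum(c * m for c, m in zip(coefficients, means))
    intervals[school] = (estimate - half_width, estimate + half_width)

for school, (lower, upper) in intervals.items():
    print(school, round(lower, 1), round(upper, 1))
```

None of the three intervals covers zero, which is the basis for the conclusion that the instructors differ at each school.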




Estimation of Overall Mean $\mu_{..}$

Sometimes there is interest in estimating the overall mean $\mu_{..}$. For the training school
example, $\mu_{..}$ is the overall mean learning score for all training schools and all instructors
in these schools. The point estimator is $\bar{Y}_{...}$. The confidence limits are constructed utilizing
the t distribution as follows:

$$\bar{Y}_{...} \pm t(1 - \alpha/2; df)\, s\{\bar{Y}_{...}\} \tag{26.25}$$

where:

$$s^2\{\bar{Y}_{...}\} = \frac{MSE}{abn} \qquad df = ab(n - 1) \qquad \text{A and B fixed} \tag{26.25a}$$
$$s^2\{\bar{Y}_{...}\} = \frac{MSA}{abn} \qquad df = a - 1 \qquad \text{A and B random} \tag{26.25b}$$
$$s^2\{\bar{Y}_{...}\} = \frac{MSB(A)}{abn} \qquad df = a(b - 1) \qquad \text{A fixed, B random} \tag{26.25c}$$

For the training school example, we wish to estimate the overall mean $\mu_{..}$ with a 95 percent
confidence interval. The estimated variance (26.25a) is appropriate here since the model
involves fixed factor effects. Hence, we obtain:

$$s^2\{\bar{Y}_{...}\} = \frac{7.00}{12} = .583 \qquad s\{\bar{Y}_{...}\} = .764$$

For confidence coefficient .95, we require t(.975; 6) = 2.447. From Table 26.1, we find
$\bar{Y}_{...} = 15$. The desired confidence interval therefore is:

$$13.1 = 15 - 2.447(.764) \le \mu_{..} \le 15 + 2.447(.764) = 16.9$$


Estimation of Variance Components 
With random factor effects, estimates of the variance components may be of interest. No
new problems arise for balanced nested designs. For instance, we see from Table 26.5 that
when both factors A and B are random factors, the variance component $\sigma_\alpha^2$ can be expressed
as follows:

$$\sigma_\alpha^2 = \frac{E\{MSA\} - E\{MSB(A)\}}{bn} \tag{26.26}$$

Hence, an unbiased estimator of $\sigma_\alpha^2$ is:

$$s_\alpha^2 = \frac{MSA - MSB(A)}{bn} \tag{26.27}$$

Approximate confidence intervals for variance components $\sigma_\alpha^2$ or $\sigma_\beta^2$ can be obtained
using the MLS interval (25.34). For example, to estimate $\sigma_\alpha^2$ when both A and B are
random factors, we see from (26.26) that the correspondences to (25.32) are:

$$c_1 = \frac{1}{bn} \qquad MS_1 = MSA$$
$$c_2 = -\frac{1}{bn} \qquad MS_2 = MSB(A)$$

Hence, the MLS confidence interval for $\sigma_\alpha^2$ is:

$$s_\alpha^2 - H_L \le \sigma_\alpha^2 \le s_\alpha^2 + H_U \tag{26.28}$$

where $H_L$ and $H_U$ are given by the formulas in Table 25.3, $df_1 = a - 1$, $df_2 = a(b - 1)$,
and $s_\alpha^2$ is given by (26.27).


26.6 Unbalanced Nested Two-Factor Designs 


Up to this point, we have assumed that the nested study is balanced; that is, the same number
of levels of factor B is nested within each of the levels of factor A, and the same number of
replications is made for each factor level combination. There are occasions, however, when
a study is unbalanced. For instance, in our earlier example dealing with the effects of school
(factor A) and instructor (factor B) on the learning achieved by classes of mechanics, there
might have been $b_i$ instructors in the ith school and $n_{ij}$ classes taught by the jth instructor
in school i.

The ANOVA sums of squares formulas given earlier are not appropriate for unbalanced 
studies. Ordinarily, it is best to use the regression approach for unbalanced studies when 
the factor effects are fixed. Since no new principles are involved, we proceed directly to an 
example. 


Example

The manufacturing company that conducted the training school study subsequently made
a follow-up study involving only Atlanta and Chicago. At that time, three instructors were
used in Atlanta and two in Chicago. All instructors were to train two classes, but one class
for one of the instructors in Atlanta had to be canceled. The data for this follow-up study
are presented in Table 26.6a. We shall again assume that a fixed effects nested design model
is appropriate:

$$Y_{ijk} = \mu_{..} + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk} \tag{26.29}$$
$$i = 1, 2; \quad j = 1, \ldots, b_i; \quad k = 1, \ldots, n_{ij}$$
$$b_1 = 3,\ b_2 = 2 \qquad n_{11} = n_{13} = 2,\ n_{12} = 1,\ n_{21} = n_{22} = 2$$
$$\sum_{i=1}^{2} \alpha_i = 0 \qquad \sum_{j=1}^{3} \beta_{j(1)} = 0 \qquad \sum_{j=1}^{2} \beta_{j(2)} = 0$$

Proceeding as usual, we shall incorporate the parameters $\alpha_1$, $\beta_{1(1)}$, $\beta_{2(1)}$, and $\beta_{1(2)}$ into the
regression model. The other parameters are not required since according to the constraints
in (26.29) we have:

$$\alpha_2 = -\alpha_1 \qquad \beta_{3(1)} = -\beta_{1(1)} - \beta_{2(1)} \qquad \beta_{2(2)} = -\beta_{1(2)} \tag{26.30}$$

Thus, we require four indicator variables for our example, each taking on values 1, -1, or 0.
The equivalent regression model therefore is:

$$Y_{ijk} = \mu_{..} + \underbrace{\alpha_1 X_{ijk1}}_{\text{School main effect}} + \underbrace{\beta_{1(1)} X_{ijk2} + \beta_{2(1)} X_{ijk3} + \beta_{1(2)} X_{ijk4}}_{\text{Specific instructor-within-school effect}} + \varepsilon_{ijk} \qquad \text{Full model} \tag{26.31}$$


TABLE 26.6 Nested Unbalanced Two-Factor Study: Follow-up Training School Study.

(a) Data

                       Atlanta (i = 1)                 Chicago (i = 2)
Replication k      j = 1    j = 2    j = 3         j = 1    j = 2
1                   20        8        9             4       16
2                   22                13             8       20

(b) Y and X Variables for Regression Approach

i    j    k     Y     X1    X2    X3    X4
1    1    1    20      1     1     0     0
1    1    2    22      1     1     0     0
1    2    1     8      1     0     1     0
1    3    1     9      1    -1    -1     0
1    3    2    13      1    -1    -1     0
2    1    1     4     -1     0     0     1
2    1    2     8     -1     0     0     1
2    2    1    16     -1     0     0    -1
2    2    2    20     -1     0     0    -1

where:

X1 =  1 if class from school 1
     -1 if class from school 2

X2 =  1 if class for instructor 1 in school 1
     -1 if class for instructor 3 in school 1
      0 otherwise

X3 =  1 if class for instructor 2 in school 1
     -1 if class for instructor 3 in school 1
      0 otherwise

X4 =  1 if class for instructor 1 in school 2
     -1 if class for instructor 2 in school 2
      0 otherwise

The Y observations and X indicator variables for this example are shown in Table 26.6b.

To test for school main effects, we first fit full model (26.31) by regressing Y in
Table 26.6b, column 1, on $X_1$, $X_2$, $X_3$, $X_4$ in columns 2-5, and obtain SSE(F). We then fit
the reduced model for $H_0\colon \alpha_1 = 0$:

$$Y_{ijk} = \mu_{..} + \beta_{1(1)} X_{ijk2} + \beta_{2(1)} X_{ijk3} + \beta_{1(2)} X_{ijk4} + \varepsilon_{ijk} \qquad \text{Reduced model} \tag{26.32}$$

by regressing Y in column 1 on $X_2$, $X_3$, $X_4$ in columns 3-5, and obtain SSE(R). The
difference SSE(R) - SSE(F) equals SSA. Test statistic (2.70) is then obtained in the usual
fashion.




TABLE 26.7 ANOVA Table for Nested Unbalanced Two-Factor Study: Follow-up Training School Study.

Source of Variation       SS        df    MS      F*
Schools (A)                 3.76     1     3.76   3.76/6.5 = .58
Instructors [B(A)]        295.20     3    98.4    98.4/6.5 = 15.1
Error (E)                  26.00     4     6.5


To test for specific instructor effects, we employ the reduced model for
$H_0\colon \beta_{1(1)} = \beta_{2(1)} = \beta_{1(2)} = 0$:

$$Y_{ijk} = \mu_{..} + \alpha_1 X_{ijk1} + \varepsilon_{ijk} \qquad \text{Reduced model} \tag{26.33}$$

We therefore regress Y in column 1 on $X_1$ in column 2, and obtain SSE(R). The difference
SSE(R) - SSE(F) equals SSB(A).

Table 26.7 contains the ANOVA table for the follow-up training school study. No total
sum of squares is shown because the component sums of squares are not orthogonal.

The tests for school and instructor effects are carried out as before. Estimation of factor
effects is done by means of the regression parameters. For instance, a comparison of the
mean scores for the two schools involves:

$$\mu_{1.} - \mu_{2.} = \alpha_1 - \alpha_2$$

Since $\alpha_2 = -\alpha_1$ by (26.30), we need to estimate:

$$\mu_{1.} - \mu_{2.} = \alpha_1 - (-\alpha_1) = 2\alpha_1$$

An unbiased estimator is $2\hat{\alpha}_1$. Other desired estimates are obtained in a similar fashion.
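The regression approach for the follow-up study can be sketched with a small least squares routine. This is only an illustration under stated assumptions: the data rows are those of Table 26.6b, the helpers `solve_linear` and `fit` are hypothetical names written for this sketch (a statistics package would normally be used), and the fit solves the normal equations directly, which is adequate for a design this small.

```python
def solve_linear(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (m[r][size] - tail) / m[r][r]
    return solution

def fit(predictors, y):
    """Least squares fit with an intercept; returns (coefficients, SSE)."""
    x = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, yi in zip(x, y))
    return beta, sse

# Rows of Table 26.6b: (Y, X1, X2, X3, X4).
rows = [
    (20, 1, 1, 0, 0), (22, 1, 1, 0, 0), (8, 1, 0, 1, 0),
    (9, 1, -1, -1, 0), (13, 1, -1, -1, 0),
    (4, -1, 0, 0, 1), (8, -1, 0, 0, 1), (16, -1, 0, 0, -1), (20, -1, 0, 0, -1),
]
y = [float(r[0]) for r in rows]

def columns(indices):
    return [[r[i] for i in indices] for r in rows]

beta_full, sse_full = fit(columns([1, 2, 3, 4]), y)   # full model (26.31)
_, sse_no_school = fit(columns([2, 3, 4]), y)         # reduced model (26.32)
_, sse_no_instr = fit(columns([1]), y)                # reduced model (26.33)

ssa = sse_no_school - sse_full
ssba = sse_no_instr - sse_full
mse = sse_full / (len(rows) - 5)          # 9 cases, 5 parameters in the full model

f_schools = (ssa / 1) / mse
f_instructors = (ssba / 3) / mse
school_difference = 2.0 * beta_full[1]    # estimate of mu_1. - mu_2. = 2 * alpha_1

print(round(sse_full, 2), round(ssa, 2), round(ssba, 2))
print(round(f_schools, 2), round(f_instructors, 1), round(school_difference, 2))
```

The estimate 2 * alpha_1-hat equals the difference between the unweighted averages of the instructor cell means in the two schools, as expected under the constraints in (26.29).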


26.7 Subsampling in Single-Factor Study with Completely 
Randomized Design 


Up to this point in our discussion of experimental designs, we have considered only designs 
in which one observation of the response variable is made on an experimental unit. There are 
occasions, however, when more than one observation is desirable. Consider an experiment 
to study the effect of oven temperature on crustiness of bread. Three temperatures were 
utilized, and two experimental units (batches of flour mix) were randomly assigned to each 
treatment. It was not economical to use the entire batch to bake breads, nor was it technically 
feasible to use a batch as a block. Hence, three subsamples were selected from each batch to
make three loaves, which were baked at a given temperature. Here, then, three observations 
(subsamples) were made on each experimental unit (batch). 

Another instance of several observations on the response variable being made for each 
experimental unit occurred in an experiment on the effectiveness of three different training 
methods. The experimental units here were persons, and the experiment sought to mea- 
sure the length of time required to perform a certain engine assembly operation after the 
given training program was completed. Ten consecutive assemblies were timed, and these 
constituted the subsamples of the experimental unit (person). 




Formally, subsampling (i.e., repeated observations on the same experimental unit) is
completely analogous to nested factors. We shall demonstrate this for a completely
randomized design.


Consider again the experiment to study the effect of oven temperature on the crustiness of
bread. The model for this study can be written as follows:

$$Y_{ijk} = \mu_{..} + \tau_i + \varepsilon_{j(i)} + \eta_{ijk} \tag{26.34}$$

The meaning of the symbols is as follows:

1. $\mu_{..}$ is an overall constant.

2. $\tau_i$ is the temperature (i.e., treatment) effect (fixed effect, here).

3. $\varepsilon_{j(i)}$ is the experimental error associated with the particular batch (random effect, here).
The experimental error is nested within the treatment, since the jth batch for treatment
i was not used with any other treatment.

4. $\eta_{ijk}$ is the error associated with the kth subsample or observation on the jth experimental
unit for the ith treatment (random effect, here).

Note that subsampling model (26.34) appears the same as nested design model (26.7)
for a nested two-factor design, except for changes in notation to reflect the fact that
subsampling model (26.34) is a single-factor model and contains both experimental and observation
errors. Specifically, the treatment effect $\tau_i$ here corresponds to $\alpha_i$ in the nested two-factor
model, the batch effect $\varepsilon_{j(i)}$ corresponds to $\beta_{j(i)}$, and the observation error term $\eta_{ijk}$
corresponds to $\varepsilon_{ijk}$. Consequently, the analysis of variance for the case of subsampling in a
single-factor study with a completely randomized design parallels that for a nested two-factor
study.


In general, the model for subsampling in a balanced single-factor study with a completely
randomized design where the treatment effects are fixed is:

$$Y_{ijk} = \mu_{..} + \tau_i + \varepsilon_{j(i)} + \eta_{ijk} \tag{26.35}$$

where:

$\mu_{..}$ is a constant
$\tau_i$ are constants subject to the restriction $\sum \tau_i = 0$
$\varepsilon_{j(i)}$ are independent $N(0, \sigma^2)$
$\eta_{ijk}$ are independent $N(0, \sigma_\eta^2)$
$\varepsilon_{j(i)}$ and $\eta_{ijk}$ are independent
$i = 1, \ldots, r$; $j = 1, \ldots, n$; $k = 1, \ldots, m$

The mean and variance of observation $Y_{ijk}$ for this model are:

$$E\{Y_{ijk}\} = \mu_{..} + \tau_i \tag{26.36a}$$
$$\sigma^2\{Y_{ijk}\} = \sigma_Y^2 = \sigma^2 + \sigma_\eta^2 \tag{26.36b}$$

Further, the observations $Y_{ijk}$ are normally distributed for this model. Observations from
different replications (i.e., from different subsamples) are independent, but any two
observations from the same replication are correlated in advance of the random trials because
they contain the same random term $\varepsilon_{j(i)}$:

$$\sigma\{Y_{ijk}, Y_{ijk'}\} = \sigma^2 \qquad k \ne k' \tag{26.36c}$$
$$\sigma\{Y_{ijk}, Y_{i'j'k'}\} = 0 \qquad i \ne i' \text{ and/or } j \ne j' \tag{26.36d}$$
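Properties (26.36b) and (26.36c) can be illustrated by simulation. The sketch below is a hypothetical check, not part of the text: the variances, the seed, and the number of simulated experimental units are arbitrary choices, and the constant part of the model is omitted because it does not affect variances or covariances.

```python
import random

random.seed(1)
sigma = 2.0        # assumed std. deviation of the experimental error (sigma^2 = 4)
sigma_eta = 1.0    # assumed std. deviation of the observation error (sigma_eta^2 = 1)
units = 20000      # simulated experimental units, two observations on each

first, second = [], []
for _ in range(units):
    unit_error = random.gauss(0.0, sigma)          # shared by both observations
    first.append(unit_error + random.gauss(0.0, sigma_eta))
    second.append(unit_error + random.gauss(0.0, sigma_eta))

mean_first = sum(first) / units
mean_second = sum(second) / units

variance = sum((v - mean_first) ** 2 for v in first) / (units - 1)
covariance = sum((u - mean_first) * (v - mean_second)
                 for u, v in zip(first, second)) / (units - 1)

# Expected: variance near sigma^2 + sigma_eta^2 = 5, covariance near sigma^2 = 4.
print(round(variance, 2), round(covariance, 2))
```

The shared `unit_error` term is what induces the covariance between two observations on the same unit; observations on different units share no random term.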


Analysis of Variance and Tests of Effects 


The appropriate sums of squares for the analysis of variance for balanced subsampling
model (26.35) are as follows:

$$SSTO = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2 \tag{26.37a}$$
$$SSTR = nm \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2 \tag{26.37b}$$
$$SSEE = m \sum_i \sum_j (\bar{Y}_{ij.} - \bar{Y}_{i..})^2 \tag{26.37c}$$
$$SSOE = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2 \tag{26.37d}$$

Here, SSEE stands for the experimental error sum of squares, and SSOE stands for the
observation error sum of squares. Note the correspondence of formulas (26.37) to formulas
(26.13) for nested two-factor designs. The only difference is that we now have i = 1, ..., r,
j = 1, ..., n, and k = 1, ..., m, whereas before i, j, and k ran to a, b, and n, respectively.

Table 26.8 contains the ANOVA for a single-factor completely randomized balanced
experiment with subsampling. Also shown there are the expected mean squares for both
fixed and random treatment effects. Note that regardless of whether treatment effects are
fixed or random, the appropriate statistic for testing treatment effects is:

$$F^* = \frac{MSTR}{MSEE} \tag{26.38a}$$


TABLE 26.8 ANOVA for Single-Factor Completely Randomized Balanced Experiment with Subsampling.

                                                         E{MS}
Source of Variation    SS      df           MS      $\tau_i$ Fixed                                                        $\tau_i$ Random
Treatments             SSTR    r - 1        MSTR    $\sigma_\eta^2 + m\sigma^2 + nm \frac{\sum \tau_i^2}{r - 1}$           $\sigma_\eta^2 + m\sigma^2 + nm\sigma_\tau^2$
Experimental error     SSEE    r(n - 1)     MSEE    $\sigma_\eta^2 + m\sigma^2$                                            $\sigma_\eta^2 + m\sigma^2$
Observation error      SSOE    rn(m - 1)    MSOE    $\sigma_\eta^2$                                                        $\sigma_\eta^2$
Total                  SSTO    rnm - 1

A test for the presence of experimental error effects, i.e., $H_0\colon \sigma^2 = 0$, $H_a\colon \sigma^2 > 0$, also uses
the same test statistic for both fixed and random treatment effects:

$$F^* = \frac{MSEE}{MSOE} \tag{26.38b}$$

Example

The data for the study of the effects of baking temperature on the crustiness of bread are
contained in Table 26.9. The data are scores on a scale from 1 to 20. Figure 26.5 presents
SYSTAT aligned dot plots of the data. These plots suggest the presence of temperature
effects and possibly also batch effects. Note that crustiness increases steadily with the level
of temperature.

The appropriate analysis of variance was obtained from a computer run and is presented
in Table 26.10. To test for temperature effects:

$$H_0\colon \tau_1 = \tau_2 = \tau_3 = 0$$
$$H_a\colon \text{not all } \tau_i \text{ equal zero}$$

we use test statistic (26.38a):

$$F^* = \frac{117.72}{16.33} = 7.21$$


TABLE 26.9 Data for Single-Factor Completely Randomized Balanced Experiment with Subsampling: Bread Crustiness Example.

                                        Temperature
                  Low (i = 1)            Medium (i = 2)          High (i = 3)
Observation    Batch 1    Batch 2     Batch 3    Batch 4     Batch 5    Batch 6
Unit k         j = 1      j = 2       j = 1      j = 2       j = 1      j = 2
1                 4         12          14          9          ...        16
2                 7          8          13         10          ...        19
3                 5         10          11         12          ...        18

FIGURE 26.5 SYSTAT Dot Plots for Subsampling Experiment: Bread Crustiness Example. [Aligned dot plots, one row per temperature: Low (i = 1), Medium (i = 2), High (i = 3). Horizontal axis: Crustiness, 0 to 20.]



TABLE 26.10 ANOVA: Bread Crustiness Example.

Source of Variation         SS        df    MS
Temperatures (TR)           235.44     2    117.72
Mix batches (EE)             49.00     3     16.33
Observation units (OE)       31.33    12      2.61
Total                       315.78    17


For level of significance $\alpha = .10$, we need F(.90; 2, 3) = 5.46. Since F* = 7.21 > 5.46,
we conclude $H_a$, that baking temperature does have an effect on the crustiness of the bread.
The P-value of the test is .07.

To test for batch differences:

$$H_0\colon \sigma^2 = 0$$
$$H_a\colon \sigma^2 > 0$$

we employ test statistic (26.38b):

$$F^* = \frac{16.33}{2.61} = 6.26$$

For level of significance $\alpha = .10$, we need F(.90; 3, 12) = 2.61. Since F* = 6.26 > 2.61,
we conclude $H_a$, that there are batch effects on the crustiness of bread. The P-value of this
test is .01. Thus, both the particular batch of flour mix and the temperature at which the
bread is baked affect the crustiness of the loaf.
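Formulas (26.37) and the two F statistics can be checked against Table 26.10. The following is a sketch under one loudly stated assumption: the three scores for Batch 5 do not appear in Table 26.9 as reproduced here, so the values 14, 15, 17 used below are inferred, not quoted. They are the only integer scores consistent with the reported high-temperature mean (16.5) and with SSOE = 31.33, but their order within the batch is unknown and does not affect the results.

```python
# Crustiness scores: data[i][j] is the list of m = 3 loaves from batch j
# baked at temperature i.
data = [
    [[4, 7, 5], [12, 8, 10]],        # low: batches 1 and 2
    [[14, 13, 11], [9, 10, 12]],     # medium: batches 3 and 4
    [[14, 15, 17], [16, 19, 18]],    # high: batch 5 (inferred values), batch 6
]
r, n, m = len(data), len(data[0]), len(data[0][0])

def mean(values):
    values = list(values)
    return sum(values) / len(values)

batch_means = [[mean(batch) for batch in treatment] for treatment in data]
treatment_means = [mean(means) for means in batch_means]
grand_mean = mean(treatment_means)

sstr = n * m * sum((t - grand_mean) ** 2 for t in treatment_means)        # (26.37b)
ssee = m * sum((bm - t) ** 2                                              # (26.37c)
               for means, t in zip(batch_means, treatment_means)
               for bm in means)
ssoe = sum((y - bm) ** 2                                                  # (26.37d)
           for treatment, means in zip(data, batch_means)
           for batch, bm in zip(treatment, means)
           for y in batch)
ssto = sum((y - grand_mean) ** 2                                          # (26.37a)
           for treatment in data for batch in treatment for y in batch)

mstr = sstr / (r - 1)
msee = ssee / (r * (n - 1))
msoe = ssoe / (r * n * (m - 1))

f_treatment = mstr / msee   # (26.38a)
f_batch = msee / msoe       # (26.38b)

print(round(sstr, 2), round(ssee, 2), round(ssoe, 2), round(ssto, 2))
print(round(f_treatment, 2), round(f_batch, 2))
```

Note that the treatment test uses MSEE, not MSOE, in the denominator, so it has only r(n - 1) = 3 denominator degrees of freedom.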


Estimation of Treatment Effects 


When the treatment effects are fixed, there is usually interest in obtaining confidence
intervals for treatment means $\mu_{i.} = \mu_{..} + \tau_i$ and for pairwise comparisons and contrasts of
the treatment means. These can be obtained in the usual manner, using MSEE as the error
variance since this is the quantity in the denominator of the test statistic for fixed treatment
effects. The degrees of freedom are those associated with MSEE, namely, (n - 1)r. For
instance, the confidence limits for treatment mean $\mu_{i.}$ are:

$$\bar{Y}_{i..} \pm t[1 - \alpha/2; (n - 1)r]\, s\{\bar{Y}_{i..}\} \tag{26.39}$$

where:

$$s^2\{\bar{Y}_{i..}\} = \frac{MSEE}{nm} \tag{26.39a}$$

Similarly, confidence limits for a contrast of treatment means, $L = \sum c_i \mu_{i.}$, where $\sum c_i = 0$,
are obtained as follows:

$$\hat{L} \pm t[1 - \alpha/2; (n - 1)r]\, s\{\hat{L}\} \tag{26.40}$$

where $\hat{L} = \sum c_i \bar{Y}_{i..}$ and:

$$s^2\{\hat{L}\} = \frac{MSEE}{nm} \sum c_i^2 \tag{26.40b}$$

The Bonferroni, Tukey, and Scheffé simultaneous inference procedures can be utilized in
the usual manner.


Example

In the bread crustiness example, we wish to estimate the mean crustiness of bread baked at
a low temperature with a 95 percent confidence coefficient. We require, using the results in
Tables 26.9 and 26.10:

$$\bar{Y}_{1..} = 7.67$$
$$s^2\{\bar{Y}_{1..}\} = \frac{16.33}{6} = 2.722 \qquad s\{\bar{Y}_{1..}\} = 1.65$$
$$t(.975; 3) = 3.182$$

Hence, the 95 percent confidence interval is:

$$2.4 = 7.67 - 3.182(1.65) \le \mu_{1.} \le 7.67 + 3.182(1.65) = 12.9$$

It was also desired to estimate the difference in mean crustiness of bread baked at high
and low temperatures with a 95 percent confidence interval. Utilizing (26.40) and the results
in Tables 26.9 and 26.10, we obtain:

$$\bar{Y}_{1..} = 7.67 \qquad \bar{Y}_{3..} = 16.5$$
$$\hat{L} = \bar{Y}_{3..} - \bar{Y}_{1..} = 16.5 - 7.67 = 8.83$$
$$s^2\{\hat{L}\} = \frac{16.33}{6}\left[(1)^2 + (-1)^2\right] = 5.443 \qquad s\{\hat{L}\} = 2.33$$

Hence, the desired confidence interval is:

$$1.4 = 8.83 - 3.182(2.33) \le \mu_{3.} - \mu_{1.} \le 8.83 + 3.182(2.33) = 16.2$$
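Both bread crustiness intervals can be reproduced with a small helper. This is a sketch only: `contrast_interval` is a hypothetical function written for this illustration, the t percentile is copied from the text, and a single treatment mean is handled as the special case of one coefficient equal to 1.

```python
import math

def contrast_interval(means, coefficients, msee, n, m, t_value):
    """Confidence limits for sum(c_i * mu_i.) using MSEE as the error variance.

    The estimated variance is (MSEE / nm) * sum(c_i^2), as in (26.39a) and the
    variance formula accompanying (26.40); t_value should be the t percentile
    with r(n - 1) degrees of freedom.
    """
    estimate = sum(c * y for c, y in zip(coefficients, means))
    variance = (msee / (n * m)) * sum(c ** 2 for c in coefficients)
    half_width = t_value * math.sqrt(variance)
    return estimate, variance, estimate - half_width, estimate + half_width

msee = 16.33                       # Table 26.10
n, m = 2, 3                        # batches per temperature, loaves per batch
t_value = 3.182                    # t(.975; 3), quoted in the text
treatment_means = [7.67, 11.5, 16.5]   # low, medium, high

# Mean crustiness at the low temperature.
low = contrast_interval(treatment_means, [1, 0, 0], msee, n, m, t_value)
# High minus low temperature.
high_minus_low = contrast_interval(treatment_means, [-1, 0, 1], msee, n, m, t_value)

print([round(v, 2) for v in low])
print([round(v, 2) for v in high_minus_low])
```

The wide intervals reflect the small number of batches: only 3 degrees of freedom are available for MSEE, however many loaves are baked per batch.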


Estimation of Variances 
At times, there is interest in estimating $\sigma^2$, the experimental error variance, and $\sigma_\eta^2$, the
observation error variance. It is evident from either of the E{MS} columns in Table 26.8
that the following are unbiased estimators:

Parameter            Unbiased Estimator
$\sigma^2$           $s^2 = \frac{MSEE - MSOE}{m}$      (26.41a)
$\sigma_\eta^2$      $s_\eta^2 = MSOE$                  (26.41b)

An approximate confidence interval for the experimental error variance $\sigma^2$ is easily
obtained by the modified large sample procedure in (25.34). From Table 26.8, we have:

$$\sigma^2 = \frac{E\{MSEE\} - E\{MSOE\}}{m}$$

Thus $\sigma^2$ takes the form (25.32) with correspondences:

$$c_1 = \frac{1}{m} \qquad MS_1 = MSEE$$
$$c_2 = -\frac{1}{m} \qquad MS_2 = MSOE$$

The MLS approximate $1 - \alpha$ confidence interval for $\sigma^2$ is therefore:

$$s^2 - H_L \le \sigma^2 \le s^2 + H_U \tag{26.42}$$

where $H_L$ and $H_U$ are given by the formulas in Table 25.3, $df_1 = r(n - 1)$ and $df_2 =
rn(m - 1)$, and $s^2$ is given in (26.41a).

An exact confidence interval for the observation error variance $\sigma_\eta^2$ can be obtained by
(25.21), with MSOE now being the mean square and rn(m - 1) now being the degrees of
freedom.

Example


For the bread crustiness example, we wish to estimate $\sigma^2$, the variability between batches,
with a 95 percent confidence interval. From Table 26.10, we obtain the point estimate:

$$s^2 = \frac{16.33 - 2.61}{3} = 4.57$$

To obtain an approximate 95 percent confidence interval for $\sigma^2$ using (26.42), we need the
following calculational results for the formulas in Table 25.3:

$$F_1 = 3.12 \quad F_2 = 1.95 \quad F_3 = 13.92 \quad F_4 = 2.73 \quad F_5 = 4.47 \quad F_6 = 14.34$$
$$G_1 = .6795 \quad G_2 = .4872 \quad G_3 = -.0397 \quad G_4 = -2.6347$$
$$H_L = 3.97 \quad H_U = 70.24$$

The desired confidence interval for $\sigma^2$ is therefore:

$$.60 = 4.57 - 3.97 \le \sigma^2 \le 4.57 + 70.24 = 74.81$$

and for $\sigma$, the experimental error standard deviation, the confidence interval is:

$$.77 \le \sigma \le 8.65$$
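The MLS computation can be traced step by step. Table 25.3 is not reproduced in this section, so the formulas for the G quantities and for the two H terms below are written from recollection and should be treated as an assumption; as a check, they reproduce the G and H values quoted in the example. The six F percentiles are copied from the text, and `mls_interval` is a hypothetical helper name.

```python
import math

def mls_interval(c1, ms1, c2, ms2, f1, f2, f3, f4, f5, f6):
    """Modified large sample limits for L = c1*E{MS1} + c2*E{MS2}.

    f1..f6 are the F percentiles F1..F6 used in the example. The expressions
    for g1..g4, h_lower and h_upper are assumed forms of the Table 25.3
    formulas, not quoted from the text.
    """
    g1 = 1.0 - 1.0 / f1
    g2 = 1.0 - 1.0 / f2
    g3 = ((f5 - 1.0) ** 2 - (g1 * f5) ** 2 - (f4 - 1.0) ** 2) / f5
    g4 = f6 * (((f6 - 1.0) / f6) ** 2 - ((f3 - 1.0) / f6) ** 2 - g2 ** 2)
    cross = c1 * c2 * ms1 * ms2
    h_lower = math.sqrt((g1 * c1 * ms1) ** 2
                        + ((f4 - 1.0) * c2 * ms2) ** 2 - g3 * cross)
    h_upper = math.sqrt(((f3 - 1.0) * c1 * ms1) ** 2
                        + (g2 * c2 * ms2) ** 2 - g4 * cross)
    estimate = c1 * ms1 + c2 * ms2
    return estimate, (g1, g2, g3, g4), h_lower, h_upper

m = 3                          # loaves per batch
msee, msoe = 16.33, 2.61       # Table 26.10

estimate, g_values, h_lower, h_upper = mls_interval(
    1.0 / m, msee, -1.0 / m, msoe,
    f1=3.12, f2=1.95, f3=13.92, f4=2.73, f5=4.47, f6=14.34)

variance_limits = (estimate - h_lower, estimate + h_upper)
std_dev_limits = tuple(math.sqrt(v) for v in variance_limits)

print(round(estimate, 2), [round(g, 4) for g in g_values])
print(round(h_lower, 2), round(h_upper, 2))
print([round(v, 2) for v in variance_limits], [round(v, 2) for v in std_dev_limits])
```

The interval is highly asymmetric about the point estimate, which is typical when the mean square carrying the positive coefficient has very few degrees of freedom.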


Comments 


і. Frequently, the units for subsampling are called observation units. to distinguish them from 
the experimental units. For instance, in the bread crustiness example, the batches of flour mix are 
the experimental units and the portions selected from a batch for making loaves of bread are the 
Observation units. 

2. Observation units may be different physical entities, as in the bread crustiness example where 
they are portions of a batch of flour mix. Observation units also may refer to repeated observations 
on the entire experimental unit. An example of the latter is the earlier illustration where an employee 
is timed for 10 consecutive assembly operations after receiving a given type of training. 

3. Note that subsampling model (26.35) contains no interaction terms. This is because the
experimental error terms $\varepsilon_{j(i)}$ are nested within treatments. When one variable is nested
within another, we saw earlier that interaction terms are inapplicable.

4. We have considered only the balanced case for subsampling, where an equal number of
experimental units ($n$) are applied to each treatment and a constant number of observations ($m$) are
made on each experimental unit. Serious complications are encountered in the unbalanced case, and
no exact test for treatment effects can be made. See an advanced text, such as Reference 26.1, for a
discussion.




26.8 Pure Subsampling in Three Stages 


Model


Sometimes an investigation does not involve a comparison of treatments, but only subsam- 
pling at several levels. Consider, for instance, a quality control engineer who wishes to 
investigate a certain quality characteristic of a computer assembly. These assemblies are 
produced in lots of 2,000. The engineer will select a random sample of r lots; from each lot 
n assemblies will be selected, and m observations will be made on the quality characteristic 
for each assembly. 


Assuming that all random variables are normally distributed and that equal sample sizes 
are employed at each stage, the model for subsampling in three stages is: 


$$Y_{ijk} = \mu_{..} + \tau_i + \varepsilon_{j(i)} + \eta_{k(ij)} \qquad (26.43)$$

where:

$\mu_{..}$ is a constant

$\tau_i$, $\varepsilon_{j(i)}$, and $\eta_{k(ij)}$ are independent normal random variables with expectations 0 and
variances $\sigma_\tau^2$, $\sigma^2$, and $\sigma_\eta^2$, respectively

$i = 1, \ldots, r$; $j = 1, \ldots, n$; $k = 1, \ldots, m$

For our quality control illustration, $\tau_i$ represents the lot effect, $\varepsilon_{j(i)}$ represents the assembly
effect that is nested within the lot, and $\eta_{k(ij)}$ represents the observation effect.

The observations $Y_{ijk}$ for subsampling model (26.43) are normally distributed, with mean
and variance:

$$E\{Y_{ijk}\} = \mu_{..} \qquad (26.44a)$$
$$\sigma^2\{Y_{ijk}\} = \sigma_Y^2 = \sigma_\tau^2 + \sigma^2 + \sigma_\eta^2 \qquad (26.44b)$$


Various correlations exist between two observations from the same lot. 
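
To make these correlations concrete: under model (26.43), two observations on the same assembly share both $\tau_i$ and $\varepsilon_{j(i)}$, so their covariance is $\sigma_\tau^2 + \sigma^2$; two observations on different assemblies from the same lot share only $\tau_i$, so their covariance is $\sigma_\tau^2$; observations from different lots are independent. A minimal Python sketch follows, using hypothetical variance components chosen only for illustration (they do not come from the text).

```python
def three_stage_correlations(var_tau, var_eps, var_eta):
    """Total variance and within-lot correlations implied by model (26.43)."""
    var_y = var_tau + var_eps + var_eta              # total variance, as in (26.44b)
    rho_same_assembly = (var_tau + var_eps) / var_y  # same lot, same assembly
    rho_same_lot = var_tau / var_y                   # same lot, different assemblies
    return var_y, rho_same_assembly, rho_same_lot


# Hypothetical components: lot 4.0, assembly 3.0, observation 1.0.
var_y, rho_assembly, rho_lot = three_stage_correlations(4.0, 3.0, 1.0)
print(var_y, rho_assembly, rho_lot)
```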

Subsampling model (26.43) corresponds to subsampling model (26.35) for a single-factor
study except that we assume here that the $\tau_i$ are independent $N(0, \sigma_\tau^2)$ and are independent
of the $\varepsilon_{j(i)}$ and $\eta_{k(ij)}$. Formally, then, the only difference between models (26.35) and (26.43)
is that the $\tau_i$ are fixed in one case and random in the other. Subsampling model (26.43) also
corresponds to nested model (26.7) with both factor A and factor B effects random.


Analysis of Variance 


The analysis of variance for pure subsampling model (26.43) uses the same sums of squares
as before, namely, those in (26.37). The ANOVA table is the same as that in Table 26.8.
The applicable expected mean squares are those for random $\tau_i$ effects.


Estimation of $\mu_{..}$


In the case of pure subsampling, there is often interest in estimating the overall mean $\mu_{..}$
(the process mean for the computer assembly quality characteristic in our earlier quality
control example). A point estimator of $\mu_{..}$ in model (26.43) is $\bar{Y}_{...}$, and it can be shown that
its variance is:

$$\sigma^2\{\bar{Y}_{...}\} = \frac{\sigma_\tau^2}{r} + \frac{\sigma^2}{rn} + \frac{\sigma_\eta^2}{rnm} = \frac{nm\sigma_\tau^2 + m\sigma^2 + \sigma_\eta^2}{rnm} \qquad (26.45)$$

An unbiased estimator of this variance is:

$$s^2\{\bar{Y}_{...}\} = \frac{MSTR}{rnm} \qquad (26.46)$$

and the $1 - \alpha$ confidence limits for $\mu_{..}$ are:

$$\bar{Y}_{...} \pm t(1 - \alpha/2; r - 1)\, s\{\bar{Y}_{...}\} \qquad (26.47)$$
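
Formulas (26.46) and (26.47) can be sketched in Python for balanced data. The data below are hypothetical (3 lots, 2 assemblies per lot, 2 observations per assembly), the function name `overall_mean_ci` is an illustration only, and the t percentile is passed in by the caller rather than computed.

```python
import math


def overall_mean_ci(y, t_crit):
    """Estimate mu.. for balanced three-stage data y[i][j][k].

    t_crit is t(1 - alpha/2; r - 1), supplied by the caller.
    """
    r, n, m = len(y), len(y[0]), len(y[0][0])
    lot_means = [sum(sum(unit) for unit in lot) / (n * m) for lot in y]
    grand_mean = sum(lot_means) / r
    mstr = n * m * sum((lm - grand_mean) ** 2 for lm in lot_means) / (r - 1)
    std_err = math.sqrt(mstr / (r * n * m))  # square root of (26.46)
    return grand_mean, std_err, (grand_mean - t_crit * std_err,
                                 grand_mean + t_crit * std_err)


# Hypothetical measurements, not taken from the text.
y = [
    [[10, 12], [11, 13]],
    [[14, 16], [15, 17]],
    [[12, 14], [13, 15]],
]
grand_mean, std_err, limits = overall_mean_ci(y, t_crit=4.303)  # t(.975; 2)
print(grand_mean, round(std_err, 4), [round(v, 2) for v in limits])
```

Only the lot means enter the calculation, which reflects the fact that MSTR already absorbs all three variance components when the $\tau_i$ are random.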


26.9 Three-Factor Partially Nested Designs


Our discussion of nested designs and subsampling so far has been confined to hierarchical 
designs where no factors are crossed. In this section, we consider three-factor experiments 
where some but not all of the factors are nested. Such designs are called partially nested, 
partially hierarchical, or cross-nested designs. We shall utilize the following example to 
explain three-factor partially nested designs. 


Example

The effect of cultural background on group decision making was studied by an experiment.
Sixteen teams of students were formed and assigned a task. One of the response variables
was the number of group interactions prior to the final group decision. Eight teams consisted
of foreign students, eight of U.S. students. Half of the teams consisted of eight members,
the other half of four members. Two foreign observers were used for the foreign teams, and
two U.S. observers for the U.S. teams. Thus, the design may be represented as follows:

                       U.S. Teams (A1)                      Foreign Teams (A2)
                Observer 1 (C1)   Observer 2 (C2)    Observer 3 (C3)   Observer 4 (C4)
Small team      Replication 1     Replication 1      Replication 1     Replication 1
(B1)            Replication 2     Replication 2      Replication 2     Replication 2
Large team      Replication 1     Replication 1      Replication 1     Replication 1
(B2)            Replication 2     Replication 2      Replication 2     Replication 2

Note that there are two replications (teams) in each cell.


Development of Model


Let nationality of team be factor A, size of team factor B, and observer factor C. Note that
factor C is nested within factor A since the two observers for the U.S. teams were different
from the two observers for the foreign teams. Also note that factors A and B are crossed,
since each level of factor A appears with every level of factor B, and vice versa. Similarly,
factors B and C are crossed. Factors A (nationality) and B (team size) were considered to
have fixed effects, while the factor C (observer) effects were considered to be random.

In order to develop an appropriate model, we need to recognize that factor C is nested
within factor A; hence the factor C effect is denoted by $\gamma_{k(i)}$. We also need to recognize that
the AC and ABC interactions are to be excluded because factor C is nested within factor A.
Finally, the BC interaction is nested within factor A since factor C is nested within factor A;
thus, the BC interaction is denoted by $(\beta\gamma)_{jk(i)}$. Hence, the appropriate model is:

$$Y_{ijkm} = \mu_{...} + \alpha_i + \beta_j + \gamma_{k(i)} + (\alpha\beta)_{ij} + (\beta\gamma)_{jk(i)} + \varepsilon_{ijkm} \qquad (26.48)$$

where:

$\mu_{...}$ is an overall constant

$\alpha_i$ are the fixed nationality effects

$\beta_j$ are the fixed team size effects

$\gamma_{k(i)}$ are the random observer (within nationality) effects

$(\alpha\beta)_{ij}$ are the fixed nationality-team size interaction effects

$(\beta\gamma)_{jk(i)}$ are the random team size-observer interaction (within nationality) effects

$\varepsilon_{ijkm}$ are random error terms

$\sum_i \alpha_i = 0 \qquad \sum_j \beta_j = 0 \qquad \sum_i (\alpha\beta)_{ij} = 0$ for all $j$

$\sum_j (\alpha\beta)_{ij} = 0$ for all $i \qquad \sum_j (\beta\gamma)_{jk(i)} = 0$ for all $k(i)$

$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, c$; $m = 1, \ldots, n$


Appendix D contains a simple rule for constructing ANOVA models for complex designs, 
such as the one here. 

We assume as usual that $\gamma_{k(i)}$, $(\beta\gamma)_{jk(i)}$, and $\varepsilon_{ijkm}$ are normally distributed with
expectations zero and with constant variances $\sigma_\gamma^2$, $\sigma_{\beta\gamma}^2$, and $\sigma^2$, respectively, and that the three
groups of random variables are pairwise independent. The interaction effects $(\beta\gamma)_{jk(i)}$ for
any given observer are correlated as a result of the restrictions in model (26.48).


Analysis of Variance 


Table 26.11 contains the ANOVA table for model (26.48). The sums of squares, degrees
of freedom, and expected mean squares shown in this table can be developed by using the
rules in Appendix D. The expected mean squares also can be obtained from some computer
packages with analysis of variance capabilities. The expected mean squares column in
Table 26.11 indicates directly how to form test statistics for a variety of tests.

Example

Table 26.12 contains the results of the group decision-making experiment described earlier,
and Figure 26.6 presents SYSTAT aligned dot plots of the data. The dot plots suggest a
strong effect of nationality on the number of group interactions before the group decision
is reached. Figure 26.7 contains the MINITAB printout of the ANOVA results, including
the expected mean squares and the appropriate F tests. The correspondences between the
symbols used in MINITAB in its expected mean square column and the model terms in
Table 26.11 are as follows: Each term in an expected mean square is represented in the
MINITAB output by (1) the numeric code, in parentheses, for the variance of the model
term, and (2) the preceding number, which is the numerical multiple. When the model effect
is fixed, the letter Q is used in the printout to show that the variance of the model term is
replaced by the sum of squared effects divided by degrees of freedom. For example:

$$E\{MSA\} = (6) + 4(3) + 8Q[1] = \sigma^2 + 4\sigma_\gamma^2 + 8\,\frac{\sum \alpha_i^2}{2 - 1}$$

$$E\{MSBC(A)\} = (6) + 2(5) = \sigma^2 + 2\sigma_{\beta\gamma}^2$$


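TABLE 26.11 ANOVA Table for Crossed-Nested Model (26.48).

Source of
Variation   SS                                                                                                   df                MS        Expected Mean Squares
A           $SSA = bcn\sum_i(\bar{Y}_{i...} - \bar{Y}_{....})^2$                                                 $a - 1$           MSA       $\sigma^2 + bcn\frac{\sum\alpha_i^2}{a-1} + bn\sigma_\gamma^2$
B           $SSB = acn\sum_j(\bar{Y}_{.j..} - \bar{Y}_{....})^2$                                                 $b - 1$           MSB       $\sigma^2 + acn\frac{\sum\beta_j^2}{b-1} + n\sigma_{\beta\gamma}^2$
C(A)        $SSC(A) = bn\sum_i\sum_k(\bar{Y}_{i.k.} - \bar{Y}_{i...})^2$                                         $a(c - 1)$        MSC(A)    $\sigma^2 + bn\sigma_\gamma^2$
AB          $SSAB = cn\sum_i\sum_j(\bar{Y}_{ij..} - \bar{Y}_{i...} - \bar{Y}_{.j..} + \bar{Y}_{....})^2$         $(a - 1)(b - 1)$  MSAB      $\sigma^2 + cn\frac{\sum\sum(\alpha\beta)_{ij}^2}{(a-1)(b-1)} + n\sigma_{\beta\gamma}^2$
BC(A)       $SSBC(A) = n\sum_i\sum_j\sum_k(\bar{Y}_{ijk.} - \bar{Y}_{ij..} - \bar{Y}_{i.k.} + \bar{Y}_{i...})^2$ $a(b - 1)(c - 1)$ MSBC(A)   $\sigma^2 + n\sigma_{\beta\gamma}^2$
Error       $SSE = \sum_i\sum_j\sum_k\sum_m(Y_{ijkm} - \bar{Y}_{ijk.})^2$                                        $abc(n - 1)$      MSE       $\sigma^2$

Total       $SSTO = \sum_i\sum_j\sum_k\sum_m(Y_{ijkm} - \bar{Y}_{....})^2$                                       $abcn - 1$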


TABLE 26.12 Data for Crossed-Nested Design Experiment: Group Decision-Making Example.

                     U.S. Teams (i = 1)           Foreign Teams (i = 2)
                  Observer 1    Observer 2       Observer 3    Observer 4
Size of Team       (k = 1)       (k = 2)          (k = 1)       (k = 2)
4 members            16            14                7             4
(j = 1)              20            19                5             9
8 members            21            28               11            12
(j = 2)              25            19               17            15


FIGURE 26.6 SYSTAT Dot Plots for Crossed-Nested Design Experiment: Group Decision-Making Example.

[Dot plots not reproduced. Horizontal axis: Number of Group Interactions (0 to 30). One row of dots
for each of: U.S. Teams (i = 1), Observer 1 (k = 1); U.S. Teams (i = 1), Observer 2 (k = 2);
Foreign Teams (i = 2), Observer 3 (k = 1); Foreign Teams (i = 2), Observer 4 (k = 2).]


FIGURE 26.7 MINITAB Output for Crossed-Nested Design Experiment: Group Decision-Making Example.

Analysis of Variance

Source      DF        SS        MS          F       P
A            1    420.25    420.25    1681.00   0.001
B            1    182.25    182.25     145.80   0.007
C(A)         2      0.50      0.25       0.02   0.981
A*B          1      2.25      2.25       1.80   0.312
B*C(A)       2      2.50      1.25       0.09   0.911
Error        8    106.00     13.25
Total       15    713.75

              Variance    Error   Expected Mean Square
  Source      component   term    (using restricted model)
1 A                         3     (6) + 4(3) + 8Q[1]
2 B                         5     (6) + 2(5) + 8Q[2]
3 C(A)         -3.250       6     (6) + 4(3)
4 A*B                       5     (6) + 2(5) + 4Q[4]
5 B*C(A)       -6.000       6     (6) + 2(5)
6 Error        13.250             (6)

To test for nationality effects, the alternatives are:

$$H_0\colon \alpha_1 = \alpha_2 = 0 \qquad H_a\colon \text{not both } \alpha_i \text{ equal zero} \qquad (26.49a)$$

Table 26.11 indicates that the appropriate test statistic is:

$$F^* = \frac{MSA}{MSC(A)} \qquad (26.49b)$$

We have for our example, using the results in Figure 26.7:

$$F^* = \frac{420.25}{.25} = 1{,}681$$

For level of significance $\alpha = .05$, we require $F(.95; 1, 2) = 18.5$. Since $F^* = 1{,}681 > 18.5$,
we conclude $H_a$, that nationality has an effect on the group behavior. The P-value of the
test is .001. Other tests are conducted in a similar fashion. Results are summarized in
Figure 26.7.
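
The sums of squares in Table 26.11 can be evaluated directly from the Table 26.12 data. The following Python sketch is one way to do so; the nested-list layout `y[i][j][k]` and the helper `avg` are illustrative choices, not part of the text, and the error terms for the F ratios follow the expected mean squares in Table 26.11.

```python
from itertools import product
from statistics import fmean

# Table 26.12 data: y[i][j][k] holds the n = 2 replicate teams.
# i: nationality (0 = U.S., 1 = foreign); j: team size (0 = 4, 1 = 8 members);
# k: observer nested within nationality.
y = [
    [[[16, 20], [14, 19]], [[21, 25], [28, 19]]],
    [[[7, 5], [4, 9]], [[11, 17], [12, 15]]],
]
a, b, c, n = 2, 2, 2, 2
I, J, K = range(a), range(b), range(c)


def avg(i=None, j=None, k=None):
    """Mean of all observations at the given levels (None averages over an index)."""
    return fmean(
        obs
        for ii, jj, kk in product(I, J, K)
        if i in (None, ii) and j in (None, jj) and k in (None, kk)
        for obs in y[ii][jj][kk]
    )


grand = avg()
ssa = b * c * n * sum((avg(i) - grand) ** 2 for i in I)
ssb = a * c * n * sum((avg(j=j) - grand) ** 2 for j in J)
ssc_a = b * n * sum((avg(i, k=k) - avg(i)) ** 2 for i in I for k in K)
ssab = c * n * sum((avg(i, j) - avg(i) - avg(j=j) + grand) ** 2 for i in I for j in J)
ssbc_a = n * sum(
    (avg(i, j, k) - avg(i, j) - avg(i, k=k) + avg(i)) ** 2
    for i, j, k in product(I, J, K)
)
sse = sum(
    (obs - avg(i, j, k)) ** 2 for i, j, k in product(I, J, K) for obs in y[i][j][k]
)

msa, msb = ssa / (a - 1), ssb / (b - 1)
msc_a = ssc_a / (a * (c - 1))
msab = ssab / ((a - 1) * (b - 1))
msbc_a = ssbc_a / (a * (b - 1) * (c - 1))
mse = sse / (a * b * c * (n - 1))

# Fixed effects are tested against the random term sharing their variance
# components; random terms are tested against MSE.
f_a = msa / msc_a        # test statistic (26.49b)
f_b = msb / msbc_a
f_ab = msab / msbc_a
f_c_a = msc_a / mse
f_bc_a = msbc_a / mse
print(ssa, ssb, ssc_a, ssab, ssbc_a, sse)
print(f_a, f_b, round(f_ab, 2), round(f_c_a, 2), round(f_bc_a, 2))
```

Under these assumptions the sketch gives the same sums of squares and F ratios as the MINITAB output in Figure 26.7. P-values are omitted because they require an F distribution routine from a statistics package.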

Next, we wish to estimate the difference between U.S. and foreign teams in the mean
number of group interactions prior to a decision. Confidence intervals for contrasts of main
factor effects are set up in the usual way when the factor effects are fixed. Hence, we
require MSC(A), as this is the mean square used in the denominator of the test statistic for
examining nationality effects. Specifically, the confidence limits for $L = \mu_{1..} - \mu_{2..}$ are:

$$\hat{L} \pm t[1 - \alpha/2; (c - 1)a]\, s\{\hat{L}\} \qquad (26.50)$$

where:

$$s^2\{\hat{L}\} = \frac{2MSC(A)}{nbc} \qquad (26.50a)$$

For our example, we obtain from Table 26.12 and Figure 26.7:

$$\bar{Y}_{1...} = 20.25 \qquad \bar{Y}_{2...} = 10.00 \qquad \hat{L} = 20.25 - 10.00 = 10.25$$

$$s^2\{\hat{L}\} = \frac{2(.25)}{8} = .063 \qquad s\{\hat{L}\} = .25$$

For confidence coefficient .95, we require $t(.975; 2) = 4.303$. The confidence limits then
are $10.25 \pm 4.303(.25)$, and the desired 95 percent confidence interval is:

$$9.2 \le \mu_{1..} - \mu_{2..} \le 11.3$$

With confidence coefficient .95, we conclude that U.S. teams engage in 9.2 to 11.3 more
interactions, on average, than foreign teams before a group decision is reached.
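
The interval can be reproduced with a few lines of Python. This is a sketch only: the function name `contrast_limits` is hypothetical, and the t percentile is taken from the text rather than computed.

```python
import math


def contrast_limits(mean_1, mean_2, msc_a, n, b, c, t_crit):
    """Confidence limits (26.50) for the difference between two factor A level means."""
    l_hat = mean_1 - mean_2
    std_err = math.sqrt(2 * msc_a / (n * b * c))  # square root of (26.50a)
    return l_hat, std_err, (l_hat - t_crit * std_err, l_hat + t_crit * std_err)


l_hat, std_err, (lower, upper) = contrast_limits(
    20.25, 10.00, msc_a=0.25, n=2, b=2, c=2, t_crit=4.303
)
print(l_hat, std_err, round(lower, 1), round(upper, 1))  # 10.25 0.25 9.2 11.3
```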




Comments 


1. The sums of squares SSA, SSB, and SSAB in Table 26.11 for the analysis of the crossed-nested
experimental design are the usual sums of squares for factor A main effects, factor B main effects,
and AB interactions. SSC(A) simply measures the variability of the factor C level estimated means
for any given level of factor A, and then aggregates these sums of squares over factor A. Similarly,
SSBC(A) contains the usual BC interaction sum of squares for a given level of factor A, and then
aggregates these sums of squares over factor A.


2. If important AB interactions are present, analysis should usually focus on the means $\mu_{ij.}$ when
the factors have fixed effects, rather than on the factor level means $\mu_{i..}$ and $\mu_{.j.}$. It can be shown that
the estimated variance for comparing the two team sizes for any given nationality is:

$$s^2\{\bar{Y}_{i1..} - \bar{Y}_{i2..}\} = \frac{2MSBC(A)}{cn} \qquad (26.51)$$

This variance has associated with it $a(b - 1)(c - 1)$ degrees of freedom, as is evident from Table 26.11.

No exact confidence interval exists for comparing the two nationalities for any given team size.
An unbiased variance estimator that can be utilized is:

$$s^2\{\bar{Y}_{1j..} - \bar{Y}_{2j..}\} = \frac{2}{cn}\left[\frac{b - 1}{b}\,MSBC(A) + \frac{1}{b}\,MSC(A)\right] \qquad (26.52)$$

The approximate number of degrees of freedom associated with this variance is obtained from (25.28).

The reason for the different variances in (26.51) and (26.52) is that the observers are the same
when the two team sizes for a given nationality are compared, while the observers differ when the
two nationalities for a given team size are compared.
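
As a numerical illustration of the second comparison, the sketch below combines MSBC(A) and MSC(A) with weights $(b-1)/b$ and $1/b$ (the weights that make the estimator unbiased under the expected mean squares in Table 26.11) and then applies a Satterthwaite-type approximation for the degrees of freedom. It is assumed here, not confirmed by this section, that (25.28) is that approximation; the function name is hypothetical.

```python
def nationality_within_size_variance(msbc_a, msc_a, a, b, c, n):
    """Estimated variance for comparing two nationalities at one team size,
    with approximate (Satterthwaite-type, assumed) degrees of freedom."""
    weight = 2 / (c * n)
    # Each entry: (coefficient times mean square, degrees of freedom of that mean square)
    terms = [
        (weight * (b - 1) / b * msbc_a, a * (b - 1) * (c - 1)),
        (weight / b * msc_a, a * (c - 1)),
    ]
    s2 = sum(value for value, _ in terms)
    df = s2 ** 2 / sum(value ** 2 / dof for value, dof in terms)
    return s2, df


# Mean squares from Figure 26.7: MSBC(A) = 1.25, MSC(A) = .25.
s2, df = nationality_within_size_variance(1.25, 0.25, a=2, b=2, c=2, n=2)
print(s2, round(df, 2))
```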


Cited Reference

26.1. Searle, S. R. Linear Models for Unbalanced Data. New York: John Wiley & Sons, 1987.

Problems

26.1. A student asked: "Since the mean squares in the analysis of variance table for a two-factor
nested design are the same whether the factor effects are assumed to be random or fixed,
what difference does it make whether we assume the factors to have fixed effects or random
effects?" Comment.

26.2. A researcher declared: “I prefer analyzing a nested two-factor study as a study with crossed 
factors because I can isolate more sources of variation.” Comment on the researcher's strategy. 


26.3. Consider a three-factor study where factor C is nested within factor B, and factor B in turn
is nested within factor A, and $a = b = c = 2$. Illustrate in the format of Figure 26.1 the
distinction between this nested design and the corresponding crossed design.

26.4. Bottling plant production. A production engineer studied the effects of machine model 
(factor A) and operator (factor B) on the output in a bottling plant. Three bottling machines 
were used, each a different model. Twelve operators were employed. Four operators were 
assigned to a machine and worked six-hour shifts each. Data on the number of cases produced 
by each machine and operator were collected for a week. The data that follow represent the 
number of cases produced per hour for each day during the week. 


Machine i:             1                     2                     3
Operator j:      1    2    3    4      1    2    3    4      1    2    3    4
Day k = 1:      65   68   56   45     74   69   52   73     69   63   81   67
    k = 2:      58   62   65   56     81   76   56   78     83   70   72   79
    k = 3:      63   75   58   54     76   80   62   83     74   72   73   73
    k = 4:      57   64   70   48     80   78   58   75     78   68   76   77
    k = 5:      66   70   64   60     68   73   51   76     80   75   70   71


a. Obtain the residuals for nested design model (26.7) with fixed factor effects and plot them 
against the fitted values. Also prepare a normal probability plot of the residuals. What are 
your findings about the appropriateness of model (26.7)? 


b. Prepare aligned residual dot plots by machine. Do these plots support the assumption of
constancy of the error variance? Discuss.

26.5. Refer to Bottling plant production Problem 26.4. Assume that nested design model (26.7)
with fixed factor effects is appropriate.

a. Can the operator effects be distinguished from the effects of shifts in this study? Discuss.

b. Plot the data in the format of Figure 26.3. Does it appear that any factor effects are
present?

c. Obtain the analysis of variance table.

d. Test whether or not the mean outputs differ for the three machine models; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

e. Test whether or not the mean outputs differ for the operators assigned to each machine;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test? What does your conclusion imply about the mean outputs for the four operators
assigned to machine 3? Explain.

f. Test for each machine separately whether or not the mean outputs for the four operators
differ. For each test, use $\alpha = .01$ and state the alternatives, decision rule, and conclusion.

g. What is the family level of significance for the combined tests in parts (d), (e), and (f)
using the Bonferroni inequality? Summarize the set of conclusions reached in your tests.

26.6. Refer to Bottling plant production Problems 26.4 and 26.5.

a. Make all pairwise comparisons among the mean outputs for the three machines. Use the
Tukey procedure with a 95 percent family confidence coefficient. State your findings.

b. Make all pairwise comparisons among the mean outputs for the four operators assigned to
machine 1. Use the Bonferroni procedure with a 95 percent family confidence coefficient.
State your findings.

c. Operator 4 assigned to machine 1 has relatively little experience compared to the other
three operators. Estimate the contrast:

$$L = \frac{\mu_{11} + \mu_{12} + \mu_{13}}{3} - \mu_{14}$$

using a 99 percent confidence interval. Interpret your interval estimate.

26.7. Refer to Bottling plant production Problem 26.4. Assume that the four operators assigned
to each machine were selected at random from a large number of operators.

a. How is nested design model (26.7) modified to fit this case?

b. Obtain a point estimate of the operator variance $\sigma_\beta^2$.

c. Test whether or not $\sigma_\beta^2$ equals zero; use $\alpha = .10$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

d. Use the MLS procedure to obtain an approximate 90 percent confidence interval for $\sigma_\beta^2$.
Interpret your confidence interval.

e. Test whether or not the mean outputs differ for the three machine models; use $\alpha = .10$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

f. Make all pairwise comparisons among the mean outputs for the three machines. Use the
Tukey procedure with a 90 percent family confidence coefficient. State your findings.

g. Test the assumption that the $\beta_{j(i)}$ for all machines have the same variance $\sigma_\beta^2$. Use the
Brown-Forsythe test (Section 18.2) with $\alpha = .01$. State the alternatives, decision rule, and
conclusion.


26.8. Refer to Bottling plant production Problem 26.4. Assume that the four operators assigned
to each machine were selected at random from a large number of operators and that the three
machines were chosen at random from a large number of machines.

a. How is nested design model (26.7) modified to fit this case?

b. Obtain point estimates of the operator and machine variances $\sigma_\beta^2$ and $\sigma_\alpha^2$, respectively.

c. Test whether or not $\sigma_\alpha^2$ equals zero; use $\alpha = .05$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

d. Use the MLS procedure to obtain an approximate 95 percent confidence interval for $\sigma_\alpha^2$.
Interpret your confidence interval.

e. The production engineer is interested in estimating the overall mean $\mu_{..}$ with a 95 percent
confidence interval. Obtain the desired confidence interval and interpret your interval
estimate.

*26.9. Health awareness. Three states (factor A) participated in a health awareness study. Each state
independently devised a health awareness program. Three cities (factor B) within each state
were selected for participation and five households within each city were randomly selected
to evaluate the effectiveness of the program. All members of the selected households were
interviewed before and after participation in the program and a composite index was formed
for each household measuring the impact of the health awareness program. The data on health
awareness follow (the larger the index, the greater the awareness).


Household k = 1: 26 234 19 18 16
k = 2: 56 38 51 36 40 28
k = 3: 35 42 60 24 27 45
k = 4: 40 35 29 12 31 30
k = 5: 28 53 44 33 23 2


a. Obtain the residuals for nested design model (26.7) with fixed factor effects and plot them
against the fitted values. Also prepare a normal probability plot of the residuals. What are
your findings about the appropriateness of model (26.7)?

b. Prepare aligned residual dot plots by state. Do these plots support the assumption of
constancy of the error variance? Discuss.

c. Plot the data in the format of Figure 26.3. Does it appear that any factor effects are
present?

*26.10. Refer to Health awareness Problem 26.9. Assume that nested design model (26.7) with fixed
factor effects is appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not the mean awareness differs for the three states; use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

c. Test whether or not the mean awareness differs for the three cities within each state; use
$\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test? What does your conclusion imply about the awareness means for the three cities in
state 1? Explain.

d. What is the family level of significance for the combined tests in parts (b) and (c) using
the Bonferroni inequality? Summarize the set of conclusions reached in your tests.


*26.11. Refer to Health awareness Problems 26.9 and 26.10.

a. Estimate $\mu_{11}$ with a 95 percent confidence interval. Interpret your interval estimate.

b. Obtain separate confidence intervals for $\mu_{1.}$, $\mu_{2.}$, and $\mu_{3.}$, each with a 99 percent confidence
coefficient. Interpret your interval estimates.

c. Obtain confidence intervals for all pairwise comparisons among the state means. Use the
Tukey procedure and a 90 percent family confidence coefficient. Summarize your findings.
d. It is desired to obtain a 95 percent confidence interval for L = у — H32, since these two
cities are of comparable size. Interpret your interval estimate.


*26.12. Refer to Health awareness Problem 26.9. Assume that the three cities in each state were
chosen at random from all the cities in the state.

a. How is nested design model (26.7) modified to fit this case?

b. Obtain a point estimate of the city variance $\sigma_\beta^2$. Is there anything peculiar about the estimate
here?

c. Test whether or not $\sigma_\beta^2$ equals zero; use $\alpha = .10$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

d. Test whether or not the mean awareness differs for the three states; use $\alpha = .10$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

e. Obtain confidence intervals for all pairwise comparisons between the state means. Use the
Tukey procedure and a 90 percent family confidence coefficient. Summarize your findings.

f. Test the assumption that the $\beta_{j(i)}$ for all states have the same variance $\sigma_\beta^2$. Use the Hartley
test (Section 18.2) with significance level $\alpha = .05$. State the alternatives, decision rule,
and conclusion.

*26.13. Refer to Health awareness Problem 26.9. Assume that the three cities within each state and
the three states were selected at random.

a. How is nested design model (26.7) modified to fit this case?

b. Obtain point estimates of the city and state variances $\sigma_\beta^2$ and $\sigma_\alpha^2$, respectively.

c. Test whether or not $\sigma_\alpha^2$ equals zero; use $\alpha = .01$. State the alternatives, decision rule, and
conclusion. What is the P-value of the test?

d. Use the MLS procedure to obtain an approximate 99 percent confidence interval for $\sigma_\alpha^2$.
Interpret your confidence interval.

e. Estimate the overall mean health awareness index $\mu_{..}$ using a 99 percent confidence interval.
Interpret your interval estimate.

26.14. Internal control. A large retailer operates three regional accounting centers (factor A).
Center 1 employs three audit teams, while the other two centers employ two audit teams each.
One function of each center is to review whether a certain internal control operates properly
in the processing of payroll. Data on the percent of transactions where the internal control was
found to be operating properly were requested for each team in each region for the previous
two months. Three months' data were received in one case, and data for only one month in
another. The arcsine transformation $Y' = 2 \arcsin \sqrt{p}$ was employed to stabilize the error
variances. The transformed data follow.


Region i: 1 | 2 | 3 
Team j: 1 2 3 1 2 | 1 2 
Month К = 1: 151.6 1432 1314 | 1638 151.6 157.0 160.0 
к=2: 141.2 139.4 136.0 154.2 147.2 151.6 


к= 3: 149.4 


a. Set up the full regression model for this case, analogous to the illustrative full model
(26.31), using 1, −1, 0 indicator variables.

b. Fit this model and obtain the residuals. Plot the residuals against the fitted values. Also
prepare a normal probability plot of the residuals. What are your findings about the
appropriateness of the model?

26.15. Refer to Internal control Problem 26.14. Assume that nested design model (26.7) with fixed
factor effects, modified for unequal nestings and replications, is appropriate.

a. Test for region main effects using test statistic (7.27) and significance level $\alpha = .025$. State
the alternatives, reduced model, decision rule, and conclusion. What is the P-value of the
test?

b. Test for effects of audit teams within region using test statistic (7.27) and significance level
$\alpha = .025$. State the alternatives, reduced model, decision rule, and conclusion.


c. Estimate L = ш. — ио. (in transformed units) with a 98 percent confidence interval. 


26.16. A student asked in class why all experiments do not make use of repeated observations since
all measurement procedures are inexact to some degree. Comment.

26.17. Refer to Questionnaire color Problem 16.8. Suppose that the experiment was conducted
by distributing the fliers to the assigned parking lots in two different weeks and noting the
response rates for each week. The complete data on response rates follow.

Color i:            1 (Blue)                 2 (Green)                3 (Orange)
Lot j:          1   2   3   4   5        1   2   3   4   5        1   2   3   4   5
Week k = 1:    28  26  31  27  35       34  29  25  31  29       31  25  27  29  28
     k = 2:    32  23  29  24  37       33  27  22  34  25       35  28  25  25  31


a. Obtain the residuals for subsampling model (26.35) with fixed treatment effects and plot
them against the fitted values. Also prepare a normal probability plot of the residuals. What
are your findings about the appropriateness of model (26.35)?

b. Test the assumption that the $\varepsilon_{j(i)}$ have the same variance $\sigma^2$ for all colors. Use the
Brown-Forsythe test (Section 18.2) with significance level $\alpha = .01$. State the alternatives, decision
rule, and conclusion.

26.18. Refer to Questionnaire color Problem 26.17. Assume that subsampling model (26.35) with
fixed treatment effects is appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not questionnaire color effects are present; use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

c. Test whether or not lot differences within colors are present; use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

d. Estimate the mean response rate for blue questionnaires with a 95 percent confidence
interval.

e. Obtain point estimates of $\sigma^2$ and $\sigma_\eta^2$. Which variance appears to be larger here?

f. Use the MLS procedure to obtain an approximate 95 percent confidence interval for $\sigma^2$.
Also obtain a 95 percent confidence interval for $\sigma_\eta^2$. Interpret your interval estimates.

26.19. Plant acid levels. Four plants of the same variety were randomly selected in an experiment
to investigate the concentration of a particular acid. Three leaves per plant were randomly
selected and three separate determinations of the acid concentration were obtained per leaf.
The data follow.

Plant i:                 1                      2                      3                      4
Leaf j:            1     2     3          1     2     3          1     2     3          1     2     3
Determination
    k = 1:       11.2  16.5  18.3       14.1  19.0  11.9       15.3  19.5  16.5        7.3   8.9  11.3
    k = 2:       11.6  16.8  18.7       13.8  18.5  12.4       15.9  20.1  17.2        7.8   9.4  10.9
    k = 3:       12.0  16.1  19.0       14.2  18.2  12.0       16.0  19.3  16.9        7.9   9.3  10.5

Obtain the residuals for three-stage subsampling model (26.43) and plot them against the
fitted values. Also prepare a normal probability plot of the residuals. What are your findings
about the appropriateness of model (26.43)?

*26.20. Refer to Plant acid levels Problem 26.19. Assume that three-stage subsampling model (26.43)
is appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not there are variations in mean concentration levels between plants; use
$\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

c. Test whether or not there are variations in mean concentration levels between leaves of the
same plant; use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the
P-value of the test?

d. Estimate the overall mean concentration in all plants of the variety; use a 95 percent
confidence interval.

e. Obtain point estimates of $\sigma_\tau^2$, $\sigma^2$, and $\sigma_\eta^2$. Which component of variance appears to be most
important in the total variance $\sigma_Y^2$?

f. Use the MLS procedure to obtain an approximate 90 percent confidence interval for $\sigma_\tau^2$.
Does the experiment provide a precise estimate of this variance component?

26.21. Chemical consistency. A chemical company wished to study the consistency of the strength
of one of its liquid chemical products. The product is made in batches in large vats and then is
barreled. The barrels are subsequently stored for a period of time in a warehouse. To examine
the consistency of the strength of the chemical, an analyst randomly selected five different
batches of the product from the warehouse and then selected four barrels per batch at random.
Three determinations per barrel were made. The data on strength follow.


Batch i: 1 2 p] 5 
Barrel j: 1 2 3 4 1 


Determination 


К=1: 2.3 2.5 2.6 2.4 | 2.8 27 2.6 2.4 3.6 3.8 37 39 
К= 2: 2.1 2.3 24 2.6 | 2.9 2.5 2.6 2.8 3.7 3.8 35 35 
к= 3: 2.0 2.5 2.7 2.3 | 2.6 2.8 2.8 2.6 3.4 3.5 3.5 3.7 


a. Obtain the residuals for three-stage subsampling model (26.43) and plot them against the
fitted values. Also prepare a normal probability plot of the residuals. What are your findings
about the appropriateness of model (26.43)?

b. Test the assumption that the $\varepsilon_{j(i)}$ have the same variance $\sigma^2$ for all batches. Use the Hartley
test (Section 18.2) with significance level $\alpha = .01$. State the alternatives, decision rule, and
conclusion.


26.22. Refer to Chemical consistency Problem 26.21. Assume that three-stage subsampling model
(26.43) is appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not there are variations in mean strength between batches; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

c. Test whether or not there are variations in mean strength between barrels within batches;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

d. Estimate the overall mean strength of the chemical using a 99 percent confidence interval.

e. Obtain point estimates of $\sigma_\tau^2$, $\sigma^2$, and $\sigma_\eta^2$. Which component of variance appears to be most
important in the total variance $\sigma_Y^2$?

f. Use the MLS procedure to obtain an approximate 95 percent confidence interval for $\sigma_\tau^2$.
Does the experiment provide a precise estimate of this variance component?


Exercises

26.23. Derive (26.13) by squaring (26.12) and summing over all observations.

26.24. Derive (26.16) for a balanced nested two-factor design.

26.25. Consider a balanced nested two-factor design with factor A having fixed effects and factor B
(nested within factor A) having random effects.

a. Derive $\sigma^2\{\bar{Y}_{i..}\}$ and $\sigma^2\{\bar{Y}_{...}\}$.

b. Find an unbiased point estimator of $\sigma_\beta^2$.

26.26. Show that $\sigma^2\{\bar{Y}_{i..}\} = (\sigma_\eta^2 + m\sigma^2)/nm$ for subsampling model (26.35) with fixed treatment
effects.

26.27. Derive variance (26.45) for three-stage subsampling model (26.43). Using the expected mean
squares in Table 26.8, show that the estimated variance (26.46) is an unbiased estimator of
variance (26.45).

26.28. Use (26.52) and the fact that this estimated variance is unbiased to find $\sigma^2\{\bar{Y}_{1j..} - \bar{Y}_{2j..}\}$ for
ANOVA model (26.48). What is the approximate number of degrees of freedom associated
with the estimated variance?


Projects


26.29. Refer to the Drug effect experiment data set in Appendix C.12. Consider only Part I of the
study and dosage level 4; i.e., include only observations for which variable 2 equals 1 and
variable 5 equals 4. Assume that initial lever press rate (factor A) has fixed effects and that
rats are a second factor (factor B) with random effects.

a. State the appropriate model for this nested two-factor study.

b. Obtain the residuals and plot them against the fitted values. Also prepare a normal probability
plot of the residuals. What are your findings about the appropriateness of your model?

26.30. Refer to the Drug effect experiment data set in Appendix C.12 and Project 26.29. Assume
that nested design model (26.7), with $\beta_{j(i)}$ and $\varepsilon_{k(ij)}$ random, is appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not the mean lever press rate differs for the three initial rate groups; use
$\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

c. Test whether or not the mean lever press rate differs for the rats within the initial rate
groups; use $\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the
P-value of the test? What does your conclusion imply about the four rats in the slow initial
rate group?

d. Make all pairwise comparisons between the mean lever press rates for the three initial rate
groups. Use the Tukey procedure with a 90 percent family confidence coefficient.

e. Obtain an approximate 90 percent confidence interval for the between-rats variance, using
the MLS procedure. Interpret your interval estimate.


26.31. Refer to the Drug effect experiment data set in Appendix C.12. Consider only Part II of
the study and dosage level 3; i.e., include only observations for which variable 2 equals 2
and variable 5 equals 3. Assume that the initial lever press rate groups are the treatments
with fixed effects, and that the rats are the experimental units with two observations for each
experimental unit.

a. State the appropriate model for this single-factor study with subsampling.

b. Obtain the residuals and plot them against the fitted values. Also prepare a normal probability
plot of the residuals. What are your findings about the appropriateness of your model?

c. Test the assumption that the $\varepsilon_{ij}$ have the same variance $\sigma^2$ for all lever press rates. Use
the Brown-Forsythe test (Section 18.2) with $\alpha = .01$. State the alternatives, decision rule,
and conclusion.

26.32. Refer to the Drug effect experiment data set in Appendix C.12 and Project 26.31. Assume
that single-factor subsampling model (26.35) with fixed treatment effects is appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not the mean lever press rate differs for the three initial rate groups; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

c. Test whether or not differences in the mean lever press rate between rats are present; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

d. Make all pairwise comparisons between the mean lever press rates for the three initial
rate groups. Use the Tukey procedure with a 95 percent family confidence coefficient.
Summarize your findings.

e. Obtain interval estimates for $\sigma^2$ and $\sigma_\eta^2$, with confidence coefficient .90 for each. Interpret
your confidence intervals. Which variance component appears to be larger?


Chapter 27

Repeated Measures and Related Designs


In this chapter we take up repeated measures designs—designs that are widely used in the 
behavioral and life sciences. We begin by considering some basic elements of repeated 
measures designs. We then take up single-factor repeated measures designs, after which 
we consider two-factor experiments with repeated measures on one factor and on both
factors. We conclude this chapter with an introduction to split-plot designs, which include 
two-factor repeated measures designs with repeated measures on one factor. 


27.1 Elements of Repeated Measures Designs


Description of Designs 
Repeated measures designs utilize the same subject (person, store, plant, test market, etc.) 
for each of the treatments under study. The subject therefore serves as a block, and the 
experimental units within a block may be viewed as the different occasions when a treatment 
is applied to the subject. A repeated measures study may involve several treatments or only
a single treatment that is evaluated at different points in time. Subjects used in repeated 
measures studies in the behavioral and life sciences include persons, households, observers, 
and experimental animals. At other times the subjects in repeated measures designs are 
stores, test markets, cities, and plants. We shall refer to all of these study units used in 
repeated measures designs as subjects. 
Three examples of repeated measures designs follow. 


1. Fifteen test markets are to be used to study each of two different advertising campaigns. 
In each test market, the order of the two campaigns will be randomized, with a sufficient 
time lapse between the two campaigns so that the effects of the initial campaign will not 
carry over into the second campaign. The subjects in this study are the test markets. 

2. Two hundred persons who have persistent migraine headaches are each to be given two
different drugs and a placebo, for two weeks each, with the order of the drugs randomized
for each person. The subjects in the study are the persons with migraine headaches.

3. In a weight loss study, 100 overweight persons are to be given the same diet and their
weights measured at the end of each week for 12 weeks to assess the weight loss over
time. Here the subjects are the overweight persons, who are observed repeatedly to provide
information about the effects of a single treatment over time.

Each of these studies involves a repeated measures design because the same subject
is measured repeatedly. This key characteristic distinguishes this type of design from the
designs considered earlier.

Advantages and Disadvantages 


A principal advantage of repeated measures designs is that they provide good precision 
for comparing treatments because all sources of variability between subjects are excluded 
from the experimental error. Only variation within subjects enters the experimental error, 
since any two treatments can be compared directly for each subject. Thus, one may view the 
subjects as serving as their own controls. Another advantage of a repeated measures design 
is that it economizes on subjects. This is particularly important when only a few subjects 
(e.g., stores, plants, test markets) can be utilized for the experiment. Also, when interest is in
the effects of a treatment over time, as when the shape of the learning curve for a new process 
operation is to be studied, it is usually desirable to observe the same subject at different 
points in time rather than observing different subjects at the specified points in time. 

Repeated measures designs have a serious potential disadvantage, however, namely, that 
there may be several types of interference. One type of interference is an order effect, which is
connected with the position in the treatment order. For instance, in evaluating five different 
advertisements, subjects may tend to give higher (or lower) ratings for advertisements 
shown toward the end of the sequence than at the beginning. Another type of interference 
is connected with the preceding treatment or treatments. For instance, in evaluating five 
different soup recipes, a bland recipe may get a higher (or lower) rating when preceded by 
a highly spiced recipe than when preceded by a blander recipe. This type of interference is
called a carryover effect. 

Various steps can be taken to minimize the danger of interference effects. Randomization 
of the treatment orders for each subject independently will make it more reasonable to 
analyze the data as if the error terms are independent. Allowing sufficient time between 
treatments is often an effective means of reducing carryover effects. It may be desirable at 
times to balance the order of treatment presentations and sometimes even the number of 
times each treatment is preceded by any other treatment. Latin square designs and crossover 
designs (discussed in Chapter 28) are helpful to this end. 


How to Randomize 


The randomization of the order of the treatments assigned to a subject is straightforward. For 
each subject, a random permutation is used to define the treatment order, and independent
permutations are selected for the different subjects. 
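
The randomization just described can be sketched in a few lines. This is only an illustration: the function name `randomize_treatment_orders`, the treatment labels, and the seed are hypothetical, and the layout of five subjects and four treatments is chosen to resemble a small single-factor study.

```python
import random


def randomize_treatment_orders(subjects, treatments, seed=None):
    """Return a dict mapping each subject to its own random treatment order."""
    rng = random.Random(seed)  # the seed only makes this sketch reproducible
    orders = {}
    for subject in subjects:
        order = list(treatments)
        rng.shuffle(order)  # an independent random permutation for this subject
        orders[subject] = order
    return orders


# Hypothetical study: subjects 1..5, treatments T1..T4.
orders = randomize_treatment_orders(range(1, 6), ["T1", "T2", "T3", "T4"], seed=27)
for subject, order in orders.items():
    print(subject, order)
```

Each subject receives every treatment exactly once, and the permutation drawn for one subject does not depend on the permutations drawn for the others.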


Comment 


Designs with repeated measures, discussed here, need to be distinguished from designs with repeated 
observations, discussed in Section 26.7. In repeated measures designs, several or all of the treatments
are applied to the same subject. Designs with repeated observations, on the other hand, are designs 
where several observations on the response variable are made for a given treatment applied to an 
experimental unit. It is possible to develop a repeated measures design with repeated observations, as 
when a given subject is exposed to each of the treatments under study and a number of observations 
are made at the end of each treatment application.


27.2 Single-Factor Experiments with Repeated Measures on All Treatments


Model 


FIGURE 27.1 Layout for Single-Factor Repeated Measures Design (s = 5, r = 4).


We first consider repeated measures designs where the treatments are based on a single 
factor, as in the examples in Section 27.1. Almost always, the subjects in repeated measures
designs (persons, stores, test markets, experimental animals) are viewed as a random sample 
from a population. Hence, in all of the models for repeated measures designs to be presented 
in this chapter, the effects of subjects will be viewed as random. 

Figure 27.1 contains the layout for a single-factor experiment with repeated measures on 
all treatments. Here, there are five subjects and four treatments, with the order of treatments 
independently randomized for each subject. Notice that this layout corresponds to the one 
in Figure 21.1 for a randomized complete block design. Indeed, as we shall see next, the 
models for single-factor repeated measures designs are formally the same as the ones for 
randomized block designs, with blocks now considered to be subjects. 


When treatment effects are fixed, a model often appropriate for a single-factor repeated 
measures design is the following additive model: 


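$$Y_{ij} = \mu_{\cdot\cdot} + \rho_i + \tau_j + \varepsilon_{ij} \tag{27.1}$$

where:

$\mu_{\cdot\cdot}$ is a constant
$\rho_i$ are independent $N(0, \sigma_\rho^2)$
$\tau_j$ are constants subject to $\sum_j \tau_j = 0$
$\varepsilon_{ij}$ are independent $N(0, \sigma^2)$
$\rho_i$ and $\varepsilon_{ij}$ are independent
$i = 1, \ldots, s$; $j = 1, \ldots, r$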




Note that repeated measures model (27.1) is identical to randomized block model (25.67)
with random block effects, except that $n_b = s$.
Hence, we know from Section 25.5 that repeated measures model (27.1) assumes the
following about the observations $Y_{ij}$:

$$E\{Y_{ij}\} = \mu_{\cdot\cdot} + \tau_j \tag{27.2a}$$
$$\sigma^2\{Y_{ij}\} = \sigma_Y^2 = \sigma_\rho^2 + \sigma^2 \tag{27.2b}$$
$$\sigma\{Y_{ij}, Y_{ij'}\} = \sigma_\rho^2 = \omega\sigma_Y^2 \qquad j \ne j' \tag{27.2c}$$
$$\sigma\{Y_{ij}, Y_{i'j'}\} = 0 \qquad i \ne i' \tag{27.2d}$$

where $\omega$ is the coefficient of correlation between any two observations for the same subject:

$$\omega = \frac{\sigma_\rho^2}{\sigma_Y^2} \tag{27.2e}$$


Thus, repeated measures model (27.1) assumes that in advance of the random trials, any
two treatment observations $Y_{ij}$ and $Y_{ij'}$ for a given subject are correlated in the same fashion
for all subjects. This key assumption implies, as we saw in (25.71), that the variance-covariance
matrix of the observations $Y_{ij}$ for any given subject has compound symmetry.
Any two observations from different subjects in advance of the random trials are independent
according to model (27.1).

Equally important, we know from Chapter 25 that repeated measures model (27.1)
assumes that, once the subjects have been selected, any two observations for a given subject
are independent. Thus, model (27.1) assumes that there are no interference effects in the
repeated measures study, such as order effects or carryover effects from one treatment to
the next.
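
To make the compound symmetry structure concrete, the sketch below builds the variance-covariance matrix of the r observations on one subject: every diagonal entry is the sum of the subject variance and the error variance, and every off-diagonal entry is the subject variance alone. The numerical values of the two variance components and the function name `compound_symmetry_cov` are hypothetical, chosen only for illustration.

```python
def compound_symmetry_cov(r, sigma2_rho, sigma2):
    """Variance-covariance matrix of (Y_i1, ..., Y_ir) for one subject under model (27.1)."""
    return [
        [sigma2_rho + (sigma2 if j == k else 0.0) for k in range(r)]
        for j in range(r)
    ]


# Hypothetical variance components (not taken from the text).
sigma2_rho = 8.0   # between-subjects variance
sigma2 = 1.0       # error variance

cov = compound_symmetry_cov(4, sigma2_rho, sigma2)
sigma2_Y = sigma2_rho + sigma2     # variance of any single observation
omega = sigma2_rho / sigma2_Y      # common within-subject correlation

for row in cov:
    print(row)
print("omega =", round(omega, 4))
```

With these assumed values, each variance is 9, each covariance is 8, and the common correlation is 8/9; no pair of treatments is more strongly correlated than any other, which is exactly what the model requires.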


Comment 
If interaction effects between subjects and treatments are present, interaction model (25.74) can be
employed. As we noted in Chapter 25, both the additive and interaction models lead to the same
procedures for making inferences about the treatment effects.


Analysis of Variance and Tests 


Since repeated measures model (27.1) is the same as randomized complete block model
(25.67), the analysis of variance and the test for treatment effects will be the same as before.


Analysis of Variance. The ANOVA sums of squares for repeated measures model (27.1)
are the same as in (21.6), but the names of two of the sums of squares are usually changed
for repeated measures applications. The sum of squares for blocks in (21.6a) will now
be called the sum of squares for subjects, and the interaction sum of squares between
blocks and treatments in (21.6c) will now be called the interaction sum of squares between
treatments and subjects. These two sums of squares will be denoted, respectively, by SSS and
SSTR.S. Thus, the analysis of variance decomposition for single-factor repeated measures
model (27.1) is:


$$SSTO = SSS + SSTR + SSTR.S \tag{27.3}$$

where:

$$SSTO = \sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 \tag{27.3a}$$
$$SSS = r \sum_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \tag{27.3b}$$
$$SSTR = s \sum_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 \tag{27.3c}$$
$$SSTR.S = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot})^2 \tag{27.3d}$$

Note that no error sum of squares is present because there are no replications here.


TABLE 27.1 ANOVA Table for Single-Factor Repeated Measures Design—ANOVA
Model (27.1) with Subject Effects Random and Treatment Effects Fixed.

Source of
Variation     SS        df                MS        E{MS}
Subjects      SSS       s - 1             MSS       $\sigma^2 + r\sigma_\rho^2$
Treatments    SSTR      r - 1             MSTR      $\sigma^2 + s\,\dfrac{\sum_j \tau_j^2}{r - 1}$
Error         SSTR.S    (r - 1)(s - 1)    MSTR.S    $\sigma^2$
Total         SSTO      sr - 1

Table 27.1 contains the analysis of variance table for repeated measures model (27.1). It
is the same as the ANOVA table in Table 25.8 for additive randomized block model (25.67),
except for the change in notation. Note again that in the absence of interactions between
treatments and subjects, the interaction mean square MSTR.S is an unbiased estimator of
the error variance $\sigma^2$.


Comment 


In repeated measures studies, SSTR and SSTR.S are sometimes combined into a within-subjects sum
of squares SSW:

$$SSW = SSTR + SSTR.S \tag{27.4}$$

which can be shown to equal:

$$SSW = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2 \tag{27.4a}$$

Hence, the ANOVA decomposition in (27.3) can also be expressed as follows:

$$SSTO = SSS + SSW \tag{27.5}$$

where SSS measures the between-subjects variability and SSW measures the within-subjects variability.
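
The equivalence of the two expressions for SSW can be checked directly. The following is a brief sketch of the argument, using only the definitions of SSTR and SSTR.S given with the ANOVA decomposition; the cross-product term drops out because the interaction deviations sum to zero over subjects for each treatment.

```latex
\begin{align*}
Y_{ij} - \bar{Y}_{i\cdot}
  &= (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})
   + (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot}) \\[4pt]
\sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2
  &= s \sum_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2
   + \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot})^2 \\
  &\quad + 2 \sum_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})
     \underbrace{\sum_i (Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot})}_{=\,0} \\[4pt]
  &= SSTR + SSTR.S
\end{align*}
```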


Test for Treatment Effects. As the $E\{MS\}$ column in Table 27.1 indicates, the appropriate
statistic for the test on treatment effects:

$$H_0\colon \text{all } \tau_j = 0 \qquad H_a\colon \text{not all } \tau_j \text{ equal zero} \tag{27.6a}$$

is:

$$F^* = \frac{MSTR}{MSTR.S} \tag{27.6b}$$

When $H_0$ holds, $F^*$ follows the $F$ distribution with $r - 1$ and $(r - 1)(s - 1)$ degrees of freedom,
and the decision rule for controlling the Type I error at $\alpha$ is:

If $F^* \le F[1 - \alpha; r - 1, (r - 1)(s - 1)]$, conclude $H_0$
If $F^* > F[1 - \alpha; r - 1, (r - 1)(s - 1)]$, conclude $H_a$          (27.6c)


Example

In a wine-judging competition, four Chardonnay wines of the same vintage were judged
by six experienced judges. Each judge tasted the wines in a blind fashion, i.e., without
knowing their identities. The order of the wine presentation was randomized independently
for each judge. To reduce carryover and other interference effects, the judges did not drink
the wines and rinsed their mouths thoroughly between tastings. Each wine was scored on
a 40-point scale; the higher the score, the greater is the excellence of the wine. The data
for this competition are presented in Table 27.2. A plot of the wine scores for each judge
is shown in Figure 27.2. We see that there are some distinct differences in ratings between
judges but that the ratings for wines 3 and 4 are consistently best and for wine 1 generally
worst. We also see that the rating curves for the judges do not appear to exhibit substantial
departures from being parallel. Hence, an additive model appears to be appropriate.

TABLE 27.2 Data—Wine-Judging Example (ratings on a scale of 0 to 40).

              Wine (j)
Judge i       1        2        3        4        $\bar{Y}_{i\cdot}$
1             20       24       28       28       25
2             15       18       23       24       20
3             18       19       24       23       21
4             26       26       30       30       28
5             22       24       28       26       25
6             19       21       27       25       23
$\bar{Y}_{\cdot j}$        20.00    22.00    26.67    26.00    23.67 = $\bar{Y}_{\cdot\cdot}$

[Figure 27.2: plot of the ratings $Y_{ij}$ against wine (1 to 4), with one curve per judge.]

The six judges are considered to be a random sample from the population of possible
judges, while the four wines tasted are of interest in themselves. Hence, single-factor repeated
measures model (27.1) was expected to be appropriate, with the effects of subjects
(judges) considered random and the effects of treatments (wines) considered fixed. As
we shall see later, additional diagnostic analysis supports the appropriateness of ANOVA
model (27.1).

FIGURE 27.3 MINITAB ANOVA Table for Single-Factor Repeated Measures Design—Wine-Judging Example.

Factor    Type      Levels    Values
Judge     random    6         1 2 3 4 5 6
Wine      fixed     4         1 2 3 4

Analysis of Variance for Rating

Source    DF    SS         MS        F        P
Judge     5     173.333    34.667    32.50    0.000
Wine      3     184.000    61.333    57.50    0.000
Error     15    16.000     1.067
Total     23    373.333

Figure 27.3 contains MINITAB ANOVA output for the wine-judging data in Table 27.2.
To test for treatment effects:

$$H_0\colon \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$$
$$H_a\colon \text{not all } \tau_j \text{ equal zero}$$

we use the results of Figure 27.3:

$$F^* = \frac{MSTR}{MSTR.S} = \frac{61.333}{1.067} = 57.5$$

For level of significance $\alpha = .01$, we require $F(.99; 3, 15) = 5.42$. Since $F^* = 57.5 > 5.42$,
we conclude $H_a$, that the mean wine ratings for the four wines differ. The P-value for
this test is 0+.
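
The ANOVA quantities for the wine-judging data can be reproduced directly from the sum-of-squares definitions for the subjects, treatments, and treatment-by-subject terms. The sketch below uses only the Python standard library; the function name `repeated_measures_anova` and the dictionary layout are illustrative choices, not part of the MINITAB output. The critical value 5.42 is taken from the text rather than computed.

```python
# Wine-judging ratings from Table 27.2: rows are judges (subjects), columns are wines (treatments).
ratings = [
    [20, 24, 28, 28],
    [15, 18, 23, 24],
    [18, 19, 24, 23],
    [26, 26, 30, 30],
    [22, 24, 28, 26],
    [19, 21, 27, 25],
]


def repeated_measures_anova(y):
    """Sums of squares, mean squares, and F* for the additive model; y[i][j] = subject i, treatment j."""
    s, r = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (s * r)
    subject_means = [sum(row) / r for row in y]
    treatment_means = [sum(y[i][j] for i in range(s)) / s for j in range(r)]

    sss = r * sum((m - grand) ** 2 for m in subject_means)       # subjects
    sstr = s * sum((m - grand) ** 2 for m in treatment_means)    # treatments
    sstr_s = sum(                                                # treatments x subjects
        (y[i][j] - subject_means[i] - treatment_means[j] + grand) ** 2
        for i in range(s)
        for j in range(r)
    )
    mss = sss / (s - 1)
    mstr = sstr / (r - 1)
    mstr_s = sstr_s / ((r - 1) * (s - 1))
    return {
        "SSS": sss, "SSTR": sstr, "SSTR.S": sstr_s,
        "MSS": mss, "MSTR": mstr, "MSTR.S": mstr_s,
        "F*": mstr / mstr_s,
    }


anova = repeated_measures_anova(ratings)
for name, value in anova.items():
    print(f"{name:7s} {value:9.3f}")

# Decision rule at alpha = .01, with F(.99; 3, 15) = 5.42 as quoted in the text.
print("conclude Ha" if anova["F*"] > 5.42 else "conclude H0")
```

The computed sums of squares agree with the Judge, Wine, and Error lines of the ANOVA output, and the ratio of the treatment mean square to the interaction mean square reproduces the test statistic used above.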


TABLE 27.3 Estimated Within-Subjects Variance-Covariance Matrix between Treatment
Observations—Wine-Judging Example.

           j'
j          1          2          3          4
1          14.000     11.000     9.200      8.200
2                     10.000     8.200      7.600
3                                7.067      6.200
4                                           6.800

Comments 


1. As we noted in Chapter 25 (in Comment 2 on p. 1065), a conservative test for treatment effects
should be used if the assumptions of compound symmetry in repeated measures model (27.1) are not
met (i.e., if either the variances of the observations for different treatments for a given subject are not
the same for all subjects or if the correlations between any two treatment observations for a given
subject are not the same for all treatment pairs and for all subjects). In repeated measures studies, the
compound symmetry assumption will be violated, for instance, if repeated responses over time are
more highly correlated for observations closer together than for observations further apart in time.

2. When the treatment effects are random, test statistic (27.6b) and decision rule (27.6c) are still
appropriate for testing treatment effects.

3. The efficiency of the repeated measures design in the wine-judging example, relative to a
completely randomized design where each judge is used to assess a single wine, can be measured by
means of (21.14). Using the results in Figure 27.3 with $n_b = s$, we obtain:

$$\hat{E} = \frac{(s - 1)MSS + s(r - 1)MSTR.S}{(sr - 1)MSTR.S} = \frac{5(34.667) + 6(3)(1.067)}{23(1.067)} = 7.85$$

Thus, almost eight times as many replications per treatment would have been required with a
completely randomized design in which each judge rates a single wine as in the repeated measures design
to achieve the same precision for any estimated contrast.

4. When a single-factor repeated measures design involves $r = 2$ treatments, the $F^*$ statistic in
(27.6b) is equivalent to the two-sided $t$ test for paired observations based on test statistic (A.69).

5. Occasionally, a formal test for subject effects is desired:

$$H_0\colon \sigma_\rho^2 = 0$$
$$H_a\colon \sigma_\rho^2 > 0$$

Table 27.1 indicates that the appropriate test statistic for repeated measures model (27.1) is
$F^* = MSS/MSTR.S$.
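
The efficiency calculation in Comment 3 is simple enough to verify numerically. The sketch below plugs the mean squares from the wine-judging ANOVA output into the efficiency ratio with the number of blocks set equal to the number of subjects; the function name `relative_efficiency` is a hypothetical label used only here.

```python
def relative_efficiency(s, r, mss, mstr_s):
    """Estimated efficiency of the repeated measures design relative to a
    completely randomized design, for s subjects and r treatments."""
    return ((s - 1) * mss + s * (r - 1) * mstr_s) / ((s * r - 1) * mstr_s)


# Mean squares from the wine-judging ANOVA output: MSS = 34.667, MSTR.S = 1.067.
efficiency = relative_efficiency(s=6, r=4, mss=34.667, mstr_s=1.067)
print(round(efficiency, 2))
```

The ratio is a little under eight, consistent with the statement that almost eight times as many replications per treatment would be needed with a completely randomized design.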


Evaluation of Appropriateness of Repeated Measures Model 
Since repeated measures model (27.1) is equivalent to randomized block model (25.67), the 
earlier discussion on diagnostics for randomized block models is entirely applicable here. 
In particular, a plot of the responses $Y_{ij}$ by subject, as in Figure 27.2, can be examined for
indications of serious lack of parallelism, which would suggest that additive model (27.1)
may not be appropriate.

Residual sequence plots by subject can be helpful for studying constancy of the error
variance and presence of interference effects. The residuals for repeated measures model
(27.1) are the same as in (21.5):

$$e_{ij} = Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot} \tag{27.7}$$

A normal probability plot of the residuals in (27.7) can be helpful for evaluating
whether the error terms are normally distributed.

In addition to these graphic diagnostics, the estimated within-subjects variance-covariance
and correlation matrices for the treatment observations $Y_{ij}$ can be examined for
appropriateness of the repeated measures model. A typical entry in the variance-covariance
matrix is the estimated within-subjects covariance between observations for treatments $j$
and $j'$:

$$\frac{\sum_{i=1}^{s} (Y_{ij} - \bar{Y}_{\cdot j})(Y_{ij'} - \bar{Y}_{\cdot j'})}{s - 1} \tag{27.8}$$

The estimated within-subjects variance-covariance matrix should show variances of the
same order of magnitude, and all of the covariances should be of similar magnitude. Of
course, estimated variances and covariances tend to be subject to large sampling errors unless
the sample sizes are very large. Hence, moderate differences in variances and covariances
should be viewed as likely to be the result of sampling errors.

The estimated correlation matrix should show approximately similar coefficients of
correlation between pairs of treatment observations within a subject.

Finally, the Tukey test described in Section 20.2 can be conducted to examine the
appropriateness of the additive model. This test will need to be interpreted here as conditional
on the subjects actually used in the repeated measures study.


Example


For the wine-judging example, the residuals were obtained from (27.7), and are presented in 
Figure 27.4a in SAS/GRAPH aligned dot plots by wine. These plots support the assumption 
of constant error variance. Figure 27.4b presents residual sequence plots for each judge, 
where the residuals are plotted in the order in which the wines were tasted by the judge. 
These plots do not indicate any correlations of the error terms within a judge, and thus 
suggest that no interference effects are present. Finally, a normal probability plot of the 
residuals is presented in Figure 27.4c. This plot shows evidence of the effects of the rounded 
nature of the data, but does not suggest any major departure from normality. The correlation 
between the ordered residuals and their expected values under normality is .993, which also 
suggests that lack of normality is not a problem here. 
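
The residuals behind these plots can be obtained directly from the data: each residual is the rating minus its judge mean and its wine mean, plus the overall mean. The following sketch computes them for the wine-judging ratings; the function name `additive_residuals` is an illustrative choice, and the plotting itself is omitted.

```python
# Wine-judging ratings from Table 27.2: rows are judges, columns are wines.
ratings = [
    [20, 24, 28, 28],
    [15, 18, 23, 24],
    [18, 19, 24, 23],
    [26, 26, 30, 30],
    [22, 24, 28, 26],
    [19, 21, 27, 25],
]


def additive_residuals(y):
    """Residuals e_ij = Y_ij - Ybar_i. - Ybar_.j + Ybar_.. for the additive model."""
    s, r = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (s * r)
    subject_means = [sum(row) / r for row in y]
    treatment_means = [sum(y[i][j] for i in range(s)) / s for j in range(r)]
    return [
        [y[i][j] - subject_means[i] - treatment_means[j] + grand for j in range(r)]
        for i in range(s)
    ]


residuals = additive_residuals(ratings)
for i, row in enumerate(residuals, start=1):
    print(f"Judge {i}:", " ".join(f"{e:6.2f}" for e in row))
```

The residuals sum to zero within every judge and within every wine, and their squared sum equals the error sum of squares in the ANOVA output (16.000), which is a quick check that they were computed correctly.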

Table 27.3 presents the estimated within-subjects variance-covariance matrix for the 
treatment observations. The differences found there could easily arise from sampling errors. 
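
The entries of the estimated within-subjects variance-covariance matrix can be reproduced from the ratings by averaging cross-products of deviations from the wine means over the judges, with divisor s - 1. The sketch below does this with the standard library only; the function name `within_subjects_cov` is hypothetical.

```python
# Wine-judging ratings from Table 27.2: rows are judges, columns are wines.
ratings = [
    [20, 24, 28, 28],
    [15, 18, 23, 24],
    [18, 19, 24, 23],
    [26, 26, 30, 30],
    [22, 24, 28, 26],
    [19, 21, 27, 25],
]


def within_subjects_cov(y):
    """Estimated covariance between treatments j and j', computed across the s subjects."""
    s, r = len(y), len(y[0])
    means = [sum(y[i][j] for i in range(s)) / s for j in range(r)]
    return [
        [
            sum((y[i][j] - means[j]) * (y[i][k] - means[k]) for i in range(s)) / (s - 1)
            for k in range(r)
        ]
        for j in range(r)
    ]


cov = within_subjects_cov(ratings)
for row in cov:
    print(" ".join(f"{v:7.3f}" for v in row))
```

The diagonal holds the estimated variances for the four wines and the off-diagonal entries the covariances; with only six judges, differences of this size are well within what sampling error can produce.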

As we noted earlier, the plot of the responses by subject in Figure 27.2 also supports 
the appropriateness of model (27.1), since the plots for the judges are reasonably parallel. 
Thus, there is no indication of interactions between subjects and treatments. 

On the basis of these and other diagnostics, it was concluded that repeated measures 
model (27.1) is reasonably appropriate for the data in the wine-judging example. 


FIGURE 27.4 SAS/GRAPH Diagnostic Residual Plots—Wine-Judging Example.

(a) Residual Dot Plots: residuals by wine (Wine 1 to Wine 4).
(b) Residual Sequence Plots: residuals against time order, one panel per judge (Judge 1 to Judge 6).
(c) Normal Probability Plot: residuals against expected values.


Analysis of Treatment Effects 
The analysis of treatment effects for single-factor repeated measures model (27.1) proceeds 
in exactly the same fashion as described in Section 21.5 for randomized block designs 
with fixed treatment effects. The multiples in (21.9) for setting up confidence intervals 
are applicable here as they stand. The mean square used in estimating the variance of the 
estimated contrast is still the interaction mean square, which is now denoted by MSTR.S. 
We shall illustrate the estimation procedures by an example. 


Example

In the wine-judging example, it was desired to compare all treatment means $\mu_{\cdot j}$ pairwise,
with a 95 percent family confidence coefficient. Here $\mu_{\cdot j}$ is the mean rating of wine $j$
averaged over judges. The Tukey procedure was utilized for this purpose. Using (17.30)
with MSE replaced by MSTR.S and the estimated pairwise difference denoted by $\hat{L}$, we
obtain using the results in Figure 27.3:

$$s^2\{\hat{L}\} = MSTR.S\left(\frac{1}{s} + \frac{1}{s}\right) = 1.067\left(\frac{2}{6}\right) = .3557$$

Using (21.9b), we find for a 95 percent family confidence coefficient:

$$T = \frac{1}{\sqrt{2}}\, q(.95; 4, 15) = \frac{1}{\sqrt{2}}(4.08) = 2.885$$

Hence:

$$T s\{\hat{L}\} = 2.885\sqrt{.3557} = 1.72$$

Thus we obtain for the pairwise comparisons (see Table 27.2 for the $\bar{Y}_{\cdot j}$):

$-2.39 = (26.00 - 26.67) - 1.72 \le \mu_{\cdot 4} - \mu_{\cdot 3} \le (26.00 - 26.67) + 1.72 = 1.05$
$2.28 = (26.00 - 22.00) - 1.72 \le \mu_{\cdot 4} - \mu_{\cdot 2} \le (26.00 - 22.00) + 1.72 = 5.72$
$4.28 = (26.00 - 20.00) - 1.72 \le \mu_{\cdot 4} - \mu_{\cdot 1} \le (26.00 - 20.00) + 1.72 = 7.72$
$2.95 = (26.67 - 22.00) - 1.72 \le \mu_{\cdot 3} - \mu_{\cdot 2} \le (26.67 - 22.00) + 1.72 = 6.39$
$4.95 = (26.67 - 20.00) - 1.72 \le \mu_{\cdot 3} - \mu_{\cdot 1} \le (26.67 - 20.00) + 1.72 = 8.39$
$.28 = (22.00 - 20.00) - 1.72 \le \mu_{\cdot 2} - \mu_{\cdot 1} \le (22.00 - 20.00) + 1.72 = 3.72$


We display these results graphically as follows:

[Line plot on a taste score axis (ticks at 20 and 25) locating the mean ratings in increasing
order: Wine 1, Wine 2, Wine 4, Wine 3.]


We conclude from these pairwise comparisons that wines 3 and 4 are judged best, and do 
not differ significantly from each other. Wines 1 and 2 are judged to be inferior to wines 3 
and 4, with wine 1 receiving a mean rating significantly lower than that for wine 2. The 
family confidence coefficient of .95 applies to the entire set of comparisons. 
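
The six Tukey intervals can be regenerated from the wine means, the interaction mean square, and the studentized range percentile. In the sketch below the percentile q(.95; 4, 15) = 4.08 is taken from the text rather than computed, and the variable names are illustrative only.

```python
from itertools import combinations
from math import sqrt

# Treatment means from Table 27.2 and mean square from the ANOVA output.
wine_means = {1: 20.00, 2: 22.00, 3: 26.67, 4: 26.00}
s = 6            # number of judges
mstr_s = 1.067   # MSTR.S
q = 4.08         # q(.95; 4, 15), as quoted in the text

std_error = sqrt(mstr_s * (1 / s + 1 / s))   # estimated standard deviation of a pairwise difference
half_width = (q / sqrt(2)) * std_error       # Tukey multiple times the standard error

intervals = {}
for a, b in combinations(sorted(wine_means), 2):
    diff = wine_means[b] - wine_means[a]
    intervals[(b, a)] = (diff - half_width, diff + half_width)

for (b, a), (lower, upper) in intervals.items():
    verdict = "differ" if lower > 0 or upper < 0 else "do not differ"
    print(f"wine {b} - wine {a}: [{lower:6.2f}, {upper:6.2f}]  {verdict}")
```

Only the interval comparing wines 4 and 3 covers zero, which matches the conclusion that these two wines are not distinguishable while every other pair is.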


Comment

When the treatments are time order positions, as when process rework is observed for a new
manufacturing process at periodic intervals, the nature of the time effect may be analyzed by developing
an appropriate regression model.


Ranked Data

In repeated measures studies, the observations are frequently ranks, as when a number of
tasters are each asked to rank recipes or when several university admissions officers are
each asked to rank applicants for admission. When the data in a repeated measures study
are ranks, the nonparametric rank F test described in Comment 3 on page 900 may be used
for testing whether the treatment means are equal. No new principles are involved, so we
shall proceed directly to an example.


Example

Six subjects were each asked to rank five coffee sweeteners according to their taste preferences,
with rank 5 assigned to the most preferred sweetener. The data are presented in
Table 27.4 and suggest that a sweetener effect may be present. For example, no judge ranked
sweetener B higher than 2 (not preferred).

TABLE 27.4 Ranked Data for Coffee Sweeteners in a Repeated Measures Design—Coffee
Sweeteners Example.

              Sweetener (j)
Subject i     A        B        C        D        E
1             5        1        2        4        3
2             4        2        1        5        3
3             3        2        1        4        5
4             5        2        3        4        1
5             4        1        2        3        5
6             4        1        3        5        2
$\bar{R}_{\cdot j}$        4.17     1.50     2.00     4.17     3.17

Test statistic (21.7b) for the ranked data here is:

$$F_R^* = \frac{9.00}{1.20} = 7.5$$

For level of significance $\alpha = .05$, we need $F(.95; 4, 20) = 2.87$. Since $F_R^* = 7.5 > 2.87$,
we conclude that the five sweeteners are not equally liked. The P-value of the test is .0007.
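
The rank F statistic for the coffee sweeteners data is the usual ratio of the treatment mean square to the treatment-by-subject mean square, computed on the ranks. A minimal sketch follows; the variable names are illustrative. Because every subject's ranks average (r + 1)/2, the subject sum of squares is zero and the interaction sum of squares is simply the total minus the treatment sum of squares.

```python
# Ranks from Table 27.4: rows are subjects, columns are sweeteners A, B, C, D, E.
ranks = [
    [5, 1, 2, 4, 3],
    [4, 2, 1, 5, 3],
    [3, 2, 1, 4, 5],
    [5, 2, 3, 4, 1],
    [4, 1, 2, 3, 5],
    [4, 1, 3, 5, 2],
]

s, r = len(ranks), len(ranks[0])
grand = (r + 1) / 2   # overall mean rank
mean_ranks = [sum(row[j] for row in ranks) / s for j in range(r)]

sstr = s * sum((m - grand) ** 2 for m in mean_ranks)
ssto = sum((x - grand) ** 2 for row in ranks for x in row)
sstr_s = ssto - sstr  # subject sum of squares is zero for within-subject ranks

mstr = sstr / (r - 1)
mstr_s = sstr_s / ((r - 1) * (s - 1))
f_rank = mstr / mstr_s

print("mean ranks:", [round(m, 2) for m in mean_ranks])
print("MSTR =", round(mstr, 2), " MSTR.S =", round(mstr_s, 2), " F*_R =", round(f_rank, 2))
```

The mean ranks agree with the bottom row of Table 27.4, and the two mean squares are the numerator and denominator of the test statistic shown above.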


Multiple Pairwise Testing Procedure 


Just as in the case of the rank F test for single-factor studies (Section 18.7), we can use
a large-sample testing analog of the Bonferroni pairwise comparison procedure to obtain
information about the comparative magnitudes of the treatment means for repeated measures
designs when the rank F test (or the Friedman test) indicates that the treatment means differ.
Testing limits for all $g = r(r - 1)/2$ pairwise comparisons using the mean ranks $\bar{R}_{\cdot j}$ are
set up as follows for family level of significance $\alpha$:

$$\bar{R}_{\cdot j} - \bar{R}_{\cdot j'} \pm B\left[\frac{r(r + 1)}{6s}\right]^{1/2} \tag{27.9}$$

where:

$$B = z(1 - \alpha/2g) \tag{27.9a}$$
$$g = \frac{r(r - 1)}{2} \tag{27.9b}$$

If the testing limits include zero, we conclude that the corresponding treatment means $\mu_{\cdot j}$
and $\mu_{\cdot j'}$ do not differ. If the testing limits do not include zero, we conclude that the two
corresponding treatment means differ. We can then set up groups of treatments whose means
do not differ according to this simultaneous testing procedure.


We now wish to make all pairwise tests by means of (27.9) with family level of significance 
a = .20 for the coffee sweeteners example. Forr = 5, wehave g = 5(4)/2 = 10and obtain: 


В = z[1 — .20/2(10)] = z(.99) = 2.326 
Thus, the right term in (27.9) for s = 6 and r = Sis: 
r(r + 1) 1/2 ө) 1/2 
B| ——— = 2.326 | —— = 2.12 
| 6s | 6(6) 


We note from Table 27.4 that the pairs of mean ranks whose difference does not exceed 
2.12 are (B, C), (B, E), (C, E), (A, E), (D, E), and (A, D). Hence, we can set up two groups, 
within which the treatment means do not differ: 


Group 1                                  Group 2
Sweetener B   $\bar{R}_{.2} = 1.50$      Sweetener E   $\bar{R}_{.5} = 3.17$
Sweetener C   $\bar{R}_{.3} = 2.00$      Sweetener A   $\bar{R}_{.1} = 4.17$
Sweetener E   $\bar{R}_{.5} = 3.17$      Sweetener D   $\bar{R}_{.4} = 4.17$


Thus, we conclude with family level of significance of .20 that sweeteners A and D are
preferred to sweeteners B and C, and that it is not clear whether sweetener E belongs in the
preferred group or in the other group.
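
The testing procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mean ranks are the rounded values quoted in the text, the sweetener labels are used as dictionary keys for convenience, and the normal percentile is taken from the standard library rather than from a table.

```python
from itertools import combinations
from statistics import NormalDist

# Mean ranks for the coffee sweeteners example, as quoted in the text (rounded).
mean_ranks = {"A": 4.17, "B": 1.50, "C": 2.00, "D": 4.17, "E": 3.17}
r = len(mean_ranks)   # number of treatments
s = 6                 # number of subjects (judges)
alpha = 0.20          # family level of significance

g = r * (r - 1) // 2                              # number of pairwise comparisons, (27.9b)
B = NormalDist().inv_cdf(1 - alpha / (2 * g))     # z(1 - alpha/2g), (27.9a)
half_width = B * (r * (r + 1) / (6 * s)) ** 0.5   # right-hand term in (27.9)

# Pairs whose mean ranks differ by no more than the half-width are not declared different.
not_different = sorted(
    (t1, t2)
    for t1, t2 in combinations(sorted(mean_ranks), 2)
    if abs(mean_ranks[t1] - mean_ranks[t2]) <= half_width
)

print(round(B, 3), round(half_width, 2))
print(not_different)
```

The pairs printed are the same six pairs listed in the text, from which the two overlapping groups are formed.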


Comments 


1. The rank F test can also be used for repeated measures designs where the observations are not 
ranked, in case the distribution of the error terms departs far from normality. Ranks of the observations 
$Y_{ij}$ are then assigned within each subject, and the rank F test is carried out in the usual manner.


2. The test statistic $F_R^*$ is related to Kendall's coefficient of concordance W in the following way:
(27.10) 


The coefficient of concordance W is a measure of the agreement of the rankings of the s subjects. It 
equals 1 if there is perfect agreement, and equals 0 if there is no agreement, that is, if all treatments 
receive the same mean ranking. For the coffee sweeteners example in Table 27.4, the coefficient of 
concordance W is:


This measure indicates that a fair amount of agreement exists between the subjects.
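
As a rough check on the relationship between $F_R^*$ and W, the sketch below computes Kendall's W from the mean ranks using the standard no-ties formula, and then converts it to the rank F statistic through the identity $F_R^* = (s-1)W/(1-W)$. Both the formula and the identity are assumptions brought in for illustration; they are not copied from (27.10), whose printed form is not reproduced here. The mean ranks are the rounded values quoted in the text.

```python
# Mean ranks for the five sweeteners (A, B, C, D, E), as quoted in the text.
mean_ranks = [4.17, 1.50, 2.00, 4.17, 3.17]
r = len(mean_ranks)   # number of treatments
s = 6                 # number of subjects (judges)

# Kendall's W without ties, written in terms of mean ranks:
# W = 12 * sum_j (Rbar_j - (r + 1)/2)^2 / (r (r^2 - 1))
center = (r + 1) / 2
W = 12 * sum((m - center) ** 2 for m in mean_ranks) / (r * (r ** 2 - 1))

# Rank F statistic implied by W (assumed identity, valid when there are no ties).
F_R = (s - 1) * W / (1 - W)

print(round(W, 2), round(F_R, 2))
```

With the rounded mean ranks, W comes out near .60 and the implied $F_R^*$ is close to the value 7.5 obtained in the test above; the small difference reflects rounding.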


1140 Part Six Specialized Study Designs 


27.3 Two-Factor Experiments with Repeated Measures on One Factor


Description of Design 


FIGURE 27.5 Layout for Two-Factor Design with Random Assignments of Factor A Level to Subjects and Repeated Measures on Factor B.


In many two-factor studies, repeated measures can only be made on one of the two factors.
Consider, for instance, an experimenter who wished to study the effects of two types of 
incentives (factor A) on a person's ability to solve problems. The researcher also wanted to 
study two types of problems (factor B)—abstract and concrete problems. Each experimental 
subject could be asked to do each type of problem, but could not be exposed to more than 
one type of incentive stimulus because of potential interference effects. Thus, the design 
the experimenter utilized may be represented schematically as shown in Figure 27.5. 

In a two-factor experiment with repeated measures on one factor, two randomizations 
generally need to be employed. First, the level of the nonrepeated factor (A, in Figure 27.5) 
needs to be randomly assigned to the subjects. Second, the order of the levels of the repeated 
factor (B, in Figure 27.5) needs to be randomized independently for all subjects. 

Since s subjects are randomly assigned incentive stimulus $A_1$ and s subjects are randomly
assigned incentive stimulus $A_2$, as far as factor A is concerned the experiment is a completely
randomized one. On the other hand, as far as factor B (type of problem) is concerned, each
subject is a block. Thus, for factor B, the experiment is a randomized complete block design,
with block effects random. We call this experimental design a two-factor experiment with
repeated measures on factor B.

In the experiment depicted in Figure 27.5, comparisons between factor A level means 
involve differences between groups of subjects as well as differences associated with the 
two factor A levels. On the other hand, comparisons between factor B level means at the 
same level of factor А are based on the same subject, and hence only involve differences 
associated with the two factor B levels. Thus, for these latter comparisons, each subject 
serves as its own control. The main effects of factor A are therefore said to be confounded
with differences between groups of subjects, whereas the main effects of factor B are free
of such confounding. It is for this reason that tests on factor B main effects will generally
be more sensitive than tests on the main effects for factor A.


Comments

1. A two-factor experiment with repeated measures on one factor may be viewed as an incomplete
block design. With reference to the repeated measures design in Figure 27.5, there are four treatments
($A_1B_1$, $A_1B_2$, $A_2B_1$, and $A_2B_2$) and one-half of the blocks (subjects) contain treatments $A_1B_1$ and
$A_1B_2$ while the other half of the blocks contain treatments $A_2B_1$ and $A_2B_2$.

2. When the factor on which repeated measures are taken is time, randomization of the levels of the
repeated factor is impossible. Consider, for instance, a study of two different advertising campaigns
in which the effect on sales is to be measured in 10 test markets during four consecutive months.
Here, the only randomization required is for assigning the advertising campaigns to the test markets.
Similarly, when the nonrepeated factor is a characteristic of the subject, such as age of subject, no
randomization is involved for that factor.


Model


The development of a model for a two-factor experiment with repeated measures on one
factor is only a little more complex than for earlier cases. As before, we shall develop the
model for random subject effects and fixed factor A and factor B effects. Let, as usual,
$\alpha_j$ and $\beta_k$ denote the factor A and factor B main effects, respectively, $(\alpha\beta)_{jk}$ the AB
interaction effect, and $\rho$ the subject (block) main effect. We do need to recognize, however,
that the subject effect in this design is nested within factor A. Therefore, we will denote this
effect by $\rho_{i(j)}$. As before, we assume that there are no interactions between treatments and
subjects, although this condition is not essential here. A model that incorporates the above
specifications is as follows for a balanced study, where the number of subjects receiving
each level of factor A is the same:

$$Y_{ijk} = \mu_{...} + \rho_{i(j)} + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{ijk} \tag{27.11}$$

where:

$\mu_{...}$ is a constant
$\rho_{i(j)}$ are independent $N(0, \sigma_\rho^2)$
$\alpha_j$ are constants subject to $\sum_j \alpha_j = 0$
$\beta_k$ are constants subject to $\sum_k \beta_k = 0$
$(\alpha\beta)_{jk}$ are constants subject to $\sum_j (\alpha\beta)_{jk} = 0$ for all k and $\sum_k (\alpha\beta)_{jk} = 0$ for all j
$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$
$\rho_{i(j)}$ and $\varepsilon_{ijk}$ are independent
$i = 1, \ldots, s$; $j = 1, \ldots, a$; $k = 1, \ldots, b$


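The observations $Y_{ijk}$ for repeated measures model (27.11) have the following properties:


$$E\{Y_{ijk}\} = \mu_{...} + \alpha_j + \beta_k + (\alpha\beta)_{jk} \tag{27.12a}$$

$$\sigma^2\{Y_{ijk}\} = \sigma_Y^2 = \sigma_\rho^2 + \sigma^2 \tag{27.12b}$$

$$\sigma\{Y_{ijk}, Y_{ijk'}\} = \sigma_\rho^2 \qquad k \ne k' \tag{27.12c}$$

$$\sigma\{Y_{ijk}, Y_{i'j'k'}\} = 0 \qquad i \ne i' \text{ and/or } j \ne j' \tag{27.12d}$$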


Note that the observations $Y_{ijk}$ have constant variance. In addition, in advance of the random
trials any two observations for different levels of factor B for the same subject have constant
covariance, for all subjects, while observations for different subjects are independent. Also,
all observations are assumed to be normally distributed.

Once the subjects have been selected, repeated measures model (27.11) assumes that any
two observations for the same subject are independent, that is, that there are no interference
effects.

Analysis of Variance and Tests 


Analysis of Variance. The ANOVA sums of squares for repeated measures model (27.11)
can be obtained by means of the rules in Appendix D. The sum of squares that is used for
estimating the error variance turns out to be the interaction sum of squares SSB.S(A). The
ANOVA sums of squares are shown in Table 27.5. Also shown there are the degrees of
freedom for each sum of squares.


Tests for Factor Effects. The expected mean squares for the analysis of variance in
Table 27.5 are given in Table 27.6. These expected mean squares can be obtained by means
of the rules in Appendix D.

It is clear from the expected mean squares in Table 27.6 that the test for AB interaction
effects:

$$H_0\text{: all } (\alpha\beta)_{jk} = 0 \qquad H_a\text{: not all } (\alpha\beta)_{jk} \text{ equal zero} \tag{27.13a}$$

uses the test statistic:

$$F^* = \frac{MSAB}{MSB.S(A)} \tag{27.13b}$$

and the decision rule for controlling the Type I error at $\alpha$ is:

$$\begin{aligned}
&\text{If } F^* \le F[1-\alpha; (a-1)(b-1), a(s-1)(b-1)], \text{ conclude } H_0 \\
&\text{If } F^* > F[1-\alpha; (a-1)(b-1), a(s-1)(b-1)], \text{ conclude } H_a
\end{aligned} \tag{27.13c}$$

The test for factor A main effects:

$$H_0\text{: all } \alpha_j = 0 \qquad H_a\text{: not all } \alpha_j \text{ equal zero} \tag{27.14a}$$

uses the test statistic:

$$F^* = \frac{MSA}{MSS(A)} \tag{27.14b}$$

and the decision rule for controlling the Type I error at $\alpha$ is:

$$\begin{aligned}
&\text{If } F^* \le F[1-\alpha; a-1, a(s-1)], \text{ conclude } H_0 \\
&\text{If } F^* > F[1-\alpha; a-1, a(s-1)], \text{ conclude } H_a
\end{aligned} \tag{27.14c}$$

Finally, the test for factor B main effects:

$$H_0\text{: all } \beta_k = 0 \qquad H_a\text{: not all } \beta_k \text{ equal zero} \tag{27.15a}$$

uses the test statistic:

$$F^* = \frac{MSB}{MSB.S(A)} \tag{27.15b}$$

and the decision rule for controlling the Type I error at $\alpha$ is:

$$\begin{aligned}
&\text{If } F^* \le F[1-\alpha; b-1, a(s-1)(b-1)], \text{ conclude } H_0 \\
&\text{If } F^* > F[1-\alpha; b-1, a(s-1)(b-1)], \text{ conclude } H_a
\end{aligned} \tag{27.15c}$$


TABLE 27.5 Analysis of Variance for Two-Factor Experiment with Repeated Measures on Factor B, Model (27.11).

Source of Variation          SS                                                                                               df
Factor A                     $SSA = bs \sum_j (\bar{Y}_{.j.} - \bar{Y}_{...})^2$                                              $a-1$
Factor B                     $SSB = as \sum_k (\bar{Y}_{..k} - \bar{Y}_{...})^2$                                              $b-1$
AB interactions              $SSAB = s \sum_j \sum_k (\bar{Y}_{.jk} - \bar{Y}_{.j.} - \bar{Y}_{..k} + \bar{Y}_{...})^2$        $(a-1)(b-1)$
Subjects (within factor A)   $SSS(A) = b \sum_i \sum_j (\bar{Y}_{ij.} - \bar{Y}_{.j.})^2$                                     $a(s-1)$
Error                        $SSB.S(A) = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{.jk} - \bar{Y}_{ij.} + \bar{Y}_{.j.})^2$     $a(s-1)(b-1)$
Total                        $SSTO = \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2$                                        $abs-1$


TABLE 27.6 Expected Mean Squares for Two-Factor Experiment with Repeated Measures on Factor B, Model (27.11) (A, B fixed, subjects random).

Source of Variation          E{MS}
Factor A                     $\sigma^2 + b\sigma_\rho^2 + bs \dfrac{\sum_j \alpha_j^2}{a-1}$
Factor B                     $\sigma^2 + as \dfrac{\sum_k \beta_k^2}{b-1}$
AB interactions              $\sigma^2 + s \dfrac{\sum_j \sum_k (\alpha\beta)_{jk}^2}{(a-1)(b-1)}$
Subjects (within factor A)   $\sigma^2 + b\sigma_\rho^2$
Error                        $\sigma^2$


Comments 


1. When the assumption of compound symmetry in repeated measures model (27.11) is not met,
the conservative test discussed in Comment 2 on page 1065 should be employed.

2. When the study is not balanced (i.e., when the number of subjects within each level of factor A is
not the same), the tests described here are no longer appropriate. Instead, the methods for unbalanced
mixed and random effects models discussed in Section 25.7 can be employed.


Evaluation of Appropriateness of Repeated Measures Model 


Our earlier discussion on evaluating the appropriateness of a repeated measures model
applies here also. The residuals for repeated measures model (27.11) are:


$$e_{ijk} = Y_{ijk} - \bar{Y}_{.jk} - \bar{Y}_{ij.} + \bar{Y}_{.j.} \tag{27.16}$$


A special feature of repeated measures model (27.11) also warrants attention. This model
requires that the variance between subjects, $\sigma_\rho^2$, be constant for all levels of factor A. This
assumption can be examined by dot plots of the estimated subject effects $\bar{Y}_{ij.} - \bar{Y}_{.j.}$ for each
level of factor A.

We can also conduct a formal test of the equality of the between-subjects variances by 
noting that the variation between subjects within factor A, SSS(A), can be decomposed into 


components for each factor A level:

$$SSS(A) = SSS(A_1) + SSS(A_2) + \cdots + SSS(A_a) \tag{27.17}$$

where:

$$SSS(A_j) = b \sum_i (\bar{Y}_{ij.} - \bar{Y}_{.j.})^2 \tag{27.17a}$$


Each component sum of squares has $s - 1$ degrees of freedom associated with it. We can
therefore test the equality of the between-subjects variances by means of the Hartley test
statistic (18.8) or the Brown-Forsythe test statistic (18.12). For the latter test, $d_{ij}$ in (18.11)
is defined as the absolute difference between the estimated mean, $\bar{Y}_{ij.}$, and the median of
the estimated means $\bar{Y}_{1j.}, \ldots, \bar{Y}_{sj.}$.

Similarly, the error variation, SSB.S(A), can be decomposed into components for each 
factor A level: 


$$SSB.S(A) = SSB.S(A_1) + SSB.S(A_2) + \cdots + SSB.S(A_a) \tag{27.18}$$

where:

$$SSB.S(A_j) = \sum_i \sum_k (Y_{ijk} - \bar{Y}_{.jk} - \bar{Y}_{ij.} + \bar{Y}_{.j.})^2 \tag{27.18a}$$


Each component has $(s-1)(b-1)$ degrees of freedom associated with it. The Hartley or
Brown-Forsythe tests can be conducted here also, this time to test for the equality of the
error variance $\sigma^2$ for the different factor A levels.

The Hartley test assumes normality and is sensitive to this assumption. Hence, the 
appropriateness of the normality assumption should be established first before the Hartley 
test is employed. Unlike the Hartley test, the Brown-Forsythe test is robust and relatively 
insensitive to departures from normality. 


Analysis of Factor Effects: Without Interaction


When the two factors do not interact or the interactions are not important, the main effects
may be analyzed in a straightforward fashion. The relevant mean square to be used in the
estimated variance of an estimated contrast of factor A level means for repeated measures
model (27.11) is MSS(A) because this mean square is the denominator of the appropriate
F* statistic for testing factor A main effects. Similarly, the mean square for estimating
contrasts of factor B level means is MSB.S(A).

The multiples for the estimated standard deviation of an estimated contrast of factor A
or factor B level means are as follows:


Single comparison                                                                  (27.19a)
    Main A effect:  $t[1-\alpha/2; a(s-1)]$
    Main B effect:  $t[1-\alpha/2; a(s-1)(b-1)]$

Tukey procedure (for pairwise comparisons)                                         (27.19b)
    Main A effect:  $T = \frac{1}{\sqrt{2}}\, q[1-\alpha; a, a(s-1)]$
    Main B effect:  $T = \frac{1}{\sqrt{2}}\, q[1-\alpha; b, a(s-1)(b-1)]$

Scheffé procedure                                                                  (27.19c)
    Main A effect:  $S^2 = (a-1)F[1-\alpha; a-1, a(s-1)]$
    Main B effect:  $S^2 = (b-1)F[1-\alpha; b-1, a(s-1)(b-1)]$

Bonferroni procedure                                                               (27.19d)
    Main A effect:  $B = t[1-\alpha/2g; a(s-1)]$
    Main B effect:  $B = t[1-\alpha/2g; a(s-1)(b-1)]$


Note from Table 27.6 that the analysis of factor B effects can be carried out more precisely
than that for factor A effects. The reason is that comparisons among factor A levels utilize
MSS(A), which involves the variability among the subjects as well as the experimental
error, while comparisons among factor B levels utilize MSB.S(A), which involves only
experimental error.


Example 1


A national retail chain wanted to study the effects of two advertising campaigns (factor A)
on the volume of sales of athletic shoes over time (factor B). Ten similar test markets (subjects, S)
were chosen at random to participate in this study. The two advertising campaigns
($A_1$ and $A_2$) were similar in all respects except that a different national sports personality
was used in each. Sales data were collected for three two-week periods ($B_1$: two weeks
prior to campaign; $B_2$: two weeks during which campaign occurred; $B_3$: two weeks after
campaign was concluded). The experiment was conducted during a six-week period when
sales of athletic shoes are usually quite stable.

The data on sales (coded) are presented in Table 27.7, and are plotted in Figure 27.6 
by test market for each advertising campaign. There is no evidence in Figure 27.6 of any 
interactions between the test markets and the treatments. In general, sales tended to increase 
during each advertising campaign, and then tended to decline to previous or lower levels 
than just before the campaign. 


TABLE 27.7 Data: Athletic Shoes Sales Example.

Advertising   Test       Time Period
Campaign      Market     k=1      k=2      k=3
j=1           i=1        958    1,047      933
              i=2      1,005    1,122      986
              i=3        351      436      339
              i=4        549      632      512
              i=5        730      784      707
j=2           i=1        780      897      718
              i=2        229      275      202
              i=3        883      964      817
              i=4        624      695      599
              i=5        375      436      351


FIGURE 27.6 Plots of Sales Data by Test Market and Campaign: Athletic Shoes Sales Example.
[Panels: (a) Campaign 1, (b) Campaign 2; vertical axis $Y_{ijk}$, horizontal axis Period (1, 2, 3).]


From Figure 27.6 and other diagnostic analyses (not shown), it was concluded that 
repeated measures model (27.11) is appropriate here. Figure 27.7 contains the MINITAB 
output for the fit of this model. 

First we wish to test for campaign-time interaction effects: 

$$H_0\text{: all } (\alpha\beta)_{jk} = 0$$

$$H_a\text{: not all } (\alpha\beta)_{jk} \text{ equal zero}$$

We use the results from Figure 27.7 in test statistic (27.13b):

$$F^* = \frac{MSAB}{MSB.S(A)} = \frac{196}{358} = .55$$


FIGURE 27.7 MINITAB Output for ANOVA: Athletic Shoes Sales Example.

    Factor   Type     Levels   Values
    A        fixed         2   1  2
    S(A)     random        5   1  2  3  4  5
    B        fixed         3   1  2  3

    Analysis of Variance for Y

    Source   DF        SS       MS        F       P
    A         1    168151   168151     0.73   0.417
    S(A)      8   1833681   229210   640.31   0.000
    B         2     67073    33537    93.69   0.000
    A*B       2       391      196     0.55   0.589
    Error    16      5727      358
    Total    29   2075023    71553

    Source     Variance    Error   Expected Mean Square
               Component   Term    (using restricted model)
    1 A                      2     (5) + 3(2) + 15Q[1]
    2 S(A)      76284.0      5     (5) + 3(2)
    3 B                      5     (5) + 10Q[3]
    4 A*B                    5     (5) + 5Q[4]
    5 Error       358.0            (5)

    MEANS

    A    N        Y
    1   15   739.40
    2   15   589.67

    B    N        Y
    1   10   648.40
    2   10   728.80
    3   10   616.40


For level of significance $\alpha = .05$, we require $F(.95; 2, 16) = 3.63$. Since $F^* = .55 < 3.63$,
we conclude Ho, that no significant interaction effects are present. The P-value for the test 
is .59. 
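
The sums of squares and F* statistics for this example can be reproduced directly from the data in Table 27.7 with the formulas of Table 27.5. The following is a minimal sketch, not the MINITAB computation itself; the nested-list layout `data[j][i][k]` and the helper `mean` are choices made here for illustration.

```python
# Sales data from Table 27.7: data[j][i][k] = campaign j, test market i, period k.
data = [
    [[958, 1047, 933], [1005, 1122, 986], [351, 436, 339], [549, 632, 512], [730, 784, 707]],
    [[780, 897, 718], [229, 275, 202], [883, 964, 817], [624, 695, 599], [375, 436, 351]],
]
a, s, b = len(data), len(data[0]), len(data[0][0])

def mean(values):
    values = list(values)
    return sum(values) / len(values)

grand = mean(y for group in data for subject in group for y in subject)
mean_A = [mean(y for subject in data[j] for y in subject) for j in range(a)]
mean_B = [mean(data[j][i][k] for j in range(a) for i in range(s)) for k in range(b)]
mean_AB = [[mean(data[j][i][k] for i in range(s)) for k in range(b)] for j in range(a)]
mean_S = [[mean(data[j][i]) for i in range(s)] for j in range(a)]

# Sums of squares as defined in Table 27.5.
SSA = b * s * sum((m - grand) ** 2 for m in mean_A)
SSB = a * s * sum((m - grand) ** 2 for m in mean_B)
SSAB = s * sum((mean_AB[j][k] - mean_A[j] - mean_B[k] + grand) ** 2
               for j in range(a) for k in range(b))
SSS_A = b * sum((mean_S[j][i] - mean_A[j]) ** 2 for j in range(a) for i in range(s))
SSE = sum((data[j][i][k] - mean_AB[j][k] - mean_S[j][i] + mean_A[j]) ** 2
          for j in range(a) for i in range(s) for k in range(b))

# Mean squares and the three test statistics (27.13b), (27.14b), (27.15b).
MSA, MSB, MSAB = SSA / (a - 1), SSB / (b - 1), SSAB / ((a - 1) * (b - 1))
MSS_A, MSE = SSS_A / (a * (s - 1)), SSE / (a * (s - 1) * (b - 1))
F_AB, F_A, F_B = MSAB / MSE, MSA / MSS_A, MSB / MSE

print(round(SSA), round(SSS_A), round(SSB), round(SSAB), round(SSE))
print(round(F_AB, 2), round(F_A, 2), round(F_B, 2))
```

The printed values can be compared with the SS and F columns of the MINITAB output in Figure 27.7.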

Next we wish to test for advertising campaign main effects: 


$$H_0\text{: all } \alpha_j = 0$$

$$H_a\text{: not all } \alpha_j \text{ equal zero}$$

We use the results from Figure 27.7 in test statistic (27.14b):

$$F^* = \frac{MSA}{MSS(A)} = \frac{168{,}151}{229{,}210} = .73$$

For level of significance $\alpha = .05$, we require $F(.95; 1, 8) = 5.32$. Since $F^* = .73 < 5.32$,
we conclude $H_0$, that no advertising campaign main effects exist. The P-value for the test is
.42. Thus, the two national sports personalities appear to be equally effective in the advertising
campaign.


Finally, we wish to test for time period effects:

$$H_0\text{: all } \beta_k = 0$$

$$H_a\text{: not all } \beta_k \text{ equal zero}$$

Using the results from Figure 27.7 in test statistic (27.15b), we obtain:

$$F^* = \frac{MSB}{MSB.S(A)} = \frac{33{,}537}{358} = 93.7$$

For level of significance $\alpha = .05$, we require $F(.95; 2, 16) = 3.63$. Since $F^* = 93.7 > 3.63$,
we conclude $H_a$, that period main effects exist. The P-value for the test is 0+.

To examine the nature of the time period effects, we shall conduct pairwise comparisons
of mean sales for the three time periods:

$$L = \mu_{..k} - \mu_{..k'}$$

The Tukey procedure will be employed, with a 99 percent family confidence coefficient.
We require:

$$T = \frac{1}{\sqrt{2}}\, q(.99; 3, 16) = \frac{1}{\sqrt{2}}(4.78) = 3.38$$

$$s^2\{\hat{L}\} = \frac{2MSB.S(A)}{as} = \frac{2(358)}{2(5)} = 71.60$$

Hence, $Ts\{\hat{L}\} = 3.38\sqrt{71.60} = 28.6$.
The point estimates of the changes in mean sales, based on the estimated factor B level
means $\bar{Y}_{..k}$ in Figure 27.7, are:

$$\hat{L}_1 = \bar{Y}_{..2} - \bar{Y}_{..1} = 728.8 - 648.4 = 80.4$$

$$\hat{L}_2 = \bar{Y}_{..3} - \bar{Y}_{..1} = 616.4 - 648.4 = -32.0$$

$$\hat{L}_3 = \bar{Y}_{..3} - \bar{Y}_{..2} = 616.4 - 728.8 = -112.4$$

and the desired confidence intervals therefore are:

$$52 \le \mu_{..2} - \mu_{..1} \le 109$$

$$-61 \le \mu_{..3} - \mu_{..1} \le -3$$

$$-141 \le \mu_{..3} - \mu_{..2} \le -84$$

We conclude with family confidence coefficient .99 that the two advertising campaigns lead 
to an immediate increase in mean sales of between 52 and 109 (8 to 17 percent), but that 
mean sales in the following period fall below those for the period preceding the campaign 
by somewhere between 3 and 61 (.5 to 9 percent). 
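
The Tukey intervals for the time period comparisons can be recomputed as follows. This is a sketch under stated assumptions: the studentized range percentile q(.99; 3, 16) = 4.78 is taken from the text rather than computed, MSB.S(A) = 358 and the period means are taken from Figure 27.7, and the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

mean_B = {1: 648.4, 2: 728.8, 3: 616.4}   # estimated factor B level means (Figure 27.7)
MSE = 358.0                               # MSB.S(A) from Figure 27.7
a, s = 2, 5                               # campaigns, test markets per campaign
q = 4.78                                  # q(.99; 3, 16), value quoted in the text

T = q / sqrt(2)                           # Tukey multiple, (27.19b)
se = sqrt(2 * MSE / (a * s))              # estimated standard deviation of a pairwise difference
margin = T * se

intervals = {}
for k1, k2 in combinations(sorted(mean_B), 2):
    estimate = mean_B[k2] - mean_B[k1]    # later period minus earlier period
    intervals[(k2, k1)] = (estimate - margin, estimate + margin)

print(round(margin, 1))
for pair, (lower, upper) in intervals.items():
    print(pair, round(lower), round(upper))
```

Each key `(k2, k1)` denotes the difference between the mean for period `k2` and the mean for period `k1`.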


Analysis of Factor Effects: With Interaction 


When interactions exist between the two factors, the analysis of factor effects becomes
considerably more complex. As we saw in Chapter 19, page 848, when interaction effects are
important, attention usually focuses on simple effects. To compare simple main effects of
the repeated measure factor B, the appropriate error term for these pairwise comparisons
remains MSB.S(A), the same as when there is no interaction. However, the appropriate
error term used for the pairwise comparisons of the simple main effects for factor A needs
to be modified from that used without interaction in comparing main effects of factor A. For
each level of factor B considered individually, the analysis reduces to a single-factor
experiment in which there are no repeated measures. Hence, the mean square within treatments
is the appropriate error term to make pairwise comparisons among the treatment effects
within each level of factor B. This mean square is a weighted average of MSB.S(A) and
MSS(A) where the weights are the corresponding degrees of freedom:

$$MS(\text{Within Treatments}) = \frac{a(b-1)(s-1)\,MSB.S(A) + a(s-1)\,MSS(A)}{ab(s-1)}$$

Note that MS(Within Treatments) is a linear combination of mean squares whose
expectations are not necessarily the same. Stated differently, MS(Within Treatments) represents a
pooling of what will often be heterogeneous sources of variability.

To employ this error term as a basis for pairwise comparisons among the simple main
effects, we employ the Satterthwaite procedure. The correspondences to (25.26) for
$\hat{L}$ = MS(Within Treatments) are:

$$MS_1 = MSB.S(A) \qquad MS_2 = MSS(A) \qquad c_1 = \frac{a(b-1)(s-1)}{ab(s-1)} \qquad c_2 = \frac{a(s-1)}{ab(s-1)}$$

Substitution of these values into (25.28) leads to the Satterthwaite adjusted degrees of
freedom:

$$df_{adj} = \frac{[SSB.S(A) + SSS(A)]^2}{\dfrac{[SSB.S(A)]^2}{a(b-1)(s-1)} + \dfrac{[SSS(A)]^2}{a(s-1)}} \tag{27.20}$$

We will now illustrate the analysis of factor effects in the presence of interactions with an
example.


Example 2


During exercise, blood flow increases in some parts of the body in response to metabolic
demand. Using radioactive microspheres, an experiment was conducted to determine in
which of five parts of the body (factor B) this occurs. Microspheres distribute in tissue
as a function of blood flow; i.e., the greater the blood flow to a part of the body, the
more microspheres (and radioactivity) it will contain. The experiment was designed to
compare blood flow in five different parts of the body (factor B) between the resting control
condition (factor $A_1$) and during exercise (factor $A_2$). Tissues were examined in the following
parts of the body: bone, brain, skin, muscle, and heart. The experiment was conducted by
injecting a total of eight rats (subjects) intravenously with radioactive microspheres. After
the microspheres were injected, four rats were exercised on a treadmill for 15 minutes (factor
$A_2$) and the other four rats were placed on the treadmill, but the treadmill was not turned
on (factor $A_1$). At the end of the 15-minute period, the rats were sacrificed and tissues in
the five parts were harvested and the radioactivity in the tissues was measured. The data for
this blood flow experiment are presented in Table 27.8 and plotted in Figure 27.8 by body
part for each exercise condition.

On the basis of Figure 27.8 and other diagnostic analyses (not shown), it was decided 
that repeated measure model (27.11) is appropriate here. Table 27.9 contains the analysis 
of variance table based on repeated measures model (27.11). 


TABLE 27.8 Data: Blood Flow during Exercise Example.*

Exercise                    Body Part
Condition        Subject    k=1      k=2       k=3      k=4        k=5
                            (Bone)   (Brain)   (Skin)   (Muscle)   (Heart)
j=1              i=1         4        3         5        5          4
(No Exercise)    i=2         1        3         6        3          8
                 i=3         3        1         4        4          7
                 i=4         1        4         3        2          7
j=2              i=1         3        6        12       22         11
(Exercise)       i=2         3        5         8       18         12
                 i=3         4        7        10       20         14
                 i=4         2        4         7       16          8

* Adapted from F. J. Gordon, Analysis of Variance: Designs, Computations, and Multiple Comparisons. Department of Pharmacology,
Emory University School of Medicine, 2003.


TABLE 27.9 Analysis of Variance Table: Blood Flow during Exercise Example.

Source of
Variation          SS       df         MS        F*     P-value
A            324.9000        1   324.9000    44.104       .0006
S(A)          44.2000        6     7.3667
B            389.5000        4    97.3750    49.936       .0000
AB           262.1000        4    65.5250    33.603       .0000
B.S(A)        46.8000       24     1.9500
Total       1067.5000       39


FIGURE 27.8 Plot of Exercise Condition by Body Part for Each Rat: Blood Flow during Exercise Example.
[Panels: (a) No Exercise ($A_1$), (b) Exercise ($A_2$); vertical axis blood flow (0 to 25), horizontal axis Body Part (B): Bone, Brain, Skin, Muscle, Heart.]


First we wish to test for exercise by body part interaction effects: 
$$H_0\text{: all } (\alpha\beta)_{jk} = 0$$

$$H_a\text{: not all } (\alpha\beta)_{jk} \text{ equal zero}$$

We use the results from Table 27.9 in test statistic (27.13b):

$$F^* = \frac{MSAB}{MSB.S(A)} = \frac{65.5250}{1.9500} = 33.603$$

For level of significance $\alpha = .05$, we require $F(.95; 4, 24) = 2.776$. Since $F^* = 33.6 > 2.776$,
we conclude $H_a$, suggesting that interaction effects are present. The P-value for the
test is 0+.

Next, because of the presence of a strong interaction effect, we wish to compare simple 
main effects of the repeated measures factor B (body part). We shall conduct pairwise 
comparisons of mean blood flows among body parts separately within the exercise and no 
exercise conditions; namely, 


No Exercise                              Exercise
$D_1 = \mu_{.11} - \mu_{.12}$            $D_{11} = \mu_{.21} - \mu_{.22}$
$D_2 = \mu_{.11} - \mu_{.13}$            $D_{12} = \mu_{.21} - \mu_{.23}$
$D_3 = \mu_{.11} - \mu_{.14}$            $D_{13} = \mu_{.21} - \mu_{.24}$
$D_4 = \mu_{.11} - \mu_{.15}$            $D_{14} = \mu_{.21} - \mu_{.25}$
$D_5 = \mu_{.12} - \mu_{.13}$            $D_{15} = \mu_{.22} - \mu_{.23}$
$D_6 = \mu_{.12} - \mu_{.14}$            $D_{16} = \mu_{.22} - \mu_{.24}$
$D_7 = \mu_{.12} - \mu_{.15}$            $D_{17} = \mu_{.22} - \mu_{.25}$
$D_8 = \mu_{.13} - \mu_{.14}$            $D_{18} = \mu_{.23} - \mu_{.24}$
$D_9 = \mu_{.13} - \mu_{.15}$            $D_{19} = \mu_{.23} - \mu_{.25}$
$D_{10} = \mu_{.14} - \mu_{.15}$         $D_{20} = \mu_{.24} - \mu_{.25}$


The Tukey procedure will be employed, with a 90 percent confidence coefficient, for each
exercise condition. Then to combine these two Tukey procedures, a Bonferroni adjustment
will be made for each exercise condition. Thus, we require

$$T = \frac{1}{\sqrt{2}}\, q(.95; 5, 24) = \frac{4.17}{\sqrt{2}} = 2.95$$

$$s^2\{\hat{D}\} = \frac{2MSB.S(A)}{s} = \frac{2(1.95)}{4} = .975$$

where .95 is used in the $T$ argument instead of .90 to incorporate the Bonferroni adjustment
for the two conditions. Hence, $Ts\{\hat{D}\} = 2.95\sqrt{.975} = 2.91$. Table 27.10 lists the cell
means by exercise group and body part.

Any means within an exercise group that differ by more than 2.91 units are concluded 
to be significantly different from one another at the .10 level of significance. Therefore, 
for the no exercise group, heart is significantly different from bone, brain, and muscle. For 
the exercise group: heart is significantly different from bone, brain, and muscle; muscle is 
significantly different from bone, brain, skin, and heart; and skin is significantly different 
from bone, brain, and muscle. 


TABLE 27.10 Treatment Means by Exercise Group and Body Part: Blood Flow during Exercise Example.

                       k=1      k=2       k=3      k=4        k=5
                       (Bone)   (Brain)   (Skin)   (Muscle)   (Heart)
j = 1 (No exercise)    2.25     2.75      4.50     3.50       6.50
j = 2 (Exercise)       3.00     5.50      9.25     19.00      11.25
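
The comparisons among body parts within each exercise group can be checked against the cell means in Table 27.10. The sketch below assumes the studentized range percentile q(.95; 5, 24) = 4.17 quoted in the text and MSB.S(A) = 1.95 from Table 27.9; the dictionary layout and variable names are illustrative only.

```python
from itertools import combinations
from math import sqrt

parts = ["bone", "brain", "skin", "muscle", "heart"]
cell_means = {                       # Table 27.10
    "no exercise": [2.25, 2.75, 4.50, 3.50, 6.50],
    "exercise": [3.00, 5.50, 9.25, 19.00, 11.25],
}
MSE = 1.95                           # MSB.S(A) from Table 27.9
s = 4                                # rats per exercise condition
q = 4.17                             # q(.95; 5, 24), value quoted in the text

# Tukey multiple times the estimated standard deviation of a pairwise difference.
margin = (q / sqrt(2)) * sqrt(2 * MSE / s)

# Body-part pairs whose cell means differ by more than the margin, per exercise group.
different = {
    condition: [
        (parts[u], parts[v])
        for u, v in combinations(range(len(parts)), 2)
        if abs(means[u] - means[v]) > margin
    ]
    for condition, means in cell_means.items()
}

print(round(margin, 2))
for condition, pairs in different.items():
    print(condition, pairs)
```

For the no exercise group only the three pairs involving heart (with bone, brain, and muscle) exceed the margin; for the exercise group every pair except (bone, brain) and (skin, heart) does.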


To examine simple main effects of the nonrepeated measure factor A (exercise) for each 
level of B (body part), we shall conduct the five pairwise comparisons of mean blood flows 
between the two exercise groups within each body part; namely, 


$$D_1 = \mu_{.11} - \mu_{.21}$$
$$D_2 = \mu_{.12} - \mu_{.22}$$
$$D_3 = \mu_{.13} - \mu_{.23}$$
$$D_4 = \mu_{.14} - \mu_{.24}$$
$$D_5 = \mu_{.15} - \mu_{.25}$$

The Tukey procedure will be employed using a 95 percent confidence coefficient for each
body part with a Bonferroni adjustment for the five body parts. The within-treatment sum
of squares is

$$SS(\text{Within Treatments}) = SSB.S(A) + SSS(A) = 46.8000 + 44.2000 = 91.0000$$

The approximate Satterthwaite adjusted degrees of freedom from (27.20) are:

$$df_{adj} = \frac{[46.8000 + 44.2000]^2}{\dfrac{(46.8000)^2}{2(4)(3)} + \dfrac{(44.2000)^2}{2(3)}} = \frac{8281.0000}{416.8667} = 19.86$$

Being conservative, we use $df_{adj} = 19$ associated with MS(Within Treatments), where

$$MS(\text{Within Treatments}) = \frac{91.0000}{30} = 3.033$$

Thus, we require

$$T = \frac{1}{\sqrt{2}}\, q(.99; 2, 19) = \frac{4.05}{\sqrt{2}} = 2.86$$

$$s^2\{\hat{D}\} = \frac{2MS(\text{Within Treatments})}{s} = \frac{2(3.033)}{4} = 1.52$$

Hence, $Ts\{\hat{D}\} = 2.86\sqrt{1.52} = 3.53$. Any means within body parts that differ by more than
3.53 units are significantly different from one another at the .05 level of significance.
Therefore, we conclude that average blood flow for skin, muscle, and heart differs significantly
between exercise groups.
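
The Satterthwaite calculation and the resulting comparisons between exercise groups can be sketched as follows. The sums of squares come from Table 27.9 and the cell means from Table 27.10; the percentile q(.99; 2, 19) = 4.05 is taken from the text rather than computed, and the variable names are illustrative.

```python
from math import floor, sqrt

a, b, s = 2, 5, 4                    # exercise conditions, body parts, rats per condition
SSBS_A, SSS_A = 46.8, 44.2           # SSB.S(A) and SSS(A) from Table 27.9
df_error = a * (b - 1) * (s - 1)     # 24
df_subjects = a * (s - 1)            # 6

# Pooled within-treatments mean square and Satterthwaite degrees of freedom, (27.20).
ms_within = (SSBS_A + SSS_A) / (a * b * (s - 1))
df_adj = (SSBS_A + SSS_A) ** 2 / (SSBS_A ** 2 / df_error + SSS_A ** 2 / df_subjects)
df_used = floor(df_adj)              # conservative choice, rounding down

q = 4.05                             # q(.99; 2, 19), value quoted in the text
margin = (q / sqrt(2)) * sqrt(2 * ms_within / s)

parts = ["bone", "brain", "skin", "muscle", "heart"]
no_exercise = [2.25, 2.75, 4.50, 3.50, 6.50]     # Table 27.10, j = 1
exercise = [3.00, 5.50, 9.25, 19.00, 11.25]      # Table 27.10, j = 2
different = [p for p, m1, m2 in zip(parts, no_exercise, exercise) if abs(m1 - m2) > margin]

print(round(df_adj, 2), df_used, round(ms_within, 3), round(margin, 2))
print(different)
```

Rounding the adjusted degrees of freedom down to 19 follows the conservative choice made in the text.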


FIGURE 27.9 Layout for Blocked Repeated Measures Design with Random Assignments of Factor A Level to Subjects and Repeated Measures on Factor B.
[Layout: blocks 1, 2, ..., $n_b$, each containing two subjects (subjects 1 and 2 in block 1, subjects 3 and 4 in block 2, ..., subjects $2n_b - 1$ and $2n_b$ in block $n_b$), with the treatment order (positions 1 and 2) shown for each subject.]

Blocking of Subjects in Repeated Measures Designs 


As already noted, comparisons among factor B effects can usually be carried out with 
greater precision than those for factor A effects because the latter involve between-subject 
variability as well as experimental error. To improve the precision of factor А comparisons, 
it is often helpful to block the subjects by some appropriate characteristic(s) so that the 
subjects within a block are homogeneous. Figure 27.9 illustrates the blocking of subjects 
in connection with the repeated measures design of Figure 27.5. Altogether, $n_b$ blocks
are used, each consisting of two similar subjects. One subject in each block is assigned at
random to factor level $A_1$, the other is assigned to factor level $A_2$. In the second stage of
randomization, each subject is randomly assigned the order of the two levels of factor B,
namely, type of problem. Thus, the only difference between the repeated measures designs
in Figures 27.9 and 27.5 is the blocking of the subjects for purposes of studying factor A
effects more precisely. Note that for this layout, the number of subjects is $s = 2n_b$.

When there is a choice between which of the two factors should be the one on which 
repeated measures are taken (factor B), it should be the one for which more precise estimates 
are required. The reason is that even with blocking, the variability between subjects within 
a block will usually be greater than the variability within a subject. 


27.4 Two-Factor Experiments with Repeated Measures on Both Factors


In Section 27.2 we considered single-factor repeated measures studies. The model for these 
designs can be extended when the treatments follow a factorial structure. For example, 
consider a study where four treatments are employed that represent two levels of each of 
two factors. Figure 27.10 depicts the layout for such a design when four subjects are utilized 
in the study. Note that the order of the treatments is randomized within each subject. When 
the treatments represent a factorial structure, we can explore as usual interaction effects as 
well as the main effects for the two factors. The design in Figure 27.10 is said to represent
repeated measures on both factors because each subject receives all treatments defined by
the factorial structure.


FIGURE 27.10 Layout for Two-Factor Repeated Measures Design with Repeated Measures on Both Factors (s = 4, a = 2, b = 2).
[Layout: subjects 1 to 4, each receiving the four treatments $A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$ in a randomized treatment order (positions 1 to 4).]


Model


When both factor effects are fixed, the subjects constitute a random sample, and there are
repeated measures on both factors, a model frequently appropriate is given by:


$$Y_{ijk} = \mu_{...} + \rho_i + \alpha_j + \beta_k + (\alpha\beta)_{jk} + (\rho\alpha)_{ij} + (\rho\beta)_{ik} + \varepsilon_{ijk} \tag{27.21}$$


where:


$\mu_{...}$ is a constant
$\rho_i$ are independent $N(0, \sigma_\rho^2)$
$\alpha_j$ are constants subject to $\sum_j \alpha_j = 0$
$\beta_k$ are constants subject to $\sum_k \beta_k = 0$
$(\alpha\beta)_{jk}$ are constants subject to $\sum_j (\alpha\beta)_{jk} = 0$ for all k and $\sum_k (\alpha\beta)_{jk} = 0$ for all j
$(\rho\beta)_{ik}$ are $N\left(0, \frac{b-1}{b}\sigma_{\rho\beta}^2\right)$ subject to the restrictions $\sum_k (\rho\beta)_{ik} = 0$ for all i
$\sigma\{(\rho\beta)_{ik}, (\rho\beta)_{ik'}\} = -\frac{1}{b}\sigma_{\rho\beta}^2$ for $k \ne k'$
$(\rho\alpha)_{ij}$ are $N\left(0, \frac{a-1}{a}\sigma_{\rho\alpha}^2\right)$ subject to the restrictions $\sum_j (\rho\alpha)_{ij} = 0$ for all i
$\sigma\{(\rho\alpha)_{ij}, (\rho\alpha)_{ij'}\} = -\frac{1}{a}\sigma_{\rho\alpha}^2$ for $j \ne j'$
$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$
$\rho_i$, $(\rho\alpha)_{ij}$, and $(\rho\beta)_{ik}$ are pairwise independent


Note that two of the interaction terms in the model are random since the factor $\rho_i$ is a
random effect and that all sums of effects over the fixed factor levels are zero.


The observations $Y_{ijk}$ for repeated measures model (27.21) have the following properties:


$$E\{Y_{ijk}\} = \mu_{...} + \alpha_j + \beta_k + (\alpha\beta)_{jk} \tag{27.22a}$$

$$\sigma^2\{Y_{ijk}\} = \sigma_Y^2 = \sigma_\rho^2 + \frac{a-1}{a}\sigma_{\rho\alpha}^2 + \frac{b-1}{b}\sigma_{\rho\beta}^2 + \sigma^2 \tag{27.22b}$$


Model (27.21) is an extension of the single-factor repeated measures model (27.1), where 
the treatment effect $\tau_j$ is now decomposed into factor A and factor B main effects and an
AB interaction effect. However, separate first-order treatment-by-subject interaction terms 
are assumed to exist. 

Once the subjects have been selected, repeated measures model (27.21), like the earlier 
repeated measures model (27.1), assumes that all of the treatment observations for a given 
subject are independent—that is, that there are no interference effects. 


Analysis of Variance and Tests 

Analysis of Variance. The ANOVA sums of squares for model (27.21) and the expected
mean squares can be obtained readily by following the rules in Appendix D. The sum of 
squares for estimating the error variance terms reflects the interactions between treatments 
and subjects. Table 27.11 presents the ANOVA decomposition, degrees of freedom, and 
expected mean squares for two-factor repeated measures model (27.21). 
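As an illustration of the decomposition, the sketch below computes the sums of squares of Table 27.11b for a balanced layout. It is a minimal sketch, not the text's own procedure: the function name `rm_two_factor_ss`, the nested-list data layout `y[i][j][k]`, and the small data set `toy` are all assumptions made for this example, and the data values are hypothetical.

```python
from itertools import product


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def rm_two_factor_ss(y):
    """Sums of squares for a balanced layout y[i][j][k]:
    subject i, factor A level j, factor B level k."""
    s, a, b = len(y), len(y[0]), len(y[0][0])
    subj, lev_a, lev_b = range(s), range(a), range(b)
    cells = list(product(subj, lev_a, lev_b))

    g = mean(y[i][j][k] for i, j, k in cells)                            # overall mean
    m_i = [mean(y[i][j][k] for j in lev_a for k in lev_b) for i in subj]  # subject means
    m_j = [mean(y[i][j][k] for i in subj for k in lev_b) for j in lev_a]  # factor A means
    m_k = [mean(y[i][j][k] for i in subj for j in lev_a) for k in lev_b]  # factor B means
    m_ij = [[mean(y[i][j][k] for k in lev_b) for j in lev_a] for i in subj]
    m_ik = [[mean(y[i][j][k] for j in lev_a) for k in lev_b] for i in subj]
    m_jk = [[mean(y[i][j][k] for i in subj) for k in lev_b] for j in lev_a]

    return {
        "SSS": a * b * sum((m_i[i] - g) ** 2 for i in subj),
        "SSA": s * b * sum((m_j[j] - g) ** 2 for j in lev_a),
        "SSB": s * a * sum((m_k[k] - g) ** 2 for k in lev_b),
        "SSAB": s * sum((m_jk[j][k] - m_j[j] - m_k[k] + g) ** 2
                        for j in lev_a for k in lev_b),
        "SSAS": b * sum((m_ij[i][j] - m_i[i] - m_j[j] + g) ** 2
                        for i in subj for j in lev_a),
        "SSBS": a * sum((m_ik[i][k] - m_i[i] - m_k[k] + g) ** 2
                        for i in subj for k in lev_b),
        "SSABS": sum((y[i][j][k] - m_ij[i][j] - m_ik[i][k] - m_jk[j][k]
                      + m_i[i] + m_j[j] + m_k[k] - g) ** 2 for i, j, k in cells),
        "SSTO": sum((y[i][j][k] - g) ** 2 for i, j, k in cells),
    }


# Hypothetical responses for s = 3 subjects, a = 2, b = 2 (not data from the text).
toy = [[[3, 7], [6, 12]],
       [[1, 6], [5, 9]],
       [[4, 9], [8, 15]]]
ss = rm_two_factor_ss(toy)
parts = sum(value for name, value in ss.items() if name != "SSTO")
print(ss)
print("components add to SSTO:", abs(parts - ss["SSTO"]) < 1e-9)
```

Because the design is balanced, the seven component sums of squares add to SSTO, which gives a quick check on any implementation.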


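Tests for Factor Effects. It is clear from the expected mean squares column in
Table 27.11a that the test for AB interaction effects:

$$\begin{aligned}
&H_0\colon \text{all } (\alpha\beta)_{jk} = 0 \\
&H_a\colon \text{not all } (\alpha\beta)_{jk} \text{ equal zero}
\end{aligned} \tag{27.23a}$$

uses the test statistic:

$$F^* = \frac{MSAB}{MSABS} \tag{27.23b}$$

and the decision rule for controlling the Type I error at $\alpha$ is:

$$\begin{aligned}
&\text{If } F^* \le F[1-\alpha;\, (a-1)(b-1),\, (a-1)(b-1)(s-1)], \text{ conclude } H_0 \\
&\text{If } F^* > F[1-\alpha;\, (a-1)(b-1),\, (a-1)(b-1)(s-1)], \text{ conclude } H_a
\end{aligned} \tag{27.23c}$$

The test for factor A main effects:

$$\begin{aligned}
&H_0\colon \text{all } \alpha_j = 0 \\
&H_a\colon \text{not all } \alpha_j \text{ equal zero}
\end{aligned} \tag{27.24a}$$

uses the test statistic:

$$F^* = \frac{MSA}{MSAS} \tag{27.24b}$$

and the decision rule for controlling the Type I error at $\alpha$ is:

$$\begin{aligned}
&\text{If } F^* \le F[1-\alpha;\, a-1,\, (a-1)(s-1)], \text{ conclude } H_0 \\
&\text{If } F^* > F[1-\alpha;\, a-1,\, (a-1)(s-1)], \text{ conclude } H_a
\end{aligned} \tag{27.24c}$$

Similarly, the test for factor B main effects:

$$\begin{aligned}
&H_0\colon \text{all } \beta_k = 0 \\
&H_a\colon \text{not all } \beta_k \text{ equal zero}
\end{aligned} \tag{27.25a}$$

uses the test statistic:

$$F^* = \frac{MSB}{MSBS} \tag{27.25b}$$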




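TABLE 27.11 ANOVA Table and Sums of Squares for Two-Factor Repeated Measures Design with Repeated Measures on Both Factors—Subjects Random, Factors A and B Fixed.

(a) ANOVA Table

| Source of Variation | SS | df | MS | E{MS} |
|---|---|---|---|---|
| Subjects (S) | SSS | $s-1$ | MSS | $\sigma^2 + ab\sigma_\rho^2$ |
| Factor A | SSA | $a-1$ | MSA | $\sigma^2 + b\sigma_{\rho\alpha}^2 + bs\dfrac{\sum \alpha_j^2}{a-1}$ |
| Factor B | SSB | $b-1$ | MSB | $\sigma^2 + a\sigma_{\rho\beta}^2 + as\dfrac{\sum \beta_k^2}{b-1}$ |
| AB interactions | SSAB | $(a-1)(b-1)$ | MSAB | $\sigma^2 + s\dfrac{\sum\sum (\alpha\beta)_{jk}^2}{(a-1)(b-1)}$ |
| AS interactions | SSAS | $(a-1)(s-1)$ | MSAS | $\sigma^2 + b\sigma_{\rho\alpha}^2$ |
| BS interactions | SSBS | $(b-1)(s-1)$ | MSBS | $\sigma^2 + a\sigma_{\rho\beta}^2$ |
| Error | SSABS | $(a-1)(b-1)(s-1)$ | MSABS | $\sigma^2$ |
| Total | SSTO | $abs-1$ | | |

(b) Sums of Squares

$$SSS = ab\sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdots})^2$$

$$SSA = sb\sum_j (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdots})^2$$

$$SSB = sa\sum_k (\bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdots})^2$$

$$SSAB = s\sum_j\sum_k (\bar{Y}_{\cdot jk} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + \bar{Y}_{\cdots})^2$$

$$SSAS = b\sum_i\sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdots})^2$$

$$SSBS = a\sum_i\sum_k (\bar{Y}_{i\cdot k} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot k} + \bar{Y}_{\cdots})^2$$

$$SSABS = \sum_i\sum_j\sum_k (Y_{ijk} - \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot k} - \bar{Y}_{\cdot jk} + \bar{Y}_{i\cdot\cdot} + \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdots})^2$$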


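and the decision rule for controlling the Type I error at $\alpha$ is:

$$\begin{aligned}
&\text{If } F^* \le F[1-\alpha;\, b-1,\, (b-1)(s-1)], \text{ conclude } H_0 \\
&\text{If } F^* > F[1-\alpha;\, b-1,\, (b-1)(s-1)], \text{ conclude } H_a
\end{aligned} \tag{27.25c}$$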
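The three test statistics (27.23b), (27.24b), and (27.25b) differ only in which interaction mean square serves as the denominator. The following sketch, with a hypothetical helper name `rm_two_factor_f_tests`, forms the mean squares and F* values from a set of sums of squares. The inputs are the sums of squares reported for the blood flow example in Figure 27.11 of this section (s = 12, a = 2, b = 2). Critical values are not computed here; each F* would still be compared with the appropriate F percentile from a table or statistical package.

```python
def rm_two_factor_f_tests(ss, s, a, b):
    """Return {effect: (F*, numerator df, denominator df)} for model (27.21).

    ss maps the labels "A", "B", "AB", "AS", "BS", "ABS" to sums of squares.
    """
    df = {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "AS": (a - 1) * (s - 1),
        "BS": (b - 1) * (s - 1),
        "ABS": (a - 1) * (b - 1) * (s - 1),
    }
    ms = {name: ss[name] / df[name] for name in df}
    # Each fixed effect is tested against its own subject interaction.
    denominators = {"A": "AS", "B": "BS", "AB": "ABS"}
    return {
        effect: (ms[effect] / ms[error], df[effect], df[error])
        for effect, error in denominators.items()
    }


# Sums of squares as reported in Figure 27.11 (blood flow example).
blood_flow_ss = {"A": 1587.0, "B": 2028.0, "AB": 147.0,
                 "AS": 22.5, "BS": 42.5, "ABS": 12.5}
tests = rm_two_factor_f_tests(blood_flow_ss, s=12, a=2, b=2)
for effect, (f_star, df_num, df_den) in tests.items():
    print(f"{effect}: F* = {f_star:.2f} on ({df_num}, {df_den}) df")
```

The printed F* values can be checked against the F column of the MINITAB output in Figure 27.11.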


Comments 

1. When the effects of either factor A or factor B are random, the expected mean squares can be
found by employing the rules in Appendix D. In turn, these expected mean squares will identify the
appropriate test statistics.

2. Conservative F tests described in Section 25.5 should be used when the assumption of compound
symmetry in repeated measures model (27.21) is not met.




3. Repeated measures model (27.21) assumes that treatments and subjects interact. If treatments 
and subjects do not interact, it can be shown that the treatment by subject interaction sum of squares 
is made up of three components: 


SSTR.S = SSAS + SSBS + SSABS 


Thus, it is possible to pool the first-order interactions in the model (the factor A by subject interactions 
and the factor B by subject interactions) with the second-order interactions (the factor A by factor B 
by subject interactions). When the repeated measures model does not allow for interactions between 
treatments and subjects, the analysis of factor effects becomes somewhat easier. However, in many 
cases, MSABS tends to be considerably smaller than either MSAS or MSBS, justifying the use of 
separate error terms.


Evaluation of Appropriateness of Repeated Measures Model 
Our earlier discussion on the evaluation of the appropriateness of repeated measures model 
(27.1) applies here as well. In particular, residual sequence plots by subject should be 
constructed to examine whether interference effects are present and whether the error vari- 
ance is constant. Plots of the observations by subject should be utilized to see whether the 
assumption of no treatment by subject interactions is appropriate. 


Analysis of Factor Effects 

If factors A and B do not interact or interact only in an unimportant fashion, the analysis 
of factor A and factor B main effects proceeds as usual. For the analysis of either factor A 
or factor B main effects, either MSAS or MSBS, respectively, will be used in the estimated 
variance of the estimated contrast since this mean square is the denominator of the F* test 
statistic for testing factor A or factor B main effects. 

The multiples for the estimated standard deviation of an estimated contrast of factor A
or factor B level means are as follows:

| Procedure | Main A Effect | Main B Effect | |
|---|---|---|---|
| Single comparison | $t[1-\alpha/2;\, (a-1)(s-1)]$ | $t[1-\alpha/2;\, (b-1)(s-1)]$ | (27.26a) |
| Tukey procedure (for pairwise comparisons) | $T = \frac{1}{\sqrt{2}}\, q[1-\alpha;\, a,\, (a-1)(s-1)]$ | $T = \frac{1}{\sqrt{2}}\, q[1-\alpha;\, b,\, (b-1)(s-1)]$ | (27.26b) |
| Scheffé procedure | $S^2 = (a-1)F[1-\alpha;\, a-1,\, (a-1)(s-1)]$ | $S^2 = (b-1)F[1-\alpha;\, b-1,\, (b-1)(s-1)]$ | (27.26c) |
| Bonferroni procedure | $B = t[1-\alpha/2g;\, (a-1)(s-1)]$ | $B = t[1-\alpha/2g;\, (b-1)(s-1)]$ | (27.26d) |


If strong interactions between factors A and B exist that cannot be made unimportant by
some simple transformation, the analysis of the factor effects should be performed in terms
of the treatment means $\mu_{\cdot jk}$, which are averaged over subjects. This analysis is similar
to that in Section 27.3 for a two-factor study with interaction. The pooled mean square
MSTR.S will be used in estimating the variance of any estimated contrast of the treatment
means. The degrees of freedom associated with MSTR.S will need to be estimated using the
Satterthwaite procedure discussed before in Chapter 25, page 1043.


Example

TABLE 27.12 Data—Blood Flow Example.


A clinician studied the effects of two drugs used either alone or together on the blood flow
in human subjects. Twelve healthy middle-aged males participated in the study and they
are viewed as a random sample from a relevant population of middle-aged males. The four
treatments used in the study are defined as follows:


$A_1B_1$ placebo (neither drug)
$A_1B_2$ drug B alone
$A_2B_1$ drug A alone
$A_2B_2$ both drugs A and B


The 12 subjects received each of the four treatments in independently randomized orders.
The response variable is the increase in blood flow from before to shortly after the
administration of the treatment. The treatments were administered on successive days. This
wash-out period prevented any carryover effects because the effect of each drug is
short-lived. The experiment was conducted in a double-blind fashion so that neither the physician
nor the subject knew which treatment was administered when the change in blood flow was
measured.

Table 27.12 contains the data for this study. A negative entry denotes a decrease in 
blood flow. Figure 27.11 contains the MINITAB output for the fit of repeated measures 
model (27.21). Included in the output are the expected mean squares for the specified 
ANOVA model. As explained in Chapter 25, each term in an expected mean square is 
represented in the MINITAB output by (1) the numeric code, in parentheses, for the variance 
of the model term and (2) the preceding number, which is the numerical multiple. When the 
model term is fixed, the letter Q is used in the printout to show that the variance is replaced 
by the sum of squared effects divided by degrees of freedom. For example, the expected 
value of MSA as shown in Figure 27.11 is:

$$(7) + 2(5) + 24Q[2] = \sigma^2 + 2\sigma_{\rho\alpha}^2 + 24\,\frac{\sum \alpha_j^2}{2-1}$$

which corresponds, of course, to the factor A expected mean square shown in Table 27.11a.
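The expected mean squares also show how the variance component column of the MINITAB output can be reproduced by equating mean squares to their expectations. The sketch below does this for the blood flow example, using the sums of squares and degrees of freedom printed in Figure 27.11. The variable names are assumptions made for this illustration, and this moment-matching calculation is offered as a plausible reading of the output rather than a description of MINITAB's internal algorithm.

```python
# Sums of squares and degrees of freedom as printed in Figure 27.11.
a, b = 2, 2                      # levels of factors A and B
ss = {"S": 258.5, "AS": 22.5, "BS": 42.5, "ABS": 12.5}
df = {"S": 11, "AS": 11, "BS": 11, "ABS": 11}
ms = {name: ss[name] / df[name] for name in ss}

# Solve the E{MS} equations of Table 27.11a for the variance components.
var_error = ms["ABS"]                              # sigma^2
var_subject = (ms["S"] - ms["ABS"]) / (a * b)      # sigma^2_rho
var_subject_a = (ms["AS"] - ms["ABS"]) / b         # sigma^2_rho-alpha
var_subject_b = (ms["BS"] - ms["ABS"]) / a         # sigma^2_rho-beta

print(f"Subject    {var_subject:.4f}")
print(f"Subject*A  {var_subject_a:.4f}")
print(f"Subject*B  {var_subject_b:.4f}")
print(f"Error      {var_error:.4f}")
```

The four printed values can be compared with the Variance Component column of Figure 27.11.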


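| Subject $i$ | $A_1B_1$ | $A_1B_2$ | $A_2B_1$ | $A_2B_2$ |
|---|---|---|---|---|
| 1 | 2 | 10 | 9 | 25 |
| 2 | −1 | 8 | 6 | 21 |
| 3 | 0 | 11 | 8 | 24 |
| ... | ... | ... | ... | ... |
| 10 | −2 | 10 | 10 | 28 |
| 11 | 2 | 8 | 10 | 25 |
| 12 | −1 | 8 | 6 | 23 |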


FIGURE 27.11 MINITAB and SAS Output for ANOVA—Blood Flow Example.

(a) MINITAB Output

Analysis of Variance for Flow

Source       DF        SS        MS        F       P
Subject      11    258.50     23.50    20.68   0.000
A             1   1587.00   1587.00   775.87   0.000
B             1   2028.00   2028.00   524.89   0.000
A*B           1    147.00    147.00   129.36   0.000
Subject*A    11     22.50      2.05     1.80   0.172
Subject*B    11     42.50      3.86     3.40   0.027
Error        11     12.50      1.14
Total        47   4098.00

Source         Variance    Error   Expected Mean Square for Each Term
               Component   Term    (using restricted model)
1 Subject        5.5909      7     (7) + 4(1)
2 A                          5     (7) + 2(5) + 24Q[2]
3 B                          6     (7) + 2(6) + 24Q[3]
4 A*B                        7     (7) + 12Q[4]
5 Subject*A      0.4545      7     (7) + 2(5)
6 Subject*B      1.3636      7     (7) + 2(6)
7 Error          1.1364            (7)


(b) SAS Output

Source        DF             SS             MS         F
A              1    1587.000000    1587.000000    775.87
Error(a)      11      22.500000       2.045455

Source        DF             SS             MS         F
B              1    2028.000000    2028.000000    524.89
Error(b)      11      42.500000       3.863636

Source        DF             SS             MS         F
A*B            1    147.0000000    147.0000000    129.36
Error(a*b)    11     12.5000000      1.1363636

Treatment          Mean      Std Dev       Minimum       Maximum
A1B1          0.5000000    2.1105794    −2.0000000     4.0000000
A1B2         10.0000000    3.1908961     5.0000000    16.0000000
A2B1          8.5000000    2.0225996     6.0000000    12.0000000
A2B2         25.0000000    3.4377583    20.0000000    31.0000000

FIGURE 27.12 Interaction Plot with Responses Superimposed—Blood Flow Example.
[Plot: response on the vertical axis against factor A level ($A_1$, $A_2$) on the horizontal axis, with separate lines for $B_1$ and $B_2$.]


Various diagnostics were utilized to see if repeated measures model (27.21) is appropriate 
for the data in Table 27.12. The results (not shown here) supported the appropriateness of 
this model. The clinician expected the two drugs to interact in increasing the blood flow. 
To test for interaction effects:

$$\begin{aligned}
&H_0\colon \text{all } (\alpha\beta)_{jk} = 0 \\
&H_a\colon \text{not all } (\alpha\beta)_{jk} \text{ equal zero}
\end{aligned}$$

we use test statistic (27.23b) and the results from Figure 27.11:

$$F^* = \frac{MSAB}{MSABS} = \frac{147.000}{1.1364} = 129.36$$

For level of significance $\alpha = .01$, we require $F(.99; 1, 11) = 9.65$. Since $F^* = 129.36 >
9.65$, we conclude $H_a$, that interaction effects exist. The P-value for this test is 0+.

Figure 27.12 contains an interaction plot of the estimated treatment means, with the
responses superimposed. Substantial interaction effects are evident. To study the nature of
the interaction effects, the clinician wished to compare the joint use of the two drugs with
the use of each drug alone, drug A with drug B, and each drug with no drug. Thus, the
following pairwise comparisons are to be made:

$$\begin{aligned}
L_1 &= \mu_{\cdot 22} - \mu_{\cdot 21} & L_4 &= \mu_{\cdot 21} - \mu_{\cdot 11} \\
L_2 &= \mu_{\cdot 22} - \mu_{\cdot 12} & L_5 &= \mu_{\cdot 12} - \mu_{\cdot 11} \\
L_3 &= \mu_{\cdot 21} - \mu_{\cdot 12}
\end{aligned}$$

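Point estimates of these pairwise comparisons are ($\bar{Y}_{\cdot jk}$ values are in Figure 27.11b):

$$\begin{aligned}
\hat{L}_1 &= 25.0 - 8.5 = 16.5 & \hat{L}_4 &= 8.5 - .5 = 8.0 \\
\hat{L}_2 &= 25.0 - 10.0 = 15.0 & \hat{L}_5 &= 10.0 - .5 = 9.5 \\
\hat{L}_3 &= 8.5 - 10.0 = -1.5
\end{aligned}$$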

The estimated variance of each estimate $\hat{L}$ is given in (17.22), with the relevant mean square
here being MSABS. Hence, we have:

$$s^2\{\hat{L}\} = MSABS\left(\frac{1}{s} + \frac{1}{s}\right) = 1.1364\left(\frac{2}{12}\right) = .1894$$

and $s\{\hat{L}\} = .435$. Using the Bonferroni procedure with a 95 percent family confidence
coefficient, we require $B = t[1 - (.05)/2(5); 11] = t(.995; 11) = 3.106$. Hence,
$t(.995; 11)\,s\{\hat{L}\} = 3.106(.435) = 1.35$ and the desired confidence intervals with a 95 percent
family confidence coefficient are:

$$\begin{aligned}
15.15 &\le \mu_{\cdot 22} - \mu_{\cdot 21} \le 17.85 & 6.65 &\le \mu_{\cdot 21} - \mu_{\cdot 11} \le 9.35 \\
13.65 &\le \mu_{\cdot 22} - \mu_{\cdot 12} \le 16.35 & 8.15 &\le \mu_{\cdot 12} - \mu_{\cdot 11} \le 10.85 \\
-2.85 &\le \mu_{\cdot 21} - \mu_{\cdot 12} \le -.15
\end{aligned}$$


It is clear from these results that either drug A alone or drug B alone leads to an increase 
in blood flow, and that the combination of the two drugs leads to a substantial additional 
increase in blood flow as compared to when either drug is used alone. Finally, a significant 
difference exists in the mean effects of the two drugs used alone. 
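The interval arithmetic above can be retraced in a few lines. The sketch below is only a check on the calculation: the treatment means, MSABS, and the Bonferroni multiple B = 3.106 are the values quoted in the text, while the dictionary layout and variable names are assumptions made for this illustration.

```python
import math

MSABS = 1.1364   # error mean square from Figure 27.11
s = 12           # number of subjects
B = 3.106        # t(.995; 11), as quoted in the text

# Estimated treatment means from Figure 27.11b.
means = {"A1B1": 0.5, "A1B2": 10.0, "A2B1": 8.5, "A2B2": 25.0}

# Each comparison is (first treatment) minus (second treatment).
comparisons = {
    "L1": ("A2B2", "A2B1"),
    "L2": ("A2B2", "A1B2"),
    "L3": ("A2B1", "A1B2"),
    "L4": ("A2B1", "A1B1"),
    "L5": ("A1B2", "A1B1"),
}

se = math.sqrt(MSABS * (1 / s + 1 / s))   # s{L-hat}
half_width = B * se

intervals = {}
for name, (first, second) in comparisons.items():
    estimate = means[first] - means[second]
    intervals[name] = (estimate - half_width, estimate + half_width)
    low, high = intervals[name]
    print(f"{name}: estimate {estimate:5.1f}, interval ({low:.2f}, {high:.2f})")
```

All five intervals share the same half-width because every pairwise comparison of treatment means here uses the same mean square and the same number of subjects.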


Comments 

1. Repeated measures designs are discussed in more detail in References 27.1 and 27.2. 

2. In economics and econometrics, repeated measurement data over time are commonly referred 
to as panel data. The process of combining cross-sectional data and data over time to form a panel is 
called pooling. See References 27.3 and 27.4 for a discussion of these models and their analyses. 

3. Another area of application for repeated measurement data is referred to as growth curve model 
analyses. Here separate regression models are fit to each subject over time. See Reference 27.5 for a 
discussion of these models and their analyses.


27.5 Regression Approach to Repeated Measures Designs 


When the repeated measures study is balanced and the treatment effects are fixed, the 
analysis of variance model can be expressed in the form of a regression model with indicator 
variables for purposes of obtaining the various sums of squares and conducting tests for 
treatment effects. Repeated measures models (27.1) and (27.21) can be stated in the form
of a regression model as explained in Section 23.4 for randomized block designs. Repeated
measures model (27.11), which also involves nested effects, can be expressed in the form 
of a regression model by including suitable indicator variables as explained in Section 26.6 
on page 1105. 

When the repeated measures study is not balanced, as, for instance, when there are 
missing observations, the tests based on the expected mean squares in Tables 27.1, 27.6, 
and 27.11 are no longer appropriate. Methods for analyzing unbalanced mixed and random 
effects models are discussed in Section 25.7. 




27.6 Split-Plot Designs 


Split-plot designs are frequently used in field, laboratory, industrial, and social science
experiments. The repeated measures design in Figure 27.5 for a study with repeated measures
on one factor is a type of split-plot design. We shall discuss split-plot designs only for
two-factor studies, but these designs can be extended to apply when three or more factors
are under investigation.

Split-plot designs were originally developed for agricultural experiments. Consider an
investigation to study the effects of two irrigation methods (factor A) and two fertilizers
(factor B) on yield of a crop, using four available fields as experimental units. In a completely
randomized design, four treatments ($A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$) would then be assigned at
random to the four fields. Since there are four treatments and just four experimental units,
there will be no degrees of freedom for estimation of error, as shown in the following
abbreviated ANOVA table, listing source of variation and degrees of freedom only:


| Source of Variation | Degrees of Freedom |
|---|---|
| Factor A (irrigation methods) | 1 |
| Factor B (fertilizer types) | 1 |
| AB interactions | 1 |
| Error | 0 |
| Total | 3 |


If the fields could be subdivided into smaller experimental units, replicates of each 
factor-level combination could be obtained and the error variance could then be estimated. 
Unfortunately, in this investigation it is not possible to apply different irrigation methods 
(factor A) in areas smaller than a field, although different fertilizer types (factor B) could 
be applied in relatively small areas. A split-plot design can accommodate this situation. 

In a split-plot design, each of the two irrigation methods is randomly assigned to two
of the four fields, which are usually called whole plots. In turn, each whole plot is then
subdivided into two or more smaller areas called split plots, and the two fertilizers are
then randomly assigned to the split plots within each whole plot. The key feature of
split-plot designs is the use of two (or more) distinct levels of randomization. At the first level
of randomization, the whole-plot treatments are randomly assigned to whole plots; at the
second level, the split-plot treatments are randomly assigned to split plots.
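The two levels of randomization can be sketched in code. The following is a minimal illustration, not a procedure from the text: the function name `split_plot_randomization`, the field and treatment labels, and the use of a seeded generator are all assumptions made for this example.

```python
import random


def split_plot_randomization(whole_plots, whole_treatments, split_treatments, seed=None):
    """Return {whole plot: (whole-plot treatment, ordered split-plot treatments)}."""
    if len(whole_plots) % len(whole_treatments) != 0:
        raise ValueError("whole plots must divide evenly among whole-plot treatments")
    rng = random.Random(seed)

    # First level: assign whole-plot treatments to whole plots at random,
    # with equal replication of each treatment.
    replications = len(whole_plots) // len(whole_treatments)
    assigned = list(whole_treatments) * replications
    rng.shuffle(assigned)

    layout = {}
    for plot, treatment in zip(whole_plots, assigned):
        # Second level: an independent random order of the split-plot
        # treatments within each whole plot.
        order = list(split_treatments)
        rng.shuffle(order)
        layout[plot] = (treatment, order)
    return layout


# Hypothetical labels for the agricultural example: four fields,
# two irrigation methods (factor A), two fertilizers (factor B).
layout = split_plot_randomization(
    whole_plots=["field 1", "field 2", "field 3", "field 4"],
    whole_treatments=["A1", "A2"],
    split_treatments=["B1", "B2"],
    seed=2024,
)
for field, (irrigation, fertilizers) in layout.items():
    print(field, irrigation, fertilizers)
```

Each field receives one irrigation method, each method appears on two fields, and every field contains both fertilizers in an independently randomized order.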

The layout for the agricultural experiment example is shown in Figure 27.13. Note that 
this layout is conceptually identical to the layout for the two-factor repeated measures design 
in Figure 27.5. The fields in Figure 27.13 correspond to the subjects in Figure 27.5, and 
the split plots correspond to the occasions on which treatments can be applied to a subject. 
Consequently, the split-plot model here is the same as in (27.11):

$$Y_{ijk} = \mu_{\cdots} + \rho_{i(j)} + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{ijk} \tag{27.27}$$

For the split-plot agricultural experiment example, $\alpha_j$ denotes the main effect of the $j$th
irrigation method ($j$th whole-plot treatment) and $\beta_k$ denotes the main effect of the $k$th
fertilizer type ($k$th split-plot treatment). Also, $\rho_{i(j)}$ denotes the effect of the $i$th whole plot,
nested within the $j$th level of factor A (irrigation method).

FIGURE 27.13 Layout for Two-Factor Split-Plot Experiment—Agricultural Experiment
Example (factor A is whole-plot treatment and factor B is split-plot treatment).
[Diagram: fields (whole plots) arranged by whole-plot treatment.]

TABLE 27.13 ANOVA Table for Two-Factor Split-Plot Experiment.

| Source of Variation | SS | df | MS |
|---|---|---|---|
| Whole plots | | | |
| Factor A | SSA | $a-1$ | MSA |
| Whole-plot error | SSW(A) | $a(s-1)$ | MSW(A) |
| Split plots | | | |
| Factor B | SSB | $b-1$ | MSB |
| AB interactions | SSAB | $(a-1)(b-1)$ | MSAB |
| Split-plot error | SSB.W(A) | $a(s-1)(b-1)$ | MSB.W(A) |
| Total | SSTO | $abs-1$ | |

Some computer packages produce special ANOVA tables that list the whole-plot effects 
and split-plot effects separately. Table 27.13 illustrates such a table. These tables serve as 
a reminder that the denominator of the F test for the whole-plot treatments is given by the 
error mean square for whole plots and that the denominator of the F test for the split-plot 
treatments and for the interactions between the whole-plot and split-plot treatments is given 
by the split-plot error mean square, as shown in Table 27.13. Note that this table is simply 
a rearrangement of the ANOVA table in Table 27.5 for a two-factor study with repeated 
measures on one factor. SSS(A) is now denoted by SSW(A) and SSB.S(A) is now denoted 
by SSB.W(A). The expected mean squares are the same as in Table 27.6. 
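The degrees of freedom in Table 27.13 and the pairing of each effect with its error term can be laid out in a short sketch. The function name `split_plot_df` is hypothetical, and the sketch assumes that s denotes the number of whole plots assigned to each level of factor A, so that s = 2 in the agricultural example (two fields per irrigation method).

```python
def split_plot_df(a, b, s):
    """Degrees of freedom for the two-factor split-plot ANOVA of Table 27.13.

    a: whole-plot treatment levels, b: split-plot treatment levels,
    s: whole plots per whole-plot treatment (an assumption of this sketch).
    """
    df = {
        "Factor A": a - 1,
        "Whole-plot error": a * (s - 1),
        "Factor B": b - 1,
        "AB interactions": (a - 1) * (b - 1),
        "Split-plot error": a * (s - 1) * (b - 1),
    }
    df["Total"] = a * b * s - 1
    return df


# Denominator mean square for each F test, as described in the text.
error_term = {
    "Factor A": "Whole-plot error",
    "Factor B": "Split-plot error",
    "AB interactions": "Split-plot error",
}

df = split_plot_df(a=2, b=2, s=2)
for source, value in df.items():
    print(f"{source:18s} {value}")
for effect, error in error_term.items():
    print(f"{effect}: F on ({df[effect]}, {df[error]}) df")
```

With only two fields per irrigation method, the whole-plot error has just two degrees of freedom, which shows why the whole-plot factor is usually tested with less precision than the split-plot factor.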


Comments 


1. Whenever subjects can receive all treatments in a two-factor study without interference effects, 
a repeated measures design with repeated measures on both factors might be preferable, because the 
factor effects for both factors may be estimated more precisely than in a split-plot design. 

2. Split-plot designs are useful in industrial experiments when one factor requires larger experi- 
mental units than another. Consider, for instance, a study of the effects of two additives (factor A) and 
two different containers (factor B) for prolonging the shelf life of a milk product. Here, it is easier to 
make larger batches of the milk product with a given additive, whereas the different containers can 
be used with smaller batches. 


3. Split-plot designs may be viewed as a type of incomplete block design where the whole plots are
considered to be the blocks, with each whole plot being given only some of the full set of treatments.
Incomplete block designs are discussed in Chapter 28.

4. A wide variety of split-plot designs has been developed. For instance, split-plot designs can
involve more than two stages of randomization. In a split-split-plot experiment, three stages of
randomization are generally involved. Whole plots are divided into split plots and split plots are in turn
divided into split-split plots. Three treatments are then assigned to the various levels of experimental
units, using three distinct stages of randomization. References 27.2 and 27.6 provide further
information about these designs.

Cited References


27.1. Winer, B. J., D. R. Brown, and K. M. Michels. Statistical Principles in Experimental Design.
3rd ed. New York: McGraw-Hill Book Co., 1991.

27.2. Koch, G. G., J. D. Elashoff, and I. A. Amara. "Repeated Measurements—Design and Analysis,"
in Encyclopedia of Statistical Sciences, vol. 8, eds. S. Kotz and N. L. Johnson. New York: John
Wiley & Sons, 1988, pp. 46-73.

27.3. Pindyck, R. S., and D. L. Rubinfeld. Econometric Models and Economic Forecasts. 4th ed.
Boston: Irwin/McGraw-Hill, 1998.

27.4. Hsiao, C. Analysis of Panel Data. Cambridge: Cambridge University Press, 1986.

27.5. Graybill, F. A. Theory and Application of the Linear Model. Boston: Duxbury Press, 1976.

27.6. Dean, A., and D. Voss. Design and Analysis of Experiments. New York: Springer-Verlag, 1999.


Problems 


27.1. A serious potential problem with repeated measures designs is associated with carryover 
effects. Describe some steps that can be taken to minimize this problem. 

27.2. In designing a two-factor repeated measures study with repeated measures on one factor, does 
it matter which of the two factors is included as the repeated measures factor? Explain fully.

27.3. Blood pressure. The relationship between the dose of a drug that increases blood pressure and 
the actual amount of increase in mean diastolic blood pressure was investigated in a laboratory 
experiment. Twelve rabbits received in random order six different dose levels of the drug, with 
a suitable interval between each drug administration. The increase in blood pressure was used 
as the response variable. The data on blood pressure increase follow. 


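| Rabbit $i$ | Dose ($j$): .1 | .3 | .5 | 1.0 | 1.5 | 3.0 |
|---|---|---|---|---|---|---|
| 1 | 21 | 21 | 23 | 35 | 36 | 48 |
| 2 | 19 | 24 | 27 | 36 | 36 | 46 |
| 3 | 12 | 25 | 27 | 26 | 33 | 40 |
| 4 | 9 | 17 | 18 | 27 | 34 | 39 |
| 5 | 7 | 10 | 19 | 25 | 31 | 38 |
| 6 | 18 | 26 | 26 | 29 | 39 | 44 |
| 7 | 4 | 12 | 19 | 22 | 33 | 40 |
| 8 | 20 | 20 | 30 | 30 | 38 | 41 |
| 9 | 18 | 18 | 27 | 31 | 42 | 49 |
| 10 | 8 | 12 | H | 24 | 26 | 3 |
| 11 | 18 | 22 | 25 | 32 | 38 | 38 |
| 12 | 17 | 23 | 26 | 28 | 3 | 35 |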


a. Obtain the residuals for repeated measures model (27.1) and plot them against the fitted 
values. Also prepare a normal probability plot of the residuals. What do you conclude 
about the appropriateness of model (27.1)? 

b. Prepare aligned residual dot plots by dose level. Do these plots support the assumption of 
constancy of the error variance? Discuss. 

c. Plot the observations Y;; for each rabbit in the format of Figure 27.2. Does the assumption 
of no interactions between subjects (rabbits) and treatments appear to be reasonable here? 


d. Conduct the Tukey test for additivity, conditional on the rabbits actually selected; use
$\alpha = .005$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

27.4. Refer to Blood pressure Problem 27.3. Assume that repeated measures model (27.1) is
appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not the mean increase in blood pressure differs for the various dose levels;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

c. Analyze the effects of the six dose levels by comparing the means for successive dose levels
using the Bonferroni procedure with a 95 percent family confidence coefficient. State your
findings and summarize them by a suitable line plot.

d. According to the estimated efficiency measure (21.14), how effective was the repeated
measures design here as compared to a completely randomized design?

27.5. Refer to Blood pressure Problems 27.3 and 27.4.

a. Develop a regression model in which the subject effects are represented by 1, −1, 0
indicator variables and the dose effect is represented by linear, quadratic, and cubic terms
in $x = X - \bar{X}$, where $X$ is the dose level. For instance, the $x$ value for the first dose level
($X = .1$) is $x = .1 - 1.07 = -.97$.

b. Fit the regression model to the data.

c. Obtain the residuals and plot them against the fitted values. Does the model utilized appear
to provide a reasonable fit?

d. Test whether or not the cubic effect is required in the model; use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

27.6. Grapefruit sales. A supermarket chain studied the relationship between grapefruit sales and
the price at which grapefruits are offered. Three price levels were studied: (1) the chief
competitor’s price, (2) a price slightly higher than the chief competitor’s price, and (3) a price
moderately higher than the chief competitor’s price. Eight stores of comparable size were
randomly selected for the study. Sales data were collected for three one-week periods, with
the order of the three price levels randomly assigned for each store. The experiment was
conducted during a time period when sales of grapefruits are usually quite stable, and no
carryover effects were anticipated for this product. Data on store sales of grapefruits during
the study period follow (data coded).


| Store $i$ | Price level ($j$): 1 | 2 | 3 |
|---|---|---|---|
| 1 | 62.1 | 61.3 | 60.8 |
| 2 | 58.2 | 57.9 | 55.1 |
| ... | ... | ... | ... |
| 7 | 46.8 | 43.2 | 41.5 |
| 8 | 51.2 | 49.8 | 47.9 |

a. Obtain the residuals for repeated measures model (27.1) and plot them against the fitted
values. Also prepare a normal probability plot of the residuals. What do you conclude
about the appropriateness of model (27.1)?

b. Prepare aligned residual dot plots by price level. Do these plots support the assumption of
constancy of the error variance? Discuss.


c. Plot the observations $Y_{ij}$ for each store in the format of Figure 27.2. Does the assumption
of no interactions between subjects (stores) and treatments appear to be reasonable here?

d. Conduct the Tukey test for additivity, conditional on the stores actually selected; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

*27.7. Refer to Grapefruit sales Problem 27.6. Assume that repeated measures model (27.1) is
appropriate.

a. Obtain the analysis of variance table.

b. Test whether or not the mean sales of grapefruits differ for the three price levels; use
$\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?

c. Analyze the effects of the three price levels by estimating all pairwise comparisons of
the price level means. Use the most efficient multiple comparison procedure with a 95
percent family confidence coefficient. State your findings and summarize them by a suitable
line plot.

d. According to the estimated efficiency measure (21.14), how effective was the repeated
measures design compared to a completely randomized design?

27.8. Refer to Blood pressure Problem 27.3. A consultant is concerned about the validity of the
model assumptions and suggests that the study should be analyzed by means of the
nonparametric rank F test. Rank the data within each rabbit and perform the rank F test; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion. Comment on the consultant's concern
here.

*27.9. Refer to Grapefruit sales Problem 27.6. It has been suggested that the nonparametric rank
F test should be used here. Rank the data within each store and perform the rank F test; use
$\alpha = .05$. State the alternatives, decision rule, and conclusion. Is your conclusion the same as
that obtained in Problem 27.7b?

27.10. Truth in advertising. A consumer research organization showed five different advertisements
to 10 subjects and asked each to rank them in order of truthfulness. A rank of 1 denotes the
most truthful. The results were:

Subject    Advertisement (j)       Subject    Advertisement (j)
   i       A   B   C   D   E          i       A   B   C   D   E
   1       3   1   2   5   4          6       4   2   1   3   5
   2       42 э]. 13-45               7       4   1   2   3   5
   3       4   2   3   1   5          8       5   1   3   2   4
   4       3   1   2   5   4          9       4   2   3   1   5
   5       4   1   2   5   3         10       5   1   2   3   4

a. Do the subjects perceive the five advertisements as having equal truthfulness? Conduct
the nonparametric rank F test using level of significance $\alpha = .05$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

b. Use the multiple pairwise testing procedure (27.9) to group the five different advertisements
according to mean perceived truthfulness; employ family significance level $\alpha = .10$.
Summarize your findings.

c. Obtain the coefficient of concordance (27.10) and interpret this measure.

27.11. Incentive stimulus. Refer to the example in Section 27.3 about the effects of two types
of incentives (factor A) on a person’s ability to solve two types of problems (factor B);


the repeated measures design is illustrated in Figure 27.5. Twelve persons were randomly
selected and assigned in equal numbers to the two incentive groups. The order of the two
types of problems was then randomized independently for each person. The problem-solving
ability scores follow (the higher the score, the greater the ability to solve problems).

| Incentive Stimulus | Subject | Abstract ($k = 1$) | Concrete ($k = 2$) |
|---|---|---|---|
| $j = 1$ | $i = 1$ | 10 | 18 |
| | $i = 2$ | 14 | 19 |
| | $i = 3$ | 17 | 18 |
| | $i = 4$ | 8 | 12 |
| | $i = 5$ | 12 | 14 |
| | $i = 6$ | 15 | 20 |
| $j = 2$ | $i = 1$ | 16 | 35 |
| | $i = 2$ | 19 | 32 |
| | $i = 3$ | 22 | 37 |
| | $i = 4$ | 20 | 33 |
| | $i = 5$ | 24 | 39 |
| | $i = 6$ | 21 | 32 |

a. Obtain the residuals for repeated measures model (27.11) and plot them against the fitted
values. Also prepare a normal probability plot of the residuals. What do you conclude
about the appropriateness of model (27.11)?

b. Plot the problem-solving ability scores by incentive stimulus and problem type, in the
format of Figure 27.6. What do you conclude about the appropriateness of model (27.11)?
Discuss.

27.12. Refer to Incentive stimulus Problem 27.11. Assume that repeated measures model (27.11)
is appropriate.
a. Obtain the analysis of variance table.

b. Plot the data and the estimated treatment means in the format of Figure 27.12. Does it
appear that interaction effects are present? That main effects are present?

c. Test whether or not the two factors interact; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

d. The following comparisons between problem types are of interest:

$$L_1 = \mu_{\cdot 11} - \mu_{\cdot 12} \qquad L_2 = \mu_{\cdot 21} - \mu_{\cdot 22}$$

Estimate these comparisons by means of confidence intervals. Use the Tukey procedure
with a 90 percent family confidence coefficient for each problem type. Then combine these
two Tukey procedures with a Bonferroni adjustment for each problem type. State your
findings.

e. The following comparisons between incentive stimuli are of interest:

$$L_3 = \mu_{\cdot 11} - \mu_{\cdot 21} \qquad L_4 = \mu_{\cdot 12} - \mu_{\cdot 22}$$

Estimate these comparisons by means of confidence intervals. Use the Tukey procedure
with a 90 percent family confidence coefficient for each incentive stimulus. Then combine
these two Tukey procedures with a Bonferroni adjustment for each incentive stimulus.
State your findings.




*27.13. Store displays. A repeated measures study was conducted to examine the effects of two
different store displays for a household product (factor A) on sales in four successive time
periods (factor B). Eight stores were randomly selected, and four were assigned at random to
each display. The sales data (coded) follow.

| Type of Display | Store | Time Period: $k = 1$ | $k = 2$ | $k = 3$ | $k = 4$ |
|---|---|---|---|---|---|
| $j = 1$ | $i = 1$ | 956 | 953 | 938 | 1,049 |
| | $i = 2$ | 1,008 | 1,032 | 1,025 | 1,123 |
| | $i = 3$ | 350 | 352 | 338 | 438 |
| | $i = 4$ | 412 | 449 | 385 | 532 |
| $j = 2$ | $i = 1$ | 769 | 766 | 739 | 859 |
| | $i = 2$ | 880 | 875 | 860 | 915 |
| | $i = 3$ | 176 | 185 | 168 | 280 |
| | $i = 4$ | 209 | 223 | 217 | 301 |

a. Obtain the residuals for repeated measures model (27.11) and plot them against the fitted
values. Also prepare a normal probability plot of the residuals. What do you conclude
about the appropriateness of model (27.11)?

b. Plot the sales data by type of display and time period, in the format of Figure 27.6. What
do you conclude about the appropriateness of model (27.11)? Discuss.

*27.14. Refer to Store displays Problem 27.13. The experimenter wished to explore further the
appropriateness of repeated measures model (27.11).

a. Conduct a formal test of the constancy of the between-subjects variances. Use (27.17) and
perform the Hartley test, with $\alpha = .01$. State the alternatives, decision rule, and conclusion.

b. Decompose the error variation SSB.S(A) into components using (27.18), and perform the
Hartley test for the constancy of the error variance $\sigma^2$ for the different factor A levels; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion.

*27.15. Refer to Store displays Problem 27.13. Assume that repeated measures model (27.11) is
appropriate.

a. Obtain the analysis of variance table.

b. Plot the data and the estimated treatment means in the format of Figure 27.12. Does it
appear that interaction effects are present? That main effects are present?

c. Test whether or not the two factors interact; use $\alpha = .025$. State the alternatives, decision
rule, and conclusion. What is the P-value for the test?

d. Test separately whether or not display and time main effects are present; use $\alpha = .025$
for each test. State the alternatives, decision rule, and conclusion for each test. What is the
P-value for each test?

e. To study the nature of the factor A and factor B main effects, estimate the following
pairwise comparisons:

Use the Bonferroni procedure with a 90 percent family confidence coefficient. State your
findings.


27.16. Calculator efficiency. To test the efficiency of its new programmable calculator, a computer
company selected at random six engineers who were proficient in the use of both this calculator
and an earlier model and asked them to work out two problems on both calculators. One of
the problems was statistical in nature, the other was an engineering problem. The order of
the four calculations was randomized independently for each engineer. The length of time (in
minutes) required to solve each problem was observed. The results follow (type of problem
is factor A and calculator model is factor B):


                         j=1                      j=2
                 Statistical Problem      Engineering Problem
                   k=1        k=2           k=1        k=2
Engineer           New      Earlier         New      Earlier
i                 Model      Model         Model      Model
1  Jones           3.1        7.5           2.5        5.1
2  Williams        3.8        8.1           2.8        5.3
3  Adams           3.0        7.6           2.0        4.9
4  Dixon           3.4        7.8           2.7        5.5
5  Erickson        3.3        6.9           2.5        5.4
6  Maynes          3.6        7.8           2.4        4.8


a. Obtain the residuals for repeated measures model (27.21) and plot them against the fitted
values. Also prepare a normal probability plot of the residuals. What do you conclude
about the appropriateness of model (27.21)?

b. Prepare aligned residual dot plots by treatment ignoring the factorial nature of the
treatments. Do these plots support the assumption of constancy of the error variance? Discuss.


27.17. Refer to Calculator efficiency Problem 27.16. Assume that repeated measures model (27.21)
is appropriate.

a. Obtain the analysis of variance table.

b. Plot the data and the estimated treatment means in the format of Figure 27.12. Does it
appear that treatment interaction effects are present?

c. Test whether or not the two treatment factors interact; use $\alpha = .01$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

d. It is desired to study the nature of the interaction effects by considering the three
comparisons:


Ly = ра — pn Із = L2 — Lı
1» = fer — Wa


Obtain confidence intervals for these comparisons; use the Bonferroni procedure with a
95 percent family confidence coefficient. State your findings.


*27.18. Migraine headaches. Two experimental pain killer drugs for relief of migraine headaches
were studied at a major medical center. Ten persistent migraine sufferers were randomly
selected for a pilot study and received in random order each of the four treatment
combinations, with a suitable interval between drug administrations. The decrease in pain intensity
was used as the response variable. The four treatments used in the study are defined as
follows: $A_1B_1$ = low dose of both drugs; $A_1B_2$ = low dose of drug A, high dose of drug B;
$A_2B_1$ = high dose of drug A, low dose of drug B; $A_2B_2$ = high dose of both drugs. The data
on reduction in pain intensity follow (the higher the score, the greater the reduction in
pain).


Person        $A_1$ (j = 1)               $A_2$ (j = 2)
i         $B_1$ (k=1)   $B_2$ (k=2)   $B_1$ (k=1)   $B_2$ (k=2)
1            1.6         3.4         2.7         4.3
2            2.3         5.1         4.2         6.5
3            4.2         5.3         4.6         6.0
8            6.0         7.2         6.3         7.3
9            1.2         1.4         1.3         {
10           2.7         3.0         3.0         3.1


a. Obtain the residuals for repeated measures model (27.21) and plot them against the fitted
values. Also prepare a normal probability plot of the residuals. What do you conclude
about the appropriateness of model (27.21)?

b. Prepare aligned residual dot plots by treatment ignoring the factorial nature of the
treatments. Do these plots support the assumption of constancy of the error variance?
Discuss.


*27.19. Refer to Migraine headaches Problem 27.18. Assume that repeated measures model (27.21)
is appropriate.

a. Obtain the analysis of variance table.

b. Plot the data and the estimated treatment means in the format of Figure 27.12. Does it
appear that treatment interaction effects are present? That main effects are present?

c. Test whether or not the two treatment factors interact; use $\alpha = .005$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

d. Test separately whether or not factor A and factor B main effects are present; use $\alpha = .05$
for each test. State the alternatives, decision rule, and conclusion for each test. What is the
P-value for each test?

e. Estimate the following comparisons by means of confidence intervals:


Ly = pa = Ba Із = p-n — far
15 = раз — pat Ly = роз = uan


Use the Bonferroni procedure and family confidence coefficient .95. Summarize your
findings.


27.20. Wheat yield. Refer to the split-plot agricultural experiment of Section 27.6, for which the
layout is shown in Figure 27.13. The results of this experiment to investigate the effects of
two irrigation methods (factor A) and two fertilizers (factor B) on wheat yield follow for the
10 fields used in the study.


Irrigation Method j:            1                        2
Field i:                1   2   3   4   5        1   2   3   4   5
Fertilizer k = 1:      43  40  31  27  36       63  52  45  47  54
           k = 2:      48  43  36  30  39       70  53  48  51  57


a. Obtain the residuals for split-plot model (27.27) and plot them against the fitted values.
Also prepare a normal probability plot of the residuals. What do you conclude about the
appropriateness of model (27.27)?

b. Plot the wheat yield data by irrigation method and type of fertilizer in the format of
Figure 27.6. What do you conclude about the appropriateness of model (27.27)? Discuss.


27.21. Refer to Wheat yield Problem 27.20. Assume that split-plot model (27.27) is appropriate.

a. Obtain the analysis of variance table.

b. Plot the data and the estimated treatment means in the format of Figure 27.12. Does it
appear that interaction effects are present? That main effects are present?

c. Test whether or not the two factors interact; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion. What is the P-value for the test?

d. Test separately whether or not factor A and factor B main effects are present; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion for each test. What is the P-value for
each test?

e. To study the nature of the factor A and factor B main effects, estimate the following
pairwise comparisons:


$L_1 = \mu_{\cdot 1 \cdot} - \mu_{\cdot 2 \cdot}$        $L_2 = \mu_{\cdot \cdot 1} - \mu_{\cdot \cdot 2}$


Use the Bonferroni procedure with a 90 percent family confidence coefficient. State your
findings.


Exercise


27.22. Derive the total sum of squares breakdown in (27.5).


Projects


27.23. Refer to Blood pressure Problem 27.3. Obtain the estimated within-subjects
variance-covariance matrix using (27.8). Are the estimated variances and covariances of the same
orders of magnitude? Is the compound symmetry assumption reasonable here?

27.24. Refer to Grapefruit sales Problem 27.6. Obtain the estimated within-subjects
variance-covariance matrix using (27.8). Are the variances and covariances roughly of the same order
of magnitude? Is the compound symmetry assumption reasonably satisfied here?

27.25. Refer to the Drug effect experiment data set in Appendix C.12. Consider only Part I of the
study and observation unit 1 for each drug dosage level; i.e., include only observations for
which variable 2 equals 1 and variable 6 equals 1. Treat the 12 rats as subjects and ignore the
classification of the rats into the three initial lever press rate groups. Assume that the subjects
(rats) have random effects and that the treatments (dosage levels) have fixed effects.

a. State the additive repeated measures model for this study.

b. Obtain the residuals and plot them against the fitted values. Also prepare a normal
probability plot of the residuals. What do you conclude about the appropriateness of the model
employed?

c. Plot the responses for each rat in the format of Figure 27.2. Does the assumption of no
interactions between subjects (rats) and treatments appear to be appropriate?

27.26. Refer to the Drug effect experiment data set in Appendix C.12 and Project 27.25.

a. Obtain the analysis of variance table.

b. Test whether or not the drug dosage level affects the mean lever press rate; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion. What is the P-value of the test?

c. Analyze the effects of the four dosage levels by comparing the mean responses for each
pair of successive dosage levels; use the Bonferroni procedure with a 90 percent family
confidence coefficient. State your findings.

d. Fit a regression model in which the subject effects are represented by 1, −1, 0 indicator
variables and the dosage effect is represented by linear and quadratic terms in $x = X - \bar{X}$,
where $X$ is the dosage level. Assume that there are no interactions between subjects and
treatments.

e. Obtain the residuals and plot them against the fitted values. Does the regression model
appear to provide a good fit? Discuss.

f. Test whether or not the quadratic term can be dropped from the regression model; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion.

27.27. Refer to the Drug effect experiment data set in Appendix C.12. Consider the combined
study. Assume that subjects (rats) and observation units have random effects, and that factor
A (initial lever press rate), factor B (dosage level), and factor C (reinforcement schedule) have
fixed effects. Also assume that there are no interactions between subjects and treatments.

a. Use rules (D.1) and (D.6) in Appendix D to develop the model for this experiment.

b. Fit the model in part (a), obtain the residuals, and plot them against the fitted values.
Also prepare a normal probability plot of the residuals. What do you conclude about the
appropriateness of your model?

27.28. Refer to the Drug effect experiment data set in Appendix C.12 and Project 27.27. Assume
that the model in Project 27.27a is appropriate.

a. Use an appropriate statistical package to obtain the analysis of variance table and the
expected mean squares.

b. Test whether or not ABC interactions are present; use $\alpha = .01$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

c. For each reinforcement schedule, plot the estimated treatment means against dosage level
with different curves for the three initial lever press rate groups, in the format of Figure 24.5.
Examine your plots for the nature of the interaction effects and report your findings.

27.29. Consider a repeated measures design study with s = 3 and r = 3, where each subject ranks
all treatments (with no ties allowed).

a. Develop the exact sampling distribution of $F_R^*$ when $H_0$ holds. [Hint: All ranking
permutations for a subject are equally likely under $H_0$ and all subjects are assumed to act
independently.]

b. How does the 90th percentile of the exact sampling distribution obtained in part (a) compare
with F(.90; 2, 4)? What is the implication of this?


Chapter 28


Balanced Incomplete Block, Latin Square, and Related Designs


In this chapter we introduce balanced incomplete block and latin square designs. Incomplete 
block designs are block designs where the number of experimental units in each block 
is less than the number of treatment combinations. This is in contrast with randomized 
complete block designs, where each block contains a complete replicate of the experiment. 
A latin square design is a particular form of incomplete block design, where two blocking 
variables are employed to reduce experimental errors while requiring only a small number 
of experimental trials. 


28.1 Balanced Incomplete Block Designs 


In Chapter 15, we described the use of an incomplete block design in the context of a food 
product taste-testing experiment. In that example, the food manufacturer wished to assess 
consumer acceptance of five breakfast cereal formulations. The formulations differed in 
terms of the amount of sweetener to be used in the formulation. Products were to be rated 
on a 10-point hedonic scale, and 12 consumers were available to rate the products. We noted
that consumers differ considerably in their sensory perception of food products, and so it 
would be desirable to have each consumer rate all of the products. A randomized complete 
block (and repeated measures) design would result if each consumer were to rate all five 
of the formulations. However, consumers are generally unable to evaluate effectively more 
than three food products in a single session. With this restriction, the three tastings by any 
given consumer represent a single, incomplete block. 

In situations such as that just described for the taste-testing example, an effective ex- 
perimental arrangement can often be achieved using a balanced incomplete block design, 
or BIBD. An incomplete block design is balanced if every treatment appears with every 
other treatment in the same block the same number of times. For example, a candidate 
BIBD for the food product taste-testing experiment is shown in Figure 28.1. Note that there 
are $n_b = 10$ blocks, and that every treatment occurs together with every other treatment
exactly three times. For example, formulations 1 and 2 appear together in blocks 1, 2, and 3.
Formulations 1 and 3 appear together in blocks 1, 4, and 5, and so on. We shall use $r_b$ to
denote the number of treatments in each block (or block size), $n_p$ to denote the number of
times that pairs of treatments occur together in the same block, and $n$ to denote the number
of replicates of each treatment. Use of this design for the food product taste-testing example
would mean that only 10 of the 12 available consumers could be used as subjects, because
no balanced incomplete block design exists for $r = 5$ treatments, block size $r_b = 3$, and
number of blocks $n_b = 12$.


FIGURE 28.1 Balanced Incomplete Block Design for Five Treatments and Block Size Three: Food
Product Taste-Testing Example. [Layout: rows are consumers (blocks) 1 to 10, columns are product
formulations 1 to 5; an X marks each of the three formulations rated by a consumer.]

If there is no restriction on the number of blocks, a BIBD can be constructed for any
incomplete block size $r_b$ ($2 \le r_b < r$) by listing all of the possible subsets of size $r_b$ from
the set of $r$ treatments. The number of such subsets is:

$$n_b = \binom{r}{r_b} = \frac{r!}{r_b!\,(r - r_b)!} \tag{28.1}$$

For example, the food product taste-testing example BIBD was constructed in this fashion.
In this example, $r = 5$, $r_b = 3$, and the number of required blocks from (28.1) is $n_b =
5!/[3!(5 - 3)!] = 10$. A limitation of this simple approach is that the number of blocks
required can be quite large, and there may be alternative BIBDs with the same number of
treatments and block size requiring fewer blocks. For example, with $r = 8$ and $r_b = 4$, the
number of blocks required is $n_b = 8!/(4!4!) = 70$, but an alternative BIBD exists for $r = 8$
and $r_b = 4$ that requires just $n_b = 14$ blocks (design 16 in Table 28.1).
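
The all-subsets construction can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `all_subsets_bibd` and `design_counts` are illustrative assumptions. It lists every block of size $r_b$, then counts how often each treatment and each pair of treatments occurs.

```python
from collections import Counter
from itertools import combinations


def all_subsets_bibd(r, r_b):
    """List every block of size r_b drawn from treatments 1..r (hypothetical helper)."""
    return list(combinations(range(1, r + 1), r_b))


def design_counts(blocks):
    """Count replicates per treatment and co-occurrences per treatment pair."""
    replicates = Counter(t for block in blocks for t in block)
    pairings = Counter(pair for block in blocks for pair in combinations(block, 2))
    return replicates, pairings


# Food product taste-testing example: r = 5 treatments, block size r_b = 3.
blocks = all_subsets_bibd(5, 3)
replicates, pairings = design_counts(blocks)

n_b = len(blocks)                        # number of blocks
n_values = set(replicates.values())      # replicates of each treatment
n_p_values = set(pairings.values())      # times each pair shares a block

print(n_b, n_values, n_p_values)
```

A design is balanced when `n_p_values` contains a single number; for this example the counts agree with $n_b = 10$, $n = 6$, and $n_p = 3$ quoted in the text.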

A useful set of BIBDs is provided in Appendix B.15 for the combinations of treatments, 
block sizes, and numbers of blocks shown in Table 28.1. For example, the BIBD for the 
food product taste-testing example shown in Figure 28.1 corresponds to design number 4 
in Table 28.1. For this design, we have: 


$r = 5$    $r_b = 3$    $n_b = 10$    $n = 6$    $n_p = 3$


A more extensive listing of BIBDs is provided in Reference 28.1.


TABLE 28.1 Balanced Incomplete Block Designs Provided in Appendix B.15.

            Number of     Block     Number of    Number of     Treatment
Design      Treatments    Size      Blocks       Replicates    Pairings
Number      r             r_b       n_b          n             n_p

 1          4             2          6            3            1
 2          4             3          4            3            2
 3          5             2         10            4            1
 4          5             3         10            6            3
 5          5             4          5            4            3
 6          6             2         15            5            1
 7          6             3         10            5            2
 8          6             3         20           10            4
 9          6             4         15           10            6
10          6             5          6            5            4
11          7             2         21            6            1
12          7             3          7            3            1
13          7             4          7            4            2
14          7             6          7            6            5
15          8             2         28            7            1
16          8             4         14            7            3
17          8             7          8            7            6
18          9             3         12            4            1

Advantages and Disadvantages of BIBDs 




Advantages of balanced incomplete block designs include: 


1. A BIBD layout enables an investigator to run an experiment when the size of the 
available blocks of experimental units is smaller than the number of treatments. This is 
particularly helpful when a large number of treatments are under study. 

2. Estimates of treatment effects have equal precision, and, as we shall see, the expres- 
sions for the variances of the estimated cell means and of contrasts of treatment means 
or effects are relatively simple. This simplifies the analysis and can facilitate sample size 
planning. 

3. The presence of balance permits the use of the Scheffé and Tukey procedures for 
the analysis of treatment effects. These procedures cannot be used if an incomplete block 
design is not balanced. 


Disadvantages of balanced incomplete block designs include: 


1. As we have noted, balanced incomplete block designs exist only for certain combi- 
nations of numbers of treatments, block sizes, and numbers of blocks. Investigators may 
be compelled to adjust one or more of these parameters (i.e., by eliminating treatments,
available blocks, or available experimental units) so that the available BIBD can be
implemented. This may lead to a design that is balanced and relatively easy to analyze, but does
not achieve fully the objectives of the study. 

2. The assumption that there are no interactions between the blocking variable and the 
treatments is restrictive. 


FIGURE 28.2 MINITAB Regression Results: Food Product Taste-Testing Example.

(a) Regression Results for Model (28.3)

Predictor       Coef    SE Coef        T        P
Constant      6.1667     0.1639    37.63    0.000
Z1            0.1222     0.5130     0.24    0.815
Z2           -0.5667     0.5130    -1.10    0.286
Z3            1.2556     0.5130     2.45    0.026
Z4           -1.7222     0.5130    -3.36    0.004
Z5            1.4333     0.5130     2.79    0.013
Z6           -0.9222     0.5130    -1.80    0.091
Z7            0.3667     0.5130     0.71    0.485
Z8           -0.8111     0.5130    -1.58    0.133
Z9           -1.1667     0.5130    -2.27    0.037
X1           -1.6000     0.3590    -4.46    0.000
X2            1.1333     0.3590     3.16    0.006
X3            1.6000     0.3590     4.46    0.000
X4            0.6667     0.3590     1.86    0.082

Analysis of Variance
Source            DF         SS        MS       F       P
Regression        13    97.2778    7.4829    9.29   0.000
Residual Error    16    12.8889    0.8056
Total             29   110.1667

(b) Regression Results for Model (28.4)

Predictor       Coef    SE Coef        T        P
Constant      6.1667     0.3249    18.98    0.000
Z1            0.5000     0.9747     0.51
Z2           -0.5000     0.9747    -0.51
Z3            0.5000     0.9747     0.51
Z4           -1.5000     0.9747    -1.54
Z5            0.8333     0.9747     0.85
Z6           -1.8333     0.9747    -1.88    0.075
Z7            1.5000     0.9747     1.54
Z8           -0.5000     0.9747    -0.51
Z9           -1.1667     0.9747    -1.20    0.245

Analysis of Variance
Source            DF         SS        MS       F       P
Regression         9     46.833     5.204    1.64   0.179
Residual Error    20     63.333     3.167
Total             29    110.167

(c) Regression Results for Model (28.5)

Predictor       Coef    SE Coef        T        P
Constant      6.1667     0.2662    23.16    0.000
X1           -1.6667     0.5325    -3.13    0.004
X2            1.0000     0.5325     1.88    0.072
X3            1.8333     0.5325     3.44    0.002
X4            0.3333     0.5325     0.63    0.537

Analysis of Variance
Source            DF         SS        MS       F       P
Regression         4     57.000    14.250    6.70   0.001
Residual Error    25     53.167     2.127
Total             29    110.167


3. The analysis of a balanced incomplete block design is more complex than the analysis 
of a randomized complete block design. As we will see in Section 28.2, treatment and block 
effects are not orthogonal in BIBDs, and so the analysis is carried out using the regression 
approach. 


We now turn to the statistical analysis of BIBDs, including the development of tests 
for treatment and block effects, and the analysis of factor-level effects. The analysis of a 
balanced incomplete block design is similar to the analysis of a randomized complete block 
design with missing cells, which was discussed earlier in Chapter 23. 




Comment 


When no BIBD exists for the desired number of treatments, number of blocks, and block size, some 
statisticians recommend the use of designs that are nearly balanced. Computer-based methods for 
constructing nearly-balanced incomplete block designs, available in statistical software packages 
such as JMP, are discussed in Reference 28.2. Related designs, called partially balanced incomplete 
block designs, have also been developed, a number of which are listed in Reference 28.1. The use 
of unbalanced incomplete block designs leads to a more complex analysis. For example, as already 
noted, the Scheffé and Tukey multiple comparisons procedures cannot be used with these designs for 
the analysis of treatment means.


28.2 Analysis of Balanced Incomplete Block Designs


BIBD Model


The model for a balanced incomplete block design is the same as that for a randomized 
complete block design. Thus either model (21.1) for fixed block effects, or model (25.67) 
for random block effects may be employed. The analysis of variance is the same for these 
two models, and all tests and estimates of treatment effects are conducted as for fixed block 
effects. For this reason we shall present only the fixed block effects case. Model (21.1) is: 


$$Y_{ij} = \mu_{..} + \rho_i + \tau_j + \varepsilon_{ij} \tag{28.2}$$


where:


$\mu_{..}$ is a constant

$\rho_i$ are constants for the block (row) effects, subject to the restriction $\sum \rho_i = 0$
$\tau_j$ are constants for the treatment effects, subject to the restriction $\sum \tau_j = 0$
$\varepsilon_{ij}$ are independent $N(0, \sigma^2)$

$i = 1, \ldots, n_b$; $j = 1, \ldots, r$


Note that model (28.2) assumes that no block-treatment interactions are present. 

In Section 23.4, we discussed the analysis of randomized complete block designs when 
one or several observations are missing. This discussion is relevant to the analysis of BIBDs, 
because there are $r - r_b$ missing cells in each block. We noted there that missing cells destroy
the orthogonality of the complete block design and make the usual ANOVA calculations 
inappropriate. However, the regression approach, as described on page 967, is still appro- 
priate for additive model (28.2). Since no new principles are involved, we turn now to the 
use of the regression approach for the food product taste-testing example. 


Regression Approach to Analysis of Balanced Incomplete Block Designs


For the food product taste-testing example, the regression model equivalent to block design
model (28.2) is as follows, where we use Xs to denote the 1, 0, −1 indicator variable
predictors corresponding to treatment effects $\tau_1$ through $\tau_4$ and Zs to denote analogous
predictors corresponding to block effects $\rho_1$ through $\rho_9$:


$$Y_{ij} = \mu_{..} + \rho_1 Z_{ij1} + \cdots + \rho_9 Z_{ij9} + \tau_1 X_{ij1} + \cdots + \tau_4 X_{ij4} + \varepsilon_{ij} \qquad \text{Full model} \tag{28.3}$$


where:

$$X_{ijk} = \begin{cases} 1 & \text{if response from product } k \text{ (i.e., if } j = k\text{), for } k = 1, 2, 3, 4 \\ -1 & \text{if response from product 5} \\ 0 & \text{otherwise} \end{cases}$$

$$Z_{ijk} = \begin{cases} 1 & \text{if response from subject } k \text{ (i.e., if } i = k\text{), for } k = 1, \ldots, 9 \\ -1 & \text{if response from subject 10} \\ 0 & \text{otherwise} \end{cases}$$


TABLE 28.2 Responses and Predictors: Food Product Taste-Testing Example.

           (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)  (13)  (14)
  i   j   Yij   Zij1  Zij2  Zij3  Zij4  Zij5  Zij6  Zij7  Zij8  Zij9  Xij1  Xij2  Xij3  Xij4
  1   1           1     0     0     0     0     0     0     0     0     1     0     0     0
  1   2           1     0     0     0     0     0     0     0     0     0     1     0     0
  1   3    8      1     0     0     0     0     0     0     0     0     0     0     1     0
  2   1           0     1     0     0     0     0     0     0     0     1     0     0     0
  2   2    7      0     1     0     0     0     0     0     0     0     0     1     0     0
  2   4    7      0     1     0     0     0     0     0     0     0     0     0     0     1
  3   1    6      0     0     1     0     0     0     0     0     0     1     0     0     0
  3   2    8      0     0     1     0     0     0     0     0     0     0     1     0     0
  3   5    6      0     0     1     0     0     0     0     0     0    -1    -1    -1    -1
 ...
 10   4    9     -1    -1    -1    -1    -1    -1    -1    -1    -1     0     0     0     1
 10   5    6     -1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1


FIGURE 28.3 Residual Plots: Food Product Taste-Testing Example. [(a) Residuals vs. Predicted:
residual against predicted value. (b) Normal Probability Plot: residual against expected value.]
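
The 1, 0, −1 indicator coding defined for model (28.3) can be sketched in a few lines of Python. This is an illustration only: the helper names `effect_code` and `predictor_row` are assumptions, not from the text, and the defaults of 10 blocks and 5 treatments follow the taste-testing example.

```python
def effect_code(level, n_levels):
    """Return the n_levels - 1 indicators for `level` under 1, 0, -1 coding.

    The last level is the reference and is coded -1 on every indicator.
    """
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if k == level else 0 for k in range(1, n_levels)]


def predictor_row(i, j, n_blocks=10, n_treatments=5):
    """Predictors Z_ij1..Z_ij9 followed by X_ij1..X_ij4 for block i, treatment j."""
    return effect_code(i, n_blocks) + effect_code(j, n_treatments)


# Block 3, treatment 5: Z_ij3 = 1 and all four X indicators equal -1.
print(predictor_row(3, 5))
# Block 10, treatment 4: all nine Z indicators equal -1 and X_ij4 = 1.
print(predictor_row(10, 4))
```

Stacking one such row per observed (block, treatment) pair gives the 13 predictor columns on which the response is regressed.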


A portion of the data for the food product taste-testing BIBD is shown in Table 28.2.
The response vector Y is displayed in column 1, $Z_{ij1}$ through $Z_{ij9}$ are shown in columns 2
through 10, and $X_{ij1}$ through $X_{ij4}$ are shown in columns 11 through 14. MINITAB regression
output for the initial fit of model (28.3) is shown in Figure 28.2a. These results were obtained
by regressing column 1 in Table 28.2 on columns 2 through 14. Residuals obtained from
this fit are plotted against predicted values in Figure 28.3a and a normal probability plot
of these residuals is shown in Figure 28.3b. No violations of assumptions are suggested
by the residual plots. The correlation between the residuals and the expected values under
normality in Figure 28.3b is .988, which supports the assumption of approximate normality
of the residuals.

Testing for the presence of treatment effects and block effects is carried out in the usual 
manner by first fitting full model (28.3) and then fitting each of the following reduced 
models: 

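Test for Treatment Effects

$$Y_{ij} = \mu_{..} + \rho_1 Z_{ij1} + \cdots + \rho_9 Z_{ij9} + \varepsilon_{ij} \qquad \text{Reduced model} \tag{28.4}$$

Test for Block Effects

$$Y_{ij} = \mu_{..} + \tau_1 X_{ij1} + \cdots + \tau_4 X_{ij4} + \varepsilon_{ij} \qquad \text{Reduced model} \tag{28.5}$$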


Regression results for these two reduced models are shown in Figures 28.2b and 28.2c, 
respectively. We first consider the test for treatment effects. 
The alternatives in the test for treatment effects implied by full model (28.3) and reduced 
model (28.4) are: 
$$H_0\colon \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0 \qquad\qquad H_a\colon \text{not all } \tau_j \text{ equal zero} \tag{28.6}$$


Using general linear test statistic (2.70) and results from Figures 28.2a and 28.2b, we have:

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div MSE(F) = \frac{63.333 - 12.8889}{20 - 16} \div .8056 = 15.65$$


For $\alpha = .05$, we require F(.95; 4, 16) = 3.01. Since 15.65 > 3.01, we conclude $H_a$, that
treatment effects are present. The P-value of the test is 0+.

In similar fashion, a test for block effects is obtained using full model (28.3) and reduced 
model (28.5). In this case, the alternatives are: 


$$H_0\colon \rho_1 = \rho_2 = \cdots = \rho_9 = 0 \qquad\qquad H_a\colon \text{not all } \rho_i \text{ equal zero} \tag{28.7}$$

and test statistic (2.70) is, using results from Figures 28.2a and 28.2c:

$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div MSE(F) = \frac{53.167 - 12.8889}{25 - 16} \div .8056 = 5.56$$


For $\alpha = .05$, we require F(.95; 9, 16) = 2.54. Since 5.56 > 2.54, we conclude $H_a$, that
block effects are present. The P-value of the test is .0015.
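
The two general linear tests above can be reproduced with a short Python sketch. The function name `general_linear_f` is an illustrative assumption; the error sums of squares, degrees of freedom, MSE(F), and critical values are the ones quoted in the text and in Figure 28.2.

```python
def general_linear_f(sse_r, sse_f, df_r, df_f, mse_f):
    """General linear test statistic: [(SSE(R) - SSE(F)) / (df_R - df_F)] / MSE(F)."""
    return ((sse_r - sse_f) / (df_r - df_f)) / mse_f


MSE_F = 0.8056  # MSE(F) from Figure 28.2a, as rounded in the text

# Treatment effects: reduced model (28.4) versus full model (28.3).
f_treatments = general_linear_f(63.333, 12.8889, 20, 16, MSE_F)
# Block effects: reduced model (28.5) versus full model (28.3).
f_blocks = general_linear_f(53.167, 12.8889, 25, 16, MSE_F)

# Critical values F(.95; 4, 16) and F(.95; 9, 16) as quoted in the text.
print(round(f_treatments, 2), f_treatments > 3.01)
print(round(f_blocks, 2), f_blocks > 2.54)
```

Both statistics exceed their critical values, matching the conclusions that treatment effects and block effects are present.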

At this point we have demonstrated that there are significant differences among the 
treatment means and that the use of blocking was effective. We now turn to the analysis of 
treatment effects for balanced incomplete block designs. 


Analysis of Treatment Effects


Once the presence of treatment effects has been established using the regression approach,
the analysis of these effects proceeds as described in Section 21.5 for randomized complete
block designs, with the following modifications:

1. The least squares estimate of the jth treatment mean $\mu_{.j}$ is given by:

$$\hat{\mu}_{.j} = \hat{\mu}_{..} + \hat{\tau}_j \tag{28.8}$$

where $\hat{\mu}_{..}$ and $\hat{\tau}_j$ are the least squares estimates of the regression coefficients $\mu_{..}$ and $\tau_j$
in (28.3). Note that the least squares estimate of the jth treatment mean is not given here
by $\bar{Y}_{.j}$.

2. It can be shown that the variance of a contrast of estimated treatment means (or effects) is:

$$\sigma^2\{\hat{L}\} = \sigma^2\Big\{\sum c_j \hat{\mu}_{.j}\Big\} = \sigma^2 \frac{r_b}{r n_p} \sum_{j=1}^{r} c_j^2 \tag{28.9}$$

3. The estimated variance of a contrast of treatment means or effects is obtained by
substituting the estimated variance MSE(F) for full model (28.2) for $\sigma^2$ in (28.9):

$$s^2\{\hat{L}\} = MSE(F) \frac{r_b}{r n_p} \sum_{j=1}^{r} c_j^2 \tag{28.10}$$

4. The error degrees of freedom are now $n_b r_b - (n_b - 1) - (r - 1) - 1 = n_b r_b - n_b - r + 1$.


The multiples for the estimated standard deviation of an estimated treatment mean or
treatment contrast are then as follows:

Tukey procedure (for pairwise comparisons):

$$T = \frac{1}{\sqrt{2}}\, q[1 - \alpha;\ r,\ n_b r_b - n_b - r + 1] \tag{28.11a}$$

Scheffé procedure:

$$S^2 = (r - 1)\, F[1 - \alpha;\ r - 1,\ n_b r_b - n_b - r + 1] \tag{28.11b}$$

Bonferroni procedure:

$$B = t[1 - \alpha/2g;\ n_b r_b - n_b - r + 1] \tag{28.11c}$$


We illustrate the use of the Tukey procedure for the food product taste-testing example.


Example


The least squares estimates of the five treatment means listed below were obtained using 
(28.8) and the regression results in Figure 28.2a: 


j:                      1       2       3       4       5
$\hat{\mu}_{.j}$:       4.57    7.30    7.77    6.83    4.37


For example, the first estimated cell mean is $\hat{\mu}_{..} + \hat{\tau}_1 = 6.17 + (-1.60) = 4.57$. These
estimated treatment means are plotted against treatment number ( j) in Figure 28.4. Note that 
the treatments 2, 3, and 4 lead to the largest estimated mean responses, and that treatments 
1 and 5 appear to be substantially smaller. The investigators utilized the Tukey procedure 
to obtain all pairwise comparisons, employing a 95 percent family confidence coefficient. 

FIGURE 28.4 Estimated Treatment Means Plot: Food Product Taste-Testing Example. [Plot of
estimated treatment mean against treatment number, 1 to 5.]


For the food product taste-testing example, we have $r = 5$, $r_b = 3$, $n_b = 10$, $n_p = 3$, and, from
Figure 28.2a, MSE(F) = .8056. The estimated variance of the estimated difference between
cell means 1 and 2, $\hat{D} = \hat{\mu}_{.1} - \hat{\mu}_{.2}$, using (28.10) is:

$$s^2\{\hat{D}\} = MSE(F) \frac{r_b}{r n_p} \sum_{j=1}^{r} c_j^2 = .8056\, \frac{3}{5(3)} \left[1^2 + (-1)^2 + 0^2 + 0^2 + 0^2\right] = .3222$$


Using (28.11a), we find for a 95 percent family confidence coefficient:

$$T = \frac{1}{\sqrt{2}}\, q(.95; 5, 16) = \frac{1}{\sqrt{2}}(4.33) = 3.06$$


Hence:

$$T s\{\hat{D}\} = 3.06\sqrt{.3222} = 1.74$$
We now obtain all pairwise comparisons using (17.30) with $\hat{\mu}_{.1} = 4.57$, $\hat{\mu}_{.2} = 7.30$,
$\hat{\mu}_{.3} = 7.77$, $\hat{\mu}_{.4} = 6.83$, and $\hat{\mu}_{.5} = 4.37$:

$-4.47 = (4.57 - 7.30) - 1.74 \le \mu_{.1} - \mu_{.2} \le (4.57 - 7.30) + 1.74 = -0.99$
$-4.94 = (4.57 - 7.77) - 1.74 \le \mu_{.1} - \mu_{.3} \le (4.57 - 7.77) + 1.74 = -1.46$
$-4.00 = (4.57 - 6.83) - 1.74 \le \mu_{.1} - \mu_{.4} \le (4.57 - 6.83) + 1.74 = -0.52$
$-1.54 = (4.57 - 4.37) - 1.74 \le \mu_{.1} - \mu_{.5} \le (4.57 - 4.37) + 1.74 = 1.94$
$-2.21 = (7.30 - 7.77) - 1.74 \le \mu_{.2} - \mu_{.3} \le (7.30 - 7.77) + 1.74 = 1.27$
$-1.27 = (7.30 - 6.83) - 1.74 \le \mu_{.2} - \mu_{.4} \le (7.30 - 6.83) + 1.74 = 2.21$
$1.19 = (7.30 - 4.37) - 1.74 \le \mu_{.2} - \mu_{.5} \le (7.30 - 4.37) + 1.74 = 4.67$
$-0.80 = (7.77 - 6.83) - 1.74 \le \mu_{.3} - \mu_{.4} \le (7.77 - 6.83) + 1.74 = 2.68$
$1.66 = (7.77 - 4.37) - 1.74 \le \mu_{.3} - \mu_{.5} \le (7.77 - 4.37) + 1.74 = 5.14$
$0.72 = (6.83 - 4.37) - 1.74 \le \mu_{.4} - \mu_{.5} \le (6.83 - 4.37) + 1.74 = 4.20$
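
The ten intervals can be regenerated from the quantities already given. The Python sketch below is illustrative only: the variable names are assumptions, the studentized range percentile q(.95; 5, 16) = 4.33 is taken from the text rather than computed, and the treatment means are the rounded values listed above.

```python
from itertools import combinations
from math import sqrt

# Estimated treatment means from (28.8), rounded as in the text.
means = {1: 4.57, 2: 7.30, 3: 7.77, 4: 6.83, 5: 4.37}
r, r_b, n_p = 5, 3, 3
mse_f = 0.8056   # MSE(F) from Figure 28.2a
q_95 = 4.33      # q(.95; 5, 16), quoted in the text

# (28.10) for a pairwise difference: the sum of c_j^2 is 1^2 + (-1)^2 = 2.
s2_diff = mse_f * r_b / (r * n_p) * 2
# (28.11a): Tukey multiple times the estimated standard deviation.
halfwidth = q_95 / sqrt(2) * sqrt(s2_diff)

intervals = {
    (a, b): (means[a] - means[b] - halfwidth, means[a] - means[b] + halfwidth)
    for a, b in combinations(sorted(means), 2)
}
# A pair differs significantly when its interval excludes zero.
significant = sorted(pair for pair, (lo, hi) in intervals.items()
                     if lo > 0 or hi < 0)

for pair, (lo, hi) in intervals.items():
    print(pair, round(lo, 2), round(hi, 2))
print(significant)
```

The limits may differ from the printed ones in the second decimal place because the text rounds the halfwidth to 1.74 before adding and subtracting it.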


We conclude that the five treatment means cluster into two distinct groups. The three
largest estimated means, corresponding to treatments 2, 3, and 4, are significantly different
from treatment means 1 and 5, but not significantly different from each other, and the two
smallest estimated treatment means (for treatments 1 and 5) are not significantly different
from each other. A line plot of the estimated treatment means summarizes the results:


[Line plot on the Consumer Acceptance Score axis: treatments 5 and 1 are grouped at the low
end, and treatments 4, 2, and 3 are grouped at the high end.]


Planning of Sample Sizes with Estimation Approach


The essence of this approach is to specify the major comparisons of interest and to determine
the expected widths of the confidence intervals for various sample sizes, given an advance
planning value of the standard deviation. For a given number of treatments $r$ and block size
$r_b$, we need to determine the number of blocks $n_b$ required to achieve confidence intervals
of a specified width. We then determine if a BIBD exists for number of treatments $r$ and
block size $r_b$ that has approximately the required number of blocks. In doing so, we will
utilize the following two relations that hold for any balanced incomplete block design:

$$rn = r_b n_b$$
$$n_p(r - 1) = n(r_b - 1)$$

From these relations we have: $n_b = rn/r_b$ and $n_p = n(r_b - 1)/(r - 1)$.


We illustrate the estimation approach to the planning of sample sizes based on Tukey's
pairwise comparison procedure and the taste-testing example.


Example


Suppose that Tukey's method for all pairwise comparisons will be used to analyze the BIBD 
for the food product taste-testing example with r = 5 and r, = 3. Assume that o will be 
no larger than 1.0 and the widths of the simultaneous 95 percent confidence intervals are 
not to exceed 2.0. In a BIBD the widths of all such intervals are the same, since the Tukey 
multiple 7 is the same for all pairs and since, from (28.10), s* (D) = 2MSE(F)ry/ (rn,). 
Using the fact that n, = rn/ry = 5n/3, the error degrees of freedom аге: 
df, = пьњ — F — np + 1 
5n 10r 
= п — 5 ~ — + l = — – 4 
' 3 3 
The Tukey multiple comparison confidence limits for all pairwise comparisons D, = Ш. + 
p.p аге: 
б, X Tc(Dbj] 
where об) =20°(3)/(5п,) from (289) and T=( 1/4/2)91.95; 5, 10/3 — 4]. 
Furthermore, since n, = n(rp — 1)/(r — 1) = n(3 — 1)/(5 — 1) = n/2. we obtain: 
ACA 6074 120? 
5(2)п 5п 


Therefore, the confidence interval halfwidth is:

$$T\sigma\{\hat{D}_j\} = \frac{1}{\sqrt{2}}\,q[.95; 5, 10n/3 - 4]\sqrt{\frac{12\sigma^2}{5n}}$$

With $\sigma^2 = 1$, the only unknown is n. We need to determine n so that $T\sigma\{\hat{D}_j\} \le 1.0$ or
$n \ge 1.20\,q^2[.95; 5, 10n/3 - 4]$. Using Table B.9, we find by trial and error that n must be
greater than or equal to 24. For r = 5 and $r_b$ = 3, note that the number of replicates for
design 4 in Table 28.1 is n = 6. Therefore, the required number of replicates is achieved
by repeating this particular BIBD four times, for which n = 24 and $n_b$ = 40.
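
The bookkeeping in this example can be checked with a few lines of code. The following is a minimal sketch, not part of the text's procedure: the function names are hypothetical, and it only evaluates the two BIBD relations, the error degrees of freedom, and the multiplier of $\sigma^2$ in the variance of a pairwise difference. It deliberately does not look up studentized range percentiles, which the text takes from Table B.9.

```python
from fractions import Fraction


def bibd_planning_quantities(r, r_b, n):
    """Return (n_b, n_p, df_error) for r treatments, block size r_b,
    and n replicates per treatment, using exact rational arithmetic."""
    n_b = Fraction(r * n, r_b)            # from r*n = r_b*n_b
    n_p = Fraction(n * (r_b - 1), r - 1)  # from n_p*(r - 1) = n*(r_b - 1)
    df_error = n_b * r_b - r - n_b + 1    # error degrees of freedom
    return n_b, n_p, df_error


def variance_multiplier(r, r_b, n):
    """Multiplier of sigma^2 in sigma^2{D-hat} = 2*sigma^2*r_b/(r*n_p)."""
    _, n_p, _ = bibd_planning_quantities(r, r_b, n)
    return Fraction(2 * r_b, r) / n_p


if __name__ == "__main__":
    # Taste-testing setting: r = 5 treatments, blocks of size r_b = 3.
    for n in (6, 12, 18, 24):
        n_b, n_p, df_e = bibd_planning_quantities(5, 3, n)
        print(n, n_b, n_p, df_e, variance_multiplier(5, 3, n))
```

For n = 6 the sketch gives 10 blocks and 16 error degrees of freedom, and for n = 24 it gives 40 blocks, 76 error degrees of freedom, and a variance multiplier of 1/10, which equals 12/(5n). A non-integer $n_b$ or $n_p$ signals that no BIBD can have exactly that number of replicates.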


Comment 


It is also possible to use the power approach or to use the selection of the “best” treatment approach 
to plan sample sizes. See Reference 28.3 for a discussion of sample size planning using the power 
approach.


28.3 Latin Square Designs 


Basic Ideas 


We saw in Section 21.6 that two blocking variables can be used simultaneously in random- 
ized complete block designs to eliminate from experimental error the variation associated 
with each of the blocking variables. For instance, the blocking variables might be age and 
income of subject, with a block containing subjects in a given age and income group. 

However, the full use of two blocking variables in a complete block design often requires 
too many experimental units. For instance, if the age and income variables in the illustration 
have six classes each, 36 blocks would be required. If six treatments were to be studied, 
216 subjects would be needed for the experiment. Cost considerations may not permit the 
use of 216 experimental units, yet precision and range of validity considerations may require 
the simultaneous use of two blocking variables, each with six classes, in order to reduce the 
experimental error variance sufficiently and to have a reasonable variety of experimental 
subjects. In this type of situation, a latin square design may be helpful. 


Taking incomplete block designs to the extreme in our example, given the employment of 
36 blocks, the number of experimental units is minimized if only one treatment is run in 
each block. This extreme case, where each block contains only one treatment, is the type of 
situation for which a latin square design is appropriate. Table 28.3 provides an illustration 
of the difference between complete and incomplete block designs for the example con- 
sidered. Column 1 shows the complete block design for this case, while columns 2 and 3 
illustrate incomplete block designs, with three treatments and one treatment in each block, 
respectively. 

TABLE 28.3 Complete and Incomplete Block Designs.

                                      (1)                       (2)                            (3)
                                                                Incomplete Block Design        Incomplete Block Design
Block Description                     Complete Block Design     (three treatments per block)   (one treatment per block)
Age under 25, income under $10,000    T1, T2, T3, T4, T5, T6    T1, T3, T5                     T?
Age under 25, income $10,000-$19,999  T1, T2, T3, T4, T5, T6    T1, T4, T6                     T5
Age 25-34, income under $10,000       T1, T2, T3, T4, T5, T6    T2, T4, T5                     T3
Age 35-44, income under $10,000       T1, T2, T3, T4, T5, T6    T3, T4, T6                     T?
etc.                                  etc.                      etc.                           etc.

There is another reason, besides economy, why a latin square design with only one
treatment per block is used, namely, that blocks sometimes cannot contain more than one
treatment. Consider the repeated measures design discussed in Section 27.2 where each
subject receives every treatment. The repeated measures model in (27.1) assumes that no
interference effects due to order position are present. If, indeed, such effects are possible,
it may be desirable to use the order position as another blocking variable. Thus, “subject”
would be one blocking variable and “order position of treatment” a second blocking variable.
Blocks would then be defined as follows for a study involving six treatments:

Block 1: Subject 1, position 1
Block 2: Subject 1, position 2
...
Block 6: Subject 1, position 6
Block 7: Subject 2, position 1
etc.


Notice that the blocks so defined can contain only one treatment, since the order position 
refers to the place of a single treatment in the sequence of treatments for a subject. 


Description of Latin Square Designs 
Let A, B, C represent three treatments; it is conventional with latin square designs to use 
Latin letters for the treatments. Suppose that day of week (Monday, Tuesday, Wednesday) 
and operator (1, 2, 3) are to be used as blocking variables. A latin square design might then 
be shown as follows: 


                 Operator
Day           1    2    3
Monday        B    A    C
Tuesday       A    C    B
Wednesday     C    B    A

Operator 1 would run treatment B on Monday, treatment A on Tuesday, and treatment C on
Wednesday, and so on for the other operators. Note that each operator runs each treatment,
and that all treatments are run on each day.




A latin square design thus has the following features:

1. There are r treatments.
2. There are two blocking variables, each containing r classes.
3. Each row and each column in the design square contains all treatments; that is, each
   class of each blocking variable constitutes a replication.
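
These features are easy to verify mechanically for a proposed layout. The sketch below is illustrative only; the function name `is_latin_square` is hypothetical, and the layout checked is the day-by-operator design shown earlier in this section.

```python
def is_latin_square(square):
    """Return True if `square` (a list of rows) is r x r and every row
    and every column contains each of the r treatments exactly once."""
    r = len(square)
    treatments = set(square[0])
    if len(treatments) != r:
        return False
    # Every row must be a rearrangement of the same r treatments.
    if any(len(row) != r or set(row) != treatments for row in square):
        return False
    # Every column must also contain all r treatments.
    return all({row[j] for row in square} == treatments for j in range(r))


# Rows are days (Monday, Tuesday, Wednesday); columns are operators 1, 2, 3.
design = [
    ["B", "A", "C"],
    ["A", "C", "B"],
    ["C", "B", "A"],
]

if __name__ == "__main__":
    print(is_latin_square(design))
```

A layout that repeats a treatment within a row or a column fails the check, because that class of the blocking variable would no longer constitute a full replication.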


Advantages and Disadvantages of Latin Square Designs 
Advantages of a latin square design include:

1. The use of two blocking variables often permits greater reductions in the variability of
   experimental errors than can be obtained with either blocking variable alone.
2. Treatment effects can be studied from a small-scale experiment. This is particularly
   helpful in preliminary or pilot studies.
3. It is often helpful in repeated measures experiments to take into account the order
   position effect of treatments by means of a latin square design.

Disadvantages of a latin square design are:

1. The number of classes of each blocking variable must equal the number of treatments.
   This restriction is often difficult to meet in practice.
2. The assumptions of the model are restrictive (e.g., that there are no interactions
   between either blocking variable and treatments, and also none between the two blocking
   variables).
3. The use of a latin square design will lead to a very small number of degrees of freedom
   for experimental error when only a few treatments are studied. On the other hand, when
   many treatments are studied, the degrees of freedom for experimental error may be larger
   than necessary.
4. The randomization required is somewhat more complex than that for earlier designs
   considered.

Because of the limitations on the degrees of freedom for experimental error just described,
latin squares are rarely used when more than eight treatments are being investigated. For the
same reason, when there are only a few treatments, say, four or less, additional replications
are usually required when a latin square design is employed.


Randomization of Latin Square Design 
There exist many latin squares for a given number of treatments. For example, for r = 3,
there are altogether 12 different possible arrangements. Four of the 12 possible arrangements
are (we omit the row and column blocking variable labels):

[Display of four of the twelve 3 × 3 latin squares. Latin square 1 is the standard square with rows A B C, B C A, and C A B.]



The number of possible latin square designs increases rapidly as the number of treatments
gets larger; for r = 5, there are 161,280 possible arrangements.

The objective of randomization is to select one of all possible latin squares for the given
number of treatments r, such that each square has an equal probability of being selected.
Clearly, it is not generally feasible to list all possible latin squares so that one can be selected
at random. Instead, we utilize standard latin squares, which are latin squares in which the
elements of the first row and the first column are arranged alphabetically. The earlier latin
square 1 is a standard latin square. Table B.14 contains all the standard squares for r = 3
and 4, and a single selected standard square for r = 5, 6, 7, 8, and 9.

The randomization procedure usually employed with Table B.14 is as follows:

1. For r = 3, independently arrange the rows and columns of the standard square at
   random.
2. For r = 4, select one of the standard squares at random. Then, independently arrange
   its rows and columns at random.
3. For r = 5 and higher, independently arrange the rows, columns, and treatments of the
   given standard square at random.

It can be shown that this procedure selects one latin square at random from all possible
squares for r = 3 and 4. For r = 5 or higher, the randomization procedure is not based on
all possible latin squares, but rather on very large and suitable subsets thereof.
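
The row, column, and treatment permutations in this procedure can be sketched in code. The example below is an illustration under stated assumptions: it starts from a cyclic standard square generated on the spot, which is not necessarily the standard square listed in Table B.14, and it does not cover the extra step for r = 4 of first choosing among several standard squares. All function names are hypothetical. A brute-force count for r = 3 is included to confirm the figure of 12 arrangements.

```python
import random
from itertools import permutations, product


def is_latin(square):
    """True if every row and every column holds each symbol exactly once."""
    r = len(square)
    symbols = set(square[0])
    if len(symbols) != r:
        return False
    if any(len(row) != r or set(row) != symbols for row in square):
        return False
    return all({row[j] for row in square} == symbols for j in range(r))


def cyclic_standard_square(r):
    """A standard square (first row and first column alphabetical),
    built cyclically; assumed here in place of a Table B.14 square."""
    letters = [chr(ord("A") + k) for k in range(r)]
    return [[letters[(i + j) % r] for j in range(r)] for i in range(r)]


def randomize(square, rng, relabel_treatments=False):
    """Independently permute rows and columns; optionally also permute
    the treatment labels (the step used for r = 5 and higher)."""
    r = len(square)
    row_order = rng.sample(range(r), r)
    col_order = rng.sample(range(r), r)
    out = [[square[i][j] for j in col_order] for i in row_order]
    if relabel_treatments:
        letters = sorted(set(square[0]))
        mapping = dict(zip(letters, rng.sample(letters, r)))
        out = [[mapping[x] for x in row] for row in out]
    return out


def count_latin_squares(r):
    """Brute-force count of all r x r latin squares (small r only)."""
    letters = [chr(ord("A") + k) for k in range(r)]
    rows = list(permutations(letters))
    return sum(is_latin(sq) for sq in product(rows, repeat=r))


rng = random.Random(2005)  # fixed seed so the sketch is reproducible
design = randomize(cyclic_standard_square(5), rng, relabel_treatments=True)

if __name__ == "__main__":
    for row in design:
        print(" ".join(row))
    print(count_latin_squares(3))
```

Permuting rows, columns, or treatment labels always maps a latin square to another latin square, which is why the randomized layout can be used directly as the design.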


Example

An experiment was conducted to study the effects of different types of background music
on the productivity of bank tellers. The treatments were defined as various combinations of
tempo of music (slow, medium, fast) and style of music (instrumental and vocal, instrumental
only). The treatments and Latin letter designations were as follows:

             Latin Letter
Treatment    Designation     Tempo and Style of Music
1            A               Slow, instrumental and vocal
2            B               Medium, instrumental and vocal
3            C               Fast, instrumental and vocal
4            D               Medium, instrumental only
5            E               Fast, instrumental only

Table 28.4 contains the results of this experiment. The treatment in each cell is shown
in parentheses. The experimental unit in this study is a working day for the crew of bank
tellers; the productivity data pertain to the performance of the entire crew. Let $Y_{ijk}$ denote
the observation in the cell defined by the ith class of the row blocking variable and the
jth class of the column blocking variable. The subscript k indicates the treatment assigned
to this cell by the particular latin square design employed. For instance, $Y_{123} = 17$ is the
productivity on Tuesday of the first week, and Table 28.4 indicates that the type of music
on that day was C.




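TABLE 28.4 Latin Square Design and Experimental Results: Background Music Example (productivity of
crew, data coded).

                                    Day
Week      M         T         W         Th        F         Mean
1         18 (D)    17 (C)    14 (A)    21 (B)    17 (E)    $\bar{Y}_{1\cdot\cdot}$ = 17.4
2         13 (C)    34 (B)    21 (E)    16 (A)    15 (D)    $\bar{Y}_{2\cdot\cdot}$ = 19.8
3          7 (A)    29 (D)    32 (B)    27 (E)    13 (C)    $\bar{Y}_{3\cdot\cdot}$ = 21.6
4         17 (E)    13 (A)    24 (C)    31 (D)    25 (B)    $\bar{Y}_{4\cdot\cdot}$ = 22.0
5         21 (B)    26 (E)    26 (D)    31 (C)     7 (A)    $\bar{Y}_{5\cdot\cdot}$ = 22.2

Column means: $\bar{Y}_{\cdot 1\cdot}$ = 15.2, $\bar{Y}_{\cdot 2\cdot}$ = 23.8, $\bar{Y}_{\cdot 3\cdot}$ = 23.4, $\bar{Y}_{\cdot 4\cdot}$ = 25.2, $\bar{Y}_{\cdot 5\cdot}$ = 15.4; overall mean $\bar{Y}_{\cdot\cdot\cdot}$ = 20.6
Treatment means: $\bar{Y}_{\cdot\cdot 1}$ = 11.4, $\bar{Y}_{\cdot\cdot 2}$ = 26.6, $\bar{Y}_{\cdot\cdot 3}$ = 19.6, $\bar{Y}_{\cdot\cdot 4}$ = 23.8, $\bar{Y}_{\cdot\cdot 5}$ = 21.6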


The subscript k in $Y_{ijk}$ is actually redundant for a latin square design because the row and
column designation (i, j) determines the treatment for the particular latin square employed.
However, we continue to use all three subscripts for ease of identification. 

We shall analyze the results of this study in Section 28.5. 


28.4 Latin Square Model 


A latin square design model involves the main effect of the row blocking variable, denoted
by $\rho_i$, the main effect of the column blocking variable, denoted by $\kappa_j$, and the treatment
main effect, denoted by $\tau_k$. It is assumed that no interactions exist between these three
variables. Thus, the model employed is an additive one. For the case of fixed treatment and
block effects, the model is:

$$Y_{ijk} = \mu_{\cdot\cdot\cdot} + \rho_i + \kappa_j + \tau_k + \varepsilon_{ijk} \tag{28.12}$$

where:

$\mu_{\cdot\cdot\cdot}$ is a constant
$\rho_i$, $\kappa_j$, $\tau_k$ are constants subject to the restrictions $\sum \rho_i = \sum \kappa_j = \sum \tau_k = 0$
$\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, r$; $j = 1, \ldots, r$; $k = 1, \ldots, r$


Note again that the number of classes for each of the two blocking variables is the same as 
the number of treatments, and that the total number of experimental trials is r?. 


Comment 


If the treatment effects are random, the only change in model (28.12) is that the $\tau_k$ now are independent
$N(0, \sigma_\tau^2)$ and are independent of the $\varepsilon_{ijk}$.




28.5 Analysis of Latin Square Experiments 


Notation 
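We shall employ the usual notation for row, column, and treatment totals and means:

$$Y_{i\cdot\cdot} = \sum_j Y_{ijk} \qquad \bar{Y}_{i\cdot\cdot} = \frac{Y_{i\cdot\cdot}}{r} \tag{28.13a}$$

$$Y_{\cdot j\cdot} = \sum_i Y_{ijk} \qquad \bar{Y}_{\cdot j\cdot} = \frac{Y_{\cdot j\cdot}}{r} \tag{28.13b}$$

$$Y_{\cdot\cdot k} = \sum_{i,j} Y_{ijk} \qquad \bar{Y}_{\cdot\cdot k} = \frac{Y_{\cdot\cdot k}}{r} \tag{28.13c}$$

The overall total and mean are denoted as usual by:

$$Y_{\cdot\cdot\cdot} = \sum_i \sum_j Y_{ijk} \qquad \bar{Y}_{\cdot\cdot\cdot} = \frac{Y_{\cdot\cdot\cdot}}{r^2} \tag{28.13d}$$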


Note the redundancy of any one of the three subscripts, arising from the fact that the 
treatment is uniquely determined by the row and column specifications for the latin square 
utilized. The various means for the background music example are shown in Table 28.4. 
The estimated treatment means are calculated by first collecting the data for each treatment
and then averaging these values. For instance, we have:

$$\bar{Y}_{\cdot\cdot 1} = \frac{7 + 13 + 14 + 16 + 7}{5} = 11.4$$


Fitting of Model 
The least squares and maximum likelihood estimators of the parameters in latin square 
model (28.12) are: 


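Parameter                   Estimator

$\mu_{\cdot\cdot\cdot}$     $\hat{\mu}_{\cdot\cdot\cdot} = \bar{Y}_{\cdot\cdot\cdot}$     (28.14a)
$\rho_i$                    $\hat{\rho}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}$     (28.14b)
$\kappa_j$                  $\hat{\kappa}_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot}$     (28.14c)
$\tau_k$                    $\hat{\tau}_k = \bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdot\cdot\cdot}$     (28.14d)

The fitted values therefore are:

$$\hat{Y}_{ijk} = \bar{Y}_{i\cdot\cdot} + \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot k} - 2\bar{Y}_{\cdot\cdot\cdot} \tag{28.15}$$

and the residuals are:

$$e_{ijk} = Y_{ijk} - \hat{Y}_{ijk} = Y_{ijk} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + 2\bar{Y}_{\cdot\cdot\cdot} \tag{28.16}$$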


Analysis of Variance 


Table 28.5 presents the ANOVA table for latin square model (28.12). The sums of squares
can be obtained by the rules in Appendix D, remembering that one subscript is redundant.

TABLE 28.5 ANOVA Table for Latin Square Design Model (28.12) with Fixed Effects.

Source of Variation         SS       df               MS                                    E{MS}
Row blocking variable       SSROW    r − 1            MSROW = SSROW/(r − 1)                 $\sigma^2 + r\sum\rho_i^2/(r-1)$
Column blocking variable    SSCOL    r − 1            MSCOL = SSCOL/(r − 1)                 $\sigma^2 + r\sum\kappa_j^2/(r-1)$
Treatments                  SSTR     r − 1            MSTR = SSTR/(r − 1)                   $\sigma^2 + r\sum\tau_k^2/(r-1)$
Error                       SSRem    (r − 1)(r − 2)   MSRem = SSRem/[(r − 1)(r − 2)]        $\sigma^2$
Total                       SSTO     r² − 1


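The definitional forms of the sums of squares are as follows:

$$SSTO = \sum_i \sum_j (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{28.17a}$$

$$SSROW = r\sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{28.17b}$$

$$SSCOL = r\sum_j (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{28.17c}$$

$$SSTR = r\sum_k (\bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdot\cdot\cdot})^2 \tag{28.17d}$$

$$SSRem = \sum_i \sum_j (Y_{ijk} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot k} + 2\bar{Y}_{\cdot\cdot\cdot})^2 \tag{28.17e}$$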

SSROW is the row sum of squares. The more the row means $\bar{Y}_{i\cdot\cdot}$ differ, the larger is
SSROW. Similarly, SSCOL is the column sum of squares and measures the variability of the
column means $\bar{Y}_{\cdot j\cdot}$. SSTR denotes, as usual, the treatment sum of squares. Finally, SSRem
stands for the remainder sum of squares reflecting the error variability. We use this notation
here since this sum of squares is made up of several different interaction components.

The degrees of freedom in Table 28.5 can be understood as follows. There are r² observations,
and hence SSTO has r² − 1 degrees of freedom associated with it. Since there
are r classes for the row and column blocking variables each, and also r treatments,
each of the corresponding sums of squares has r − 1 degrees of freedom associated with
it. The number of degrees of freedom associated with SSRem is the remainder, namely,
(r² − 1) − 3(r − 1) = (r − 1)(r − 2). Note that the addition of a second blocking variable
has reduced the number of degrees of freedom for the error sum of squares from (r − 1)² for
a randomized complete block design based on r blocks and r treatments to (r − 1)(r − 2),
a reduction of r − 1 degrees of freedom.

The E{MS} column in Table 28.5 for latin square model (28.12) can be obtained by using
the rules in Appendix D, remembering that one subscript is redundant, or by a computer
package that provides expected mean squares.
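
The definitional formulas (28.17a) through (28.17e) translate directly into code. The sketch below is an illustration, not the computer package used in the text; the function name and data layout are hypothetical, and the data values are the background music observations as read from Table 28.4, so the transcription should be treated as an assumption.

```python
# Each cell is (observation, treatment); rows are weeks 1-5 and columns are
# days M, T, W, Th, F, as read from Table 28.4 (transcription assumed).
music = [
    [(18, "D"), (17, "C"), (14, "A"), (21, "B"), (17, "E")],
    [(13, "C"), (34, "B"), (21, "E"), (16, "A"), (15, "D")],
    [(7, "A"), (29, "D"), (32, "B"), (27, "E"), (13, "C")],
    [(17, "E"), (13, "A"), (24, "C"), (31, "D"), (25, "B")],
    [(21, "B"), (26, "E"), (26, "D"), (31, "C"), (7, "A")],
]


def latin_square_anova(cells):
    """Sums of squares and F* for an unreplicated r x r latin square."""
    r = len(cells)
    grand = sum(y for row in cells for y, _ in row) / r**2
    row_means = [sum(y for y, _ in row) / r for row in cells]
    col_means = [sum(cells[i][j][0] for i in range(r)) / r for j in range(r)]
    totals = {}
    for row in cells:
        for y, k in row:
            totals[k] = totals.get(k, 0.0) + y
    trt_means = {k: t / r for k, t in totals.items()}

    ssto = sum((y - grand) ** 2 for row in cells for y, _ in row)
    ssrow = r * sum((m - grand) ** 2 for m in row_means)
    sscol = r * sum((m - grand) ** 2 for m in col_means)
    sstr = r * sum((m - grand) ** 2 for m in trt_means.values())
    # Remainder sum of squares from the residuals in (28.16).
    ssrem = sum(
        (cells[i][j][0] - row_means[i] - col_means[j]
         - trt_means[cells[i][j][1]] + 2 * grand) ** 2
        for i in range(r) for j in range(r)
    )
    mstr = sstr / (r - 1)
    msrem = ssrem / ((r - 1) * (r - 2))
    return {"SSTO": ssto, "SSROW": ssrow, "SSCOL": sscol, "SSTR": sstr,
            "SSRem": ssrem, "MSTR": mstr, "MSRem": msrem,
            "F": mstr / msrem, "treatment_means": trt_means}


result = latin_square_anova(music)

if __name__ == "__main__":
    for name in ("SSROW", "SSCOL", "SSTR", "SSRem", "SSTO", "F"):
        print(name, round(result[name], 2))
```

Because the design is balanced, the residual-based SSRem agrees with the remainder SSTO − SSROW − SSCOL − SSTR, which gives a quick internal check on the arithmetic.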


Test for Treatment Effects
To test for treatment effects in latin square model (28.12) with fixed effects:

$$H_0: \text{all } \tau_k = 0 \qquad H_a: \text{not all } \tau_k \text{ equal zero} \tag{28.18a}$$

we see from the E{MS} column in Table 28.5 that the appropriate test statistic is:

$$F^* = \frac{MSTR}{MSRem} \tag{28.18b}$$

The appropriate decision rule to control the risk of a Type I error at $\alpha$ is:

$$\begin{aligned}
&\text{If } F^* \le F[1 - \alpha; r - 1, (r - 1)(r - 2)], \text{ conclude } H_0 \\
&\text{If } F^* > F[1 - \alpha; r - 1, (r - 1)(r - 2)], \text{ conclude } H_a
\end{aligned} \tag{28.18c}$$


Comments 


1. If the presence of blocking variable effects is to be tested, we see from the E{MS} column in
Table 28.5 that the appropriate test statistics are:

$$F^* = \frac{MSROW}{MSRem} \tag{28.19a}$$

$$F^* = \frac{MSCOL}{MSRem} \tag{28.19b}$$

2. If the treatment effects are random, the alternatives to be considered are:

$$H_0: \sigma_\tau^2 = 0 \qquad H_a: \sigma_\tau^2 > 0 \tag{28.20}$$

but the test statistic and decision rule are the same as in (28.18) for the fixed treatment effects
case.


Analysis of Treatment Effects 
When differential treatment effects are found by the analysis of variance and the treatments 
have fixed effects, estimates of contrasts involving the treatment effects are usually desired, 
often utilizing multiple comparison procedures. The appropriate mean square to be used in
the estimated variance of the contrast is MSRem obtained from (28.17e), and the multiples
for the estimated standard deviation of the contrast are as follows:

Single comparison:                               $t[1 - \alpha/2; (r - 1)(r - 2)]$     (28.21a)
Tukey procedure (for pairwise comparisons):      $T = \frac{1}{\sqrt{2}}\,q[1 - \alpha; r, (r - 1)(r - 2)]$     (28.21b)
Scheffé procedure:                               $S^2 = (r - 1)F[1 - \alpha; r - 1, (r - 1)(r - 2)]$     (28.21c)
Bonferroni procedure (g comparisons):            $B = t[1 - \alpha/2g; (r - 1)(r - 2)]$     (28.21d)


Residual Analysis

The use of the residuals in (28.16) for examining the aptness of a latin square model presents
no new issues; the basic points made earlier for other designs apply also to latin square
designs. The Tukey test for additivity in a randomized complete block design, discussed in
Section 21.4, can be extended to latin square designs. Reference 28.3 describes the extension.

Example

The analysis of variance calculations for the background music data in Table 28.4 were
made by using a computer package and the results are shown in Table 28.6. The residuals
were also obtained and analyzed. Figure 28.5a contains a plot of the residuals against the
fitted values, and Figure 28.5b contains a normal probability plot of the residuals. These
plots do not reveal any serious departures from the model assumptions, though they show
one case that appears to be outlying. The Bonferroni outlier test, explained on page 396,
was employed to test whether this case is an outlier but did not identify it as such. Based
on these and other diagnostics, including the Tukey test for additivity, it was concluded that
model (28.12) is appropriate for the data.

TABLE 28.6 ANOVA Table: Background Music Example.

Source of Variation     SS        df    MS
Weeks                   82.0      4     20.5
Days within week        477.2     4     119.3
Type of music           664.4     4     166.1
Error                   188.4     12    15.7
Total                   1412.0    24

FIGURE 28.5 Diagnostic Residual Plots: Background Music Example. (a) Plot against $\hat{Y}$ (residual versus fitted value). (b) Normal Probability Plot (residual versus expected value).

To test for treatment effects:

$$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5 = 0 \qquad H_a: \text{not all } \tau_k \text{ equal zero}$$

we find from Table 28.6:

$$F^* = \frac{MSTR}{MSRem} = \frac{166.1}{15.7} = 10.6$$

To control the risk of making a Type I error at $\alpha$ = .01, we require F(.99; 4, 12) = 5.41.
Since F* = 10.6 > 5.41, we conclude $H_a$, that the various types of background music have
differential effects on the productivity of the bank tellers. The P-value of this test is .0007.

Pairwise comparisons between the different kinds of music were desired with a family
confidence coefficient of .90, using the Tukey procedure. Substituting into (17.14) with
$n_i = n_{i'} = r$ and using MSRem from Table 28.6 as the mean square, we obtain:

$$s^2\{\hat{L}\} = \frac{2MSRem}{r} = \frac{2(15.7)}{5} = 6.28 \qquad s\{\hat{L}\} = 2.51$$

Remember that each estimated treatment mean $\bar{Y}_{\cdot\cdot k}$ is based on five observations here. Next,
we require the T multiple in (28.21b):

$$T = \frac{1}{\sqrt{2}}\,q(.90; 5, 12) = \frac{1}{\sqrt{2}}(3.92) = 2.77$$

so that:

$$Ts\{\hat{L}\} = 2.77(2.51) = 6.95$$


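Conducting pairwise tests based on the confidence intervals, the treatments can be placed
into three groups:

Group 1                                   Group 2                                   Group 3
Music 2   $\bar{Y}_{\cdot\cdot 2}$ = 26.6     Music 4   $\bar{Y}_{\cdot\cdot 4}$ = 23.8     Music 1   $\bar{Y}_{\cdot\cdot 1}$ = 11.4
Music 4   $\bar{Y}_{\cdot\cdot 4}$ = 23.8     Music 5   $\bar{Y}_{\cdot\cdot 5}$ = 21.6
Music 5   $\bar{Y}_{\cdot\cdot 5}$ = 21.6     Music 3   $\bar{Y}_{\cdot\cdot 3}$ = 19.6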


The most promising treatment appears to be mixed instrumental-vocal music in medium 
tempo (k = 2). There is clear evidence that it is better than instrumental-vocal music in 
slow tempo (k = 1) or instrumental-vocal music in fast tempo (k = 3). The point estimates 
suggest it also is better than solely instrumental music in medium (k = 4) or fast (k = 5) 
tempo, but the experimental evidence on these latter two comparisons is inconclusive. 
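
The grouping can be reproduced by comparing every pairwise difference of estimated treatment means with the Tukey halfwidth. The following sketch is illustrative only: the variable names are hypothetical, the treatment means and MSRem are taken from Tables 28.4 and 28.6, and the Tukey multiple T = 2.77 is taken as given from the text rather than computed from a studentized range table.

```python
from itertools import combinations
from math import sqrt

# Estimated treatment means (music types 1-5) and MSRem from the example.
means = {1: 11.4, 2: 26.6, 3: 19.6, 4: 23.8, 5: 21.6}
ms_rem = 15.7
r = 5                # each treatment mean is based on r = 5 observations
tukey_T = 2.77       # T multiple quoted in the text, assumed as given

# Estimated standard deviation of a pairwise difference and Tukey halfwidth.
s_diff = sqrt(2 * ms_rem / r)
halfwidth = tukey_T * s_diff

# A pair differs significantly when its interval excludes zero,
# i.e. when the absolute difference exceeds the halfwidth.
significant = {
    (a, b) for a, b in combinations(sorted(means), 2)
    if abs(means[a] - means[b]) > halfwidth
}

if __name__ == "__main__":
    print(round(s_diff, 2), round(halfwidth, 2))
    for a, b in combinations(sorted(means), 2):
        flag = "differ" if (a, b) in significant else "not shown to differ"
        print(f"music {a} vs music {b}: {means[a] - means[b]:+.1f} ({flag})")
```

Under these inputs, music 1 differs from every other type and music 2 differs from music 3, while the remaining pairs are not shown to differ, which matches the three overlapping groups listed above. The halfwidth computed without intermediate rounding is slightly below the 6.95 quoted in the text, but no comparison changes.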


Factorial Treatments 


If the treatments in a latin square design are factorial in nature, the treatment sum of squares 
SSTR is decomposed in the usual manner. For a two-factor experiment involving factors A
and B, we have: 


SSTR = SSA + SSB + SSAB (28.22) 


Estimates of fixed factor effects can be made readily since they are simply contrasts of 
the treatment means. 




Random Blocking Variable Effects 
If the row and/or column blocking variable(s) in a latin square design have classes that 
should be viewed as random selections from a population, the fixed effects latin square 
model (28.12) needs to be modified in the usual fashion. The analysis of variance is the 
same as for the fixed blocking variable effects model and all tests and estimates of treatment 
effects are conducted as for fixed blocking variable effects. 


Missing Observations 

While missing observations destroy the symmetry (orthogonality) of the latin square design 
and make the usual ANOVA calculations inappropriate, the regression approach ordinarily 
remains appropriate when observations in a latin square design are missing. We just set up 
the regression model for the available observations and then fit the model to the data. The 
procedure is analogous to that discussed in Section 23.4 for complete block designs. Tests 
are conducted by fitting the full and appropriate reduced regression models. Estimation of 
fixed treatment effects is done in terms of the regression coefficients for the full model in 
the usual manner. 


28.6 Planning Latin Square Experiments


Power of F Test 


The power of the F test for treatment effects in latin square model (28.12) involves the
noncentrality parameter:

$$\phi = \frac{1}{\sigma}\sqrt{\sum \tau_k^2} \tag{28.23}$$

with degrees of freedom r − 1 for the numerator and (r − 1)(r − 2) for the denominator.
Other than these modifications, no new issues are encountered in obtaining the power of
the test for treatment effects in a latin square design.


Necessary Number of Replications 
A latin square design provides r replications for each treatment. Power and/or estimation 
considerations similar to those for randomized complete block designs may indicate that 
r replications are too few, particularly when r is small, say, 3, 4, or 5. Two methods of 
increasing the number of replications with a latin square design are discussed in Section 28.7. 
With either method, it is necessary to assess in advance the magnitude of the experimental
error variance $\sigma^2$ in order to plan the necessary number of replications.


Efficiency of Blocking Variables 
The efficiency of a latin square design can be assessed relative to a completely randomized
design or relative to a randomized complete block design. The efficiency relative to a
completely randomized design is defined by:

$$E_1 = \frac{\sigma_r^2}{\sigma_L^2} \tag{28.24a}$$

where $\sigma_r^2$ and $\sigma_L^2$ are the experimental error variances with a completely randomized design
and a latin square design, respectively. The efficiency relative to a randomized complete
block design can be measured in two ways, depending on whether the row or the column
blocking variable is used in the randomized block design:

$$E_2 = \frac{\sigma_{br}^2}{\sigma_L^2} \tag{28.24b}$$

$$E_3 = \frac{\sigma_{bc}^2}{\sigma_L^2} \tag{28.24c}$$

where $\sigma_{br}^2$ and $\sigma_{bc}^2$ are the experimental error variances with a randomized block design if
the row blocking variable or the column blocking variable is utilized, respectively.

We can estimate $\sigma_r^2$, $\sigma_{br}^2$, and $\sigma_{bc}^2$ from the results for a latin square design as follows:

$$s_r^2 = \frac{MSROW + MSCOL + (r - 1)MSRem}{r + 1} \tag{28.25a}$$

$$s_{br}^2 = \frac{MSCOL + (r - 1)MSRem}{r} \tag{28.25b}$$

$$s_{bc}^2 = \frac{MSROW + (r - 1)MSRem}{r} \tag{28.25c}$$

Thus, the estimated measures of efficiency are:

$$\hat{E}_1 = \frac{MSROW + MSCOL + (r - 1)MSRem}{(r + 1)MSRem} \tag{28.26a}$$

$$\hat{E}_2 = \frac{MSCOL + (r - 1)MSRem}{rMSRem} \tag{28.26b}$$

$$\hat{E}_3 = \frac{MSROW + (r - 1)MSRem}{rMSRem} \tag{28.26c}$$

When r is small, the efficiency measures may be modified by means of (21.15) to account
for differences in the number of degrees of freedom associated with the mean squares used
for estimating the experimental error variances for the two designs being compared.

Example

For the background music example, we obtain the following efficiency measures from the
results in Table 28.6:

$$\hat{E}_1 = \frac{20.5 + 119.3 + 4(15.7)}{6(15.7)} = 2.15$$

$$\hat{E}_2 = \frac{119.3 + 4(15.7)}{5(15.7)} = 2.32$$

$$\hat{E}_3 = \frac{20.5 + 4(15.7)}{5(15.7)} = 1.06$$

We see that the latin square design was efficient relative to a completely randomized design. 
The latter would have required over twice as many replications for each treatment as the 
latin square design so that the variance for any specified estimated treatment contrast would 
be the same with both designs. Most of this efficiency was gained by the column blocking 
variable (days within week), because the efficiency of the latin square design relative to a 
complete block design with the column blocking variable is poor, being close to 1. Hence, 
little was achieved by also blocking by the row blocking variable (week). 
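
The estimated efficiency measures (28.26a) through (28.26c) need only the three mean squares and r. The sketch below is a minimal illustration with a hypothetical function name; the inputs are the mean squares for weeks (rows), days within week (columns), and error from Table 28.6.

```python
def latin_square_efficiencies(ms_row, ms_col, ms_rem, r):
    """Estimated efficiency of a latin square relative to (1) a completely
    randomized design, (2) a randomized block design using the row blocking
    variable, and (3) one using the column blocking variable."""
    e1 = (ms_row + ms_col + (r - 1) * ms_rem) / ((r + 1) * ms_rem)
    e2 = (ms_col + (r - 1) * ms_rem) / (r * ms_rem)
    e3 = (ms_row + (r - 1) * ms_rem) / (r * ms_rem)
    return e1, e2, e3


# Background music example: MSROW = 20.5, MSCOL = 119.3, MSRem = 15.7, r = 5.
e1, e2, e3 = latin_square_efficiencies(20.5, 119.3, 15.7, 5)

if __name__ == "__main__":
    print(round(e1, 2), round(e2, 2), round(e3, 2))
```

The third measure compares the latin square with a block design that already uses the column blocking variable, so a value near 1 indicates that adding the row blocking variable contributed little.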




28.7 Additional Replications with Latin Square Designs


A latin square design, as noted earlier, provides r replications for each treatment. If power
and/or estimation considerations indicate that these are too few replications, two basic
methods are available for increasing the number of replications: replications within cells
and additional latin squares. We consider each in turn.


Replications within Cells 
This method of increasing the replications per treatment is feasible when two or more 
experimental units can be obtained for each cell defined by the row and column blocking 
variables. Consider, for instance, an experiment in which IQ (low, normal, high) and age 
(young, middle, old) are the blocking variables. In this type of situation, it is possible to 
obtain two or more experimental subjects for each cell, and each of the subjects in a cell 
will then receive the treatment assigned to that cell by the latin square employed. 

Let n denote the number of experimental units available for each cell, and let $Y_{ijkm}$ denote
the observation for the mth unit (m = 1, ..., n) in the (i, j) cell for which the assigned
treatment is k. The additive fixed effects model (28.12) is modified for the n replications in
each cell as follows:

$$Y_{ijkm} = \mu_{\cdot\cdot\cdot} + \rho_i + \kappa_j + \tau_k + \varepsilon_{ijkm} \tag{28.27}$$

where:

$\mu_{\cdot\cdot\cdot}$ is a constant
$\rho_i$, $\kappa_j$, $\tau_k$ are constants subject to the restrictions $\sum \rho_i = \sum \kappa_j = \sum \tau_k = 0$
$\varepsilon_{ijkm}$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, r$; $j = 1, \ldots, r$; $k = 1, \ldots, r$; $m = 1, \ldots, n$

The ANOVA sums of squares and degrees of freedom for model (28.27) can be obtained
by the rules in Appendix D, remembering that one subscript is redundant. The treatment,
row, and column sums of squares are, respectively:

$$SSTR = nr\sum_k (\bar{Y}_{\cdot\cdot k\cdot} - \bar{Y}_{\cdot\cdot\cdot\cdot})^2 \tag{28.28a}$$

$$SSROW = nr\sum_i (\bar{Y}_{i\cdot\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot\cdot})^2 \tag{28.28b}$$

$$SSCOL = nr\sum_j (\bar{Y}_{\cdot j\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot\cdot})^2 \tag{28.28c}$$

The total sum of squares as usual is:

$$SSTO = \sum_i \sum_j \sum_m (Y_{ijkm} - \bar{Y}_{\cdot\cdot\cdot\cdot})^2 \tag{28.28d}$$

while SSRem is obtained as a remainder:

$$SSRem = SSTO - SSROW - SSCOL - SSTR \tag{28.28e}$$

The degrees of freedom for row, column, and treatment sums of squares are unchanged,
while those associated with SSRem are increased from (r − 1)(r − 2) to nr² − 3r + 2, an
increase of (n − 1)r² degrees of freedom.


TABLE 28.7 ANOVA Table for Latin Square Design Model (28.27) with n Replications per Cell.

Source of Variation         SS       df               MS
Row blocking variable       SSROW    r − 1            MSROW
Column blocking variable    SSCOL    r − 1            MSCOL
Treatments                  SSTR     r − 1            MSTR
Error                       SSRem    nr² − 3r + 2     MSRem
Total                       SSTO     nr² − 1

The analysis of variance is shown in Table 28.7. The expected mean squares can be
obtained by the rules in Appendix D, remembering that one subscript is redundant, or
from a suitable computer package. The test statistic for testing treatment effects is again
F* = MSTR/MSRem.

When n replications are present within a cell for a latin square, it is possible to obtain a
pure error measure and conduct a test for lack of fit of model (28.27) in the usual manner.

Example


A state university, while developing a retraining program to teach general computer repair 
skills to persons displaced from their previous occupations, conducted an experiment to 
evaluate the effects of three different incentive methods on achievement during the program. 
The blocking variables were IQ and age of subject. Two replications per cell were utilized. 
Table 28.8a contains the achievement scores for the participants in the experiment, while 
Table 28.8b contains the analysis of variance table obtained from a computer package.

To test the appropriateness of additive model (28.27), we use the usual test statistic for
lack of fit:

$$F^* = \frac{MSLF}{MSPE} = \frac{8.2}{4.0} = 2.05$$

For level of significance $\alpha$ = .05, we need F(.95; 2, 9) = 4.26. Since F* = 2.05 ≤ 4.26,
we conclude that additive model (28.27) is appropriate here. The P-value of the test is .18.
The comparison of the three incentive methods was then carried out in the usual fashion.
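
The decomposition of the error sum of squares into lack of fit and pure error can be sketched as follows. This is an illustration rather than the computer package output: the function names and data layout are hypothetical, and the observations and treatment assignments are those read from Table 28.8a, so the transcription should be treated as an assumption.

```python
# (IQ class, age class) -> (treatment, observations), as read from Table 28.8a.
cells = {
    ("High", "Young"): ("B", [19, 16]),
    ("High", "Middle"): ("A", [20, 24]),
    ("High", "Old"): ("C", [25, 21]),
    ("Normal", "Young"): ("C", [24, 22]),
    ("Normal", "Middle"): ("B", [14, 15]),
    ("Normal", "Old"): ("A", [14, 14]),
    ("Low", "Young"): ("A", [10, 14]),
    ("Low", "Middle"): ("C", [12, 13]),
    ("Low", "Old"): ("B", [7, 4]),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def group_ss(cells, key, grand):
    """Sum of squares for rows, columns, or treatments: the number of
    observations in each class (nr) times the squared deviation of its mean."""
    groups = {}
    for (row, col), (trt, ys) in cells.items():
        label = {"row": row, "col": col, "trt": trt}[key]
        groups.setdefault(label, []).extend(ys)
    return sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())


def lack_of_fit_test(cells):
    all_y = [y for _, ys in cells.values() for y in ys]
    grand = mean(all_y)
    r = round(len(cells) ** 0.5)      # r x r latin square
    n = len(all_y) // len(cells)      # replications per cell
    ssto = sum((y - grand) ** 2 for y in all_y)
    ssrow, sscol, sstr = (group_ss(cells, k, grand) for k in ("row", "col", "trt"))
    ssrem = ssto - ssrow - sscol - sstr                     # (28.28e)
    sspe = sum((y - mean(ys)) ** 2 for _, ys in cells.values() for y in ys)
    sslf = ssrem - sspe
    df_pe = r * r * (n - 1)
    df_lf = (n * r * r - 3 * r + 2) - df_pe
    return {"SSROW": ssrow, "SSCOL": sscol, "SSTR": sstr, "SSTO": ssto,
            "SSRem": ssrem, "SSPE": sspe, "SSLF": sslf,
            "df_LF": df_lf, "df_PE": df_pe,
            "F": (sslf / df_lf) / (sspe / df_pe)}


fit = lack_of_fit_test(cells)

if __name__ == "__main__":
    for name, value in fit.items():
        print(name, round(value, 2))
```

Computed without intermediate rounding, the statistic is about 2.04; the text's 2.05 uses the rounded mean squares 8.2 and 4.0. Either value is well below F(.95; 2, 9) = 4.26.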




Additional Latin Squares 


At times, it is not possible to obtain additional experimental units within a cell. This is the
case, for instance, in the background music example of Table 28.4, where only one type 
of music can be played in one day in a bank. When it is not possible to replicate within 
cells, additional replications for each treatment frequently can be obtained by adding one 
or more latin squares to one of the blocking variables. In the background music example 
of Table 28.4, for instance, the experiment could be run for another five weeks. In an 
experiment using plant crews as experimental units and employing as blocking variables 
plant shift (morning, afternoon, evening) and production department (1, 2, 3), additional 
replications can be obtained by running the experiment in other production departments. 

The layout for the background music example of Table 28.4, when run over another five 
weeks, is shown in Table 28.9. The second latin square, and additional ones when required, 
is selected independently of the first. 
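
One way to picture "selected independently" is to draw each square with its own random permutations. The sketch below is an assumption-laden illustration rather than the text's randomization procedure: it starts from a cyclic standard square and permutes rows, columns, and treatment labels, which yields a valid latin square but does not sample uniformly from all possible squares for larger r. The function names are hypothetical.

```python
import random


def random_latin_square(treatments, rng=random):
    """Return an r x r latin square as a list of rows of treatment labels."""
    r = len(treatments)
    # Cyclic standard square: entry (i, j) holds symbol (i + j) mod r.
    base = [[(i + j) % r for j in range(r)] for i in range(r)]
    row_perm = rng.sample(range(r), r)         # random row order
    col_perm = rng.sample(range(r), r)         # random column order
    labels = rng.sample(list(treatments), r)   # random label assignment
    return [[labels[base[i][j]] for j in col_perm] for i in row_perm]


def is_latin(square):
    """Check that every symbol appears once in each row and each column."""
    r = len(square)
    symbols = set(square[0])
    rows_ok = len(symbols) == r and all(set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(r))
    return rows_ok and cols_ok


# Two squares for a ten-week layout like Table 28.9, drawn independently.
first_square = random_latin_square("ABCDE")
second_square = random_latin_square("ABCDE")
for week, row in enumerate(first_square + second_square, start=1):
    print(week, " ".join(row))
```

Because each call draws fresh permutations, the second square does not depend on the first, which is the property the design calls for.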


Chapter 28 Balanced Incomplete Block, Latin Square, and Related Designs 1197 


TABLE 28.8 Example of Latin Square with Replications per Cell: Retraining Program Experiment. 

(a) Data 

                         Age (j) 
IQ (i)      Young         Middle        Old 
High        (B) 19, 16    (A) 20, 24    (C) 25, 21 
Normal      (C) 24, 22    (B) 14, 15    (A) 14, 14 
Low         (A) 10, 14    (C) 12, 13    (B) 7, 4 

(b) Analysis of Variance 

Source of Variation      SS       df     MS 
IQ                       364.3     2     182.2 
Age                       34.3     2      17.2 
Treatments               147.0     2      73.5 
Error                     52.4    11       4.76 
  Lack of fit             16.4     2       8.2 
  Pure error              36.0     9       4.0 
Total                    598.0    17 

TABLE 28.9 Two-Latin-Squares Design: Background Music Example of Table 28.4. 

                        Day 
Square    Week    M    T    W    Th   F 
1         1       D    C    A    B    E 
          2       C    B    E    A    D 
          3       A    D    B    E    C 
          4       E    A    C    D    B 
          5       B    E    D    C    A 
2         6       E    D    C    A    B 
          7       B    A    E    D    C 
          8       D    C    A    B    E 
          9       A    E    B    C    D 
          10      C    B    D    E    A 


1198 Part Six Specialized Study Designs 


Frequently, the additional squares may be viewed as classes of a third blocking variable. 
For instance, in the background music example of Table 28.9, the two latin squares may be 
considered to be two levels of the blocking variable "time period." The first five weeks may 
be viewed as time period 1, and the second five weeks as time period 2. As another example, 
in the experiment with plant crews mentioned previously, the production departments for 
the first latin square may be on an hourly rate, while the departments for the second latin 
square may be on incentive pay. Thus, with additional latin squares, one can, in effect, 
introduce a third blocking variable. As a consequence, the variation associated with the 
third blocking variable can be removed from the experimental error variability. In addition, 
the interactions between the third blocking variable and the other variables can be studied. 


28.8 Replications in Repeated Measures Studies 


We noted earlier that a latin square design is highly suitable for repeated measures studies 
when there are r treatments and r subjects. If additional replications are needed, however, 
replications within cells cannot be used since a cell pertains to an individual subject. Instead, 
latin square crossover designs or independent latin squares may be used. 


Latin Square Crossover Designs 


These designs, also called latin square changeover designs, are often useful when a latin 
square is to be used in a repeated measures study to balance the order positions of treatments, 
yet more subjects are required than called for by a single latin square. With this type of 
design, the subjects are randomly assigned to the different treatment order patterns given by 
a latin square (several latin squares may be used at times). Consider an experiment in which 
treatments A, B, and C are to be administered to each subject, and the three treatment order 
patterns are given by the latin square: 


             Order Position 
Pattern      1    2    3 
1            A    B    C 
2            B    C    A 
3            C    A    B 


Suppose that 3n subjects are available for the study. Then n subjects will be assigned at 
random to each of the three order patterns in a latin square crossover design. Note that this 
design is a mixture of repeated measures (within subjects) and latin square (order patterns 
form a latin square). 
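
The random assignment of subjects to order patterns can be sketched as follows. This is only an illustration under simple assumptions: subjects are identified by arbitrary labels, the patterns are written as strings, and the helper name `assign_to_patterns` is hypothetical.

```python
import random


def assign_to_patterns(subjects, patterns, rng=random):
    """Randomly split the subjects into equal groups, one per order pattern."""
    subjects = list(subjects)
    if len(subjects) % len(patterns) != 0:
        raise ValueError("number of subjects must be a multiple of the patterns")
    n = len(subjects) // len(patterns)              # subjects per pattern
    shuffled = rng.sample(subjects, len(subjects))  # random permutation
    return {pattern: shuffled[i * n:(i + 1) * n]
            for i, pattern in enumerate(patterns)}


# Three order patterns from the latin square above, six subjects (n = 2).
plan = assign_to_patterns(range(1, 7), ["ABC", "BCA", "CAB"])
for pattern, group in plan.items():
    print(pattern, sorted(group))
```

Each subject then receives all three treatments in the order given by its pattern, which is what makes the design a repeated measures study.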

Assuming that all effects are additive and fixed except that the effects for subjects are 
random, a relatively simple model for latin square crossover designs can be developed for 
r treatments and n subjects per order pattern. In the following model, $\rho_i$ denotes the effect 
of the ith treatment order pattern, $\kappa_j$ denotes the effect of the jth order position, $\tau_k$ denotes 
the effect of the kth treatment, and $\eta_{m(i)}$ denotes the effect of subject m, which is nested 
within the ith treatment order pattern: 

$$Y_{ijkm} = \mu_{....} + \rho_i + \kappa_j + \tau_k + \eta_{m(i)} + \varepsilon_{ijkm} \tag{28.29}$$

where: 

$\mu_{....}$ is a constant 
$\rho_i$, $\kappa_j$, $\tau_k$ are constants subject to the restrictions $\sum \rho_i = \sum \kappa_j = \sum \tau_k = 0$ 
$\eta_{m(i)}$ are independent $N(0, \sigma_\eta^2)$ 
$\varepsilon_{ijkm}$ are independent $N(0, \sigma^2)$ and independent of the $\eta_{m(i)}$ 
i = 1, ..., r; j = 1, ..., r; k = 1, ..., r; m = 1, ..., n 

The analysis of variance sums of squares, degrees of freedom, and expected mean squares 
for this model can be obtained by the rules in Appendix D, remembering that one subscript 
is redundant. The formulas for the sums of squares follow the usual pattern: 

$$SSTO = \sum_i \sum_j \sum_m (Y_{ijkm} - \bar{Y}_{....})^2 \tag{28.30a}$$

$$SSP = nr \sum_i (\bar{Y}_{i...} - \bar{Y}_{....})^2 \tag{28.30b}$$

$$SSO = nr \sum_j (\bar{Y}_{.j..} - \bar{Y}_{....})^2 \tag{28.30c}$$

$$SSTR = nr \sum_k (\bar{Y}_{..k.} - \bar{Y}_{....})^2 \tag{28.30d}$$

$$SSS = r \sum_i \sum_m (\bar{Y}_{i..m} - \bar{Y}_{i...})^2 \tag{28.30e}$$

$$SSRem = SSTO - SSP - SSO - SSTR - SSS \tag{28.30f}$$

Here, SSP is the (treatment) pattern sum of squares, SSO is the order position sum of squares, 
SSS is the subject sum of squares, and the other sums of squares have their usual meanings. 
Table 28.10 contains the ANOVA table. 

TABLE 28.10 ANOVA Table for Latin Square Crossover Design Model (28.29). 

Source of Variation              SS       df                 MS       E{MS} 
Patterns (P)                     SSP      r - 1              MSP      $\sigma^2 + r\sigma_\eta^2 + nr \sum \rho_i^2 / (r - 1)$ 
Order positions (O)              SSO      r - 1              MSO      $\sigma^2 + nr \sum \kappa_j^2 / (r - 1)$ 
Treatments (TR)                  SSTR     r - 1              MSTR     $\sigma^2 + nr \sum \tau_k^2 / (r - 1)$ 
Subjects (S) (within patterns)   SSS      r(n - 1)           MSS      $\sigma^2 + r\sigma_\eta^2$ 
Error                            SSRem    (r - 1)(nr - 2)    MSRem    $\sigma^2$ 
Total                            SSTO     nr^2 - 1 




TABLE 28.11 Latin Square Crossover Design: Apple Sales Example. 

(a) Data (coded) 

                          Two-Week Period (j) 
Pattern i    Store      1          2          3 
1            m = 1      9 (B)      12 (C)     15 (A) 
             m = 2      4 (B)      12 (C)     9 (A) 
2            m = 1      12 (A)     14 (B)     3 (C) 
             m = 2      13 (A)     14 (B)     3 (C) 
3            m = 1      7 (C)      18 (A)     6 (B) 
             m = 2      5 (C)      20 (A)     4 (B) 

(b) Analysis of Variance 

Source of Variation         SS        df    MS 
Patterns                    .33        2    .17 
Order positions             233.33     2    116.67 
Displays                    189.00     2    94.50 
Stores (within patterns)    21.00      3    7.00 
Error                       20.33      8    2.54 
Total                       464.00    17 


Example 


Table 28.11a contains data for a study of the effects of three different displays on the sale 
of apples, using a latin square crossover design. Six stores were used, with two assigned at 
random to each of the three treatment order patterns shown. Each display was kept for two 
weeks, and the observed variable was sales per 100 customers. Table 28.11b contains the 
analysis of variance. The sums of squares were obtained from a computer run. 

To test for treatment (display) effects, we use: 

$$F^* = \frac{MSTR}{MSRem} = \frac{94.5}{2.54} = 37.2$$

For α = .05, we require F(.95; 2, 8) = 4.46. Since F* = 37.2 > 4.46, we conclude that 
there are differential sales effects for the three displays. The P-value of the test is 0+. 
Tests for pattern effects, order position effects, and store effects were also carried out. They 
indicated that order position effects were present, but no pattern or store effects. Order 
position effects here are associated with the three time periods in which the displays were 
studied, and may reflect seasonal effects as well as the results of special events, such as 
unusually hot weather in one period. The comparison of the three treatment effects was then 
carried out in the usual fashion. 
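
The sums of squares in Table 28.11b can be reproduced from the data in Table 28.11a using formulas (28.30a) through (28.30f). The sketch below is an illustration in plain Python rather than the computer run mentioned in the text; the record layout and helper names are assumptions made for the example.

```python
# Table 28.11a: key = (pattern i, store m); value = (sales, display) for
# two-week periods 1, 2, 3 in order.
data = {
    (1, 1): [(9, "B"), (12, "C"), (15, "A")],
    (1, 2): [(4, "B"), (12, "C"), (9, "A")],
    (2, 1): [(12, "A"), (14, "B"), (3, "C")],
    (2, 2): [(13, "A"), (14, "B"), (3, "C")],
    (3, 1): [(7, "C"), (18, "A"), (6, "B")],
    (3, 2): [(5, "C"), (20, "A"), (4, "B")],
}
# One record per observation: (pattern i, period j, display k, store m, y).
records = [(i, j, k, m, y)
           for (i, m), seq in data.items()
           for j, (y, k) in enumerate(seq, start=1)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def groups(key):
    out = {}
    for rec in records:
        out.setdefault(key(rec), []).append(rec[4])
    return out


grand = mean(rec[4] for rec in records)


def ss_between(key):
    # n*r observations per level, so this matches (28.30b)-(28.30d).
    return sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups(key).values())


ssto = sum((rec[4] - grand) ** 2 for rec in records)           # (28.30a)
ssp = ss_between(lambda rec: rec[0])                           # (28.30b)
sso = ss_between(lambda rec: rec[1])                           # (28.30c)
sstr = ss_between(lambda rec: rec[2])                          # (28.30d)
pattern_mean = {i: mean(ys) for i, ys in groups(lambda rec: rec[0]).items()}
sss = sum(len(ys) * (mean(ys) - pattern_mean[i]) ** 2          # (28.30e)
          for (i, m), ys in groups(lambda rec: (rec[0], rec[3])).items())
ssrem = ssto - ssp - sso - sstr - sss                          # (28.30f)

r, n = 3, 2
df_rem = (r - 1) * (n * r - 2)
f_star = (sstr / (r - 1)) / (ssrem / df_rem)
print(f"SSP={ssp:.2f} SSO={sso:.2f} SSTR={sstr:.2f} SSS={sss:.2f} "
      f"SSRem={ssrem:.2f} F*={f_star:.1f}")
```

The subject sum of squares is taken around each store's own pattern mean, which is how the nesting of stores within patterns enters the calculation.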


Use of Independent Latin Squares 


If the order position effects are not approximately constant for all subjects (stores, etc.), a 
crossover design is not fully effective. It may then be preferable to place the subjects into 
homogeneous groups with respect to the order position effects and use independent latin 
squares for each group. Suppose that four treatments are to be administered to eight subjects 
each, four males and four females, and that the experimenter expects the fatigue effect to 
be stronger for females than for males. The use of two independent latin squares, one for 
male subjects and the other for female subjects, may then be advisable. 


Carryover Effects 


TABLE 28.12 Illustration of a Latin Square Double Crossover Design. 


If carryover effects from one treatment to another are anticipated, that is, if not only the 
order position but also the preceding treatment has an effect, these carryover effects may 
be balanced out by choosing a latin square in which every treatment follows every other 
treatment an equal number of times. For r = 4, an example of such a latin square is: 


Period 
Subject 1 2 3 4 
1 A B D C 
2 B C A D 
3 C D B A 
4 D A C B 


Note that treatment A follows each of the other treatments once, and similarly for the other 
treatments. This design is appropriate when the carryover effects do not persist for more 
than one period. 
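
The balance property can be checked mechanically by counting, for every ordered pair of treatments, how often the second immediately follows the first. The sketch below is a small illustration; the function names are hypothetical, and the cyclic square used for contrast is an added example, not one from the text.

```python
from collections import Counter
from itertools import permutations


def carryover_counts(sequences):
    """Count how often each treatment immediately follows each other one."""
    counts = Counter()
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[(prev, curr)] += 1
    return counts


def is_carryover_balanced(sequences):
    """True if every ordered pair of distinct treatments occurs equally often."""
    treatments = sorted(set(sequences[0]))
    counts = carryover_counts(sequences)
    return len({counts[pair] for pair in permutations(treatments, 2)}) == 1


# Rows are subjects, columns are periods, as in the r = 4 square above.
balanced_square = ["ABDC", "BCAD", "CDBA", "DACB"]
# A cyclic square for contrast: A is followed only by B, B only by C, etc.
cyclic_square = ["ABCD", "BCDA", "CDAB", "DABC"]

print(is_carryover_balanced(balanced_square))  # True
print(is_carryover_balanced(cyclic_square))    # False
```

In the balanced square each of the 12 ordered pairs occurs exactly once, whereas the cyclic square repeats four pairs three times each and never uses the other eight.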

When r is odd, the sequence balance can be obtained by using a pair of latin squares with 
the property that the treatment sequences in one square are reversed in the other square. 
Indeed, even when r is even, it is usually desirable to use a pair of such squares so that the 
degrees of freedom associated with MSRem are reasonably large. Such a design is sometimes 
called a latin square double crossover design. This type of design retains the advantages of 
employing two blocking variables in a latin square, while enabling the experimenter also 
to balance and measure the carryover effects. 

For the earlier apple display illustration in which three displays were studied in six stores, 
the two latin squares might be as shown in Table 28.12. The stores should first be placed 
into two homogeneous groups and these should then be assigned to the two latin squares. 
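
For the pair of squares in Table 28.12, the same kind of count shows why two squares are needed when r is odd. The following sketch (hypothetical helper names, standard library only) checks that the second square reverses the sequences of the first and that the pair, unlike either square alone, is balanced for first-order carryover.

```python
from collections import Counter
from itertools import permutations


def carryover_counts(sequences):
    """Count ordered (preceding, following) treatment pairs over all rows."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


def is_balanced(sequences, treatments="ABC"):
    """True if every ordered pair of distinct treatments occurs equally often."""
    counts = carryover_counts(sequences)
    return len({counts[pair] for pair in permutations(treatments, 2)}) == 1


square_1 = ["ABC", "BCA", "CAB"]   # stores 1-3 in Table 28.12
square_2 = ["ACB", "BAC", "CBA"]   # stores 4-6 in Table 28.12

# Each sequence in square 2 is the reverse of some sequence in square 1.
reversed_ok = sorted(s[::-1] for s in square_1) == sorted(square_2)

print(reversed_ok)                       # True
print(is_balanced(square_1))             # False
print(is_balanced(square_1 + square_2))  # True
```

Taken together, the six stores give each ordered pair of displays exactly two occurrences, so every display follows every other display equally often.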


                     Two-Week Period 
Square    Store      1    2    3 
1         1          A    B    C 
          2          B    C    A 
          3          C    A    B 
2         4          A    C    B 
          5          B    A    C 
          6          C    B    A 




Cited References 

28.1. Cochran, W. G., and G. M. Cox. Experimental Designs. 2nd ed. New York: John Wiley & Sons, 
1957. 
28.2. Cook, R. D., and C. J. Nachtsheim. "Computer-Aided Blocking of Factorial and Response 
Surface Designs," Technometrics 31 (1989), pp. 339-346. 
28.3. Dean, A., and D. Voss. Design and Analysis of Experiments. New York: Springer-Verlag, 1999. 
28.4. Snedecor, G. W., and W. G. Cochran. Statistical Methods. 8th ed. Ames, Iowa: The Iowa State 
University Press, 1989. 


Problems 

28.1. Discuss the advantages and disadvantages of balanced incomplete block designs in comparison 
to randomized complete block designs. 


28.2. What is meant by balance in a balanced incomplete block design? What are the advantages of 
balance? Under what circumstances might the use of an unbalanced incomplete block design 
be justified? 

28.3. Construct a balanced incomplete block design for three treatments in blocks of size two. How 
many blocks $n_b$ are required? What are n and $n_p$ for your design? 

28.4. Construct a balanced incomplete block design for seven treatments in blocks of size five. How 
many blocks $n_b$ are required? What are n and $n_p$ for your design? 

28.5. Construct a balanced incomplete block design for eight treatments in blocks of size three. 
How many blocks $n_b$ are required? What are n and $n_p$ for your design? 

28.6. Detergent effectiveness. A chemical engineer wished to evaluate the effectiveness of nine 
alternative formulations of a dishwashing detergent in terms of the extent to which each 
would maintain foam or suds while in use. Three sinks were available, and three people were 
instructed to use the sinks to wash plates at a constant rate. Each block consisted of three 
experimental units, where the experimental unit was a sink with a fixed amount of clean water 
and a fixed amount of soil added. Three detergent formulations were randomly assigned to 
the three sinks in each block. The response Y was foam duration, which was measured by the 
number of plates washed before the suds disappeared. BIBD number 18 from Table 28.1 was 
utilized for this experiment. Data for the randomized BIBD follow: 


                  Treatments                    Responses 
Block    Sink 1   Sink 2   Sink 3      Sink 1   Sink 2   Sink 3 
1        3        8        4           13       20       7 
2        4        9        2           6        29       17 
3        3        6        9           15       23       31 
4        9        5        1           31       26       20 
5        2        7        6           16       21       23 
6        6        5        4           23       26       6 
7        9        8        7           28       19       21 
8        7        1        4           20       20       7 
9        6        8        1           24       19       20 
10       5        8        2           26       19       17 
11       5        3        7           24       14       19 
12       3        2        1           11       17       19 

John, P. W. M. "An Application of a Balanced Incomplete Block Design," Technometrics 3 (1961), pp. 51-54. 


Obtain the residuals for balanced incomplete block design model (28.2) and plot them against 
the fitted values. Also prepare a normal probability plot of the residuals and calculate the 
coefficient of correlation between the ordered residuals and their expected values under 
normality. Summarize your findings about the appropriateness of model (28.2) here. 

28.7. Refer to Detergent effectiveness Problem 28.6. Assume that balanced incomplete block 
design model (28.2) is appropriate. 

a. Obtain the least squares estimates of the treatment means and plot them against treatment 
number in the form of Figure 28.4. Does your plot suggest the presence of treatment 
effects? 

b. Test whether or not treatment affects foam duration; use α = .05. State the alternatives, 
decision rule, and conclusion. What is the P-value of the test? 

c. Test whether or not block effects are present; use α = .05. State the alternatives, decision 
rule, and conclusion. What is the P-value of the test? 

d. Give a 95 percent confidence interval for the fifth treatment mean. 

e. Analyze the nature of the treatment effects by making all pairwise comparisons among the 
treatment means. Use the Tukey procedure and a 90 percent family confidence coefficient. 
Summarize your findings using a line plot of the least squares treatment means. 

*28.8. Automobile tire wear. An automotive engineer wished to evaluate the effects of four rubber 
compounds on the life of automobile tires. The manufacturing process permitted the use of up 
to three different compounds in a given tire. To do this, the tire is divided into three sections, 
and a different compound is used in each section. Because each segment of a tire would be 
subject to nearly identical road conditions, the investigator decided to use tires as blocks, 
with three of the four treatments (compounds) being applied to the three experimental units 
(tire segments) in each block. Four tires were tested. The response Y is a coded measure 
of wear. Design 2 from Table 28.1 was utilized; the experimental layout and response data 
follow: 

                Compound 
Tire     A      B      C      D 
1        238    238    279 
2        196    213           308 
3        254           334    367 
4               312    421    412 

Davies, O. L., ed. The Design and Analysis of Industrial 
Experiments. London: Oliver and Boyd (1961). 

Obtain the residuals for balanced incomplete block design model (28.2) and plot them against 
the fitted values. Also prepare a normal probability plot of the residuals and calculate the 
coefficient of correlation between the ordered residuals and their expected values under normality. 
Summarize your findings about the appropriateness of model (28.2) here. 

*28.9. Refer to Automobile tire wear Problem 28.8. Assume that balanced incomplete block design 
model (28.2) is appropriate. 

a. Obtain the least squares estimates of the treatment means and plot them against treatment 
number in the form of Figure 28.4. Does your plot suggest the presence of treatment effects? 

b. Test whether or not the type of compound affects tire wear; use α = .05. State the 
alternatives, decision rule, and conclusion. What is the P-value of the test? 

c. Test whether or not block effects are present; use α = .05. State the alternatives, decision 
rule, and conclusion. What is the P-value of the test? 



d. Give a 95 percent confidence interval for the mean wear for compound A. 

e. Analyze the nature of the treatment effects by making all pairwise comparisons among the 
treatment means. Use the Tukey procedure and a 95 percent family confidence coefficient. 
Summarize your findings using a line plot of the least squares treatment means. 

*28.10. Suppose that Tukey's method for all pairwise comparisons will be made using balanced 
incomplete block design number 2 in Table 28.1. Assume that $\sigma^2$ will be no larger than 2.0 
and the widths of the simultaneous 95 percent confidence intervals are not to exceed 3.0. 
Determine n, the number of replicates, and $n_b$, the number of blocks, necessary to satisfy 
these requirements. How many repeats of design number 2 are required? 

28.11. Suppose that Tukey's method for all pairwise comparisons will be made using balanced 
incomplete block design number 5 in Table 28.1. Assume that $\sigma^2$ will be no larger than 15 
and the widths of the simultaneous 90 percent confidence intervals are not to exceed 2.5. 
Determine n, the number of replicates, and $n_b$, the number of blocks, necessary to satisfy 
these requirements. How many repeats of design number 5 are required? 

28.12. A behavioral scientist explained why latin square designs are used so frequently: "Many times 
in behavioral science, we require the use of repeated measures designs because variability 
between human subjects is so great. Since an order effect may be present in this situation, we 
employ latin square designs to eliminate any bias due to order effects." Comment. 

28.13. a. Using random permutations, select randomly a 3 by 3 latin square. Show all steps. 

b. Using random permutations, select randomly a 6 by 6 latin square. Show all steps. 

*28.14. Hardware sales. A manufacturer conducted a small pilot study of the effect of the price of 
one of its products on sales of this product in hardware stores. Since it might be confusing 
to customers if prices were switched repeatedly within a store, only one price was used for 
any one store during the six-month study period. Sixteen stores were employed in the study. 
To reduce experimental error variability, stores were chosen so that there would be one store 
for each sales volume-geographic location class. The four price levels (A: $1.79; B: $1.69; 
C: $1.59; D: $1.49) were assigned to the stores according to the latin square design shown 
below. Data on sales during the six-month period (in thousand dollars) follow. 

                               Geographic Location Class (j) 
Sales Volume Class i     Northeast    Northwest    Southeast    Southwest 
1 (smallest)             1.2 (B)      1.5 (C)      1.0 (A)      1.7 (D) 
2                        1.4 (A)      1.9 (D)      1.6 (B)      1.5 (C) 
3                        2.8 (C)      2.1 (B)      2.7 (D)      2.0 (A) 
4 (largest)              3.4 (D)      2.5 (A)      2.9 (C)      2.7 (B) 

Obtain the residuals for latin square model (28.12) and plot them against the fitted values. Also 
prepare a normal probability plot of the residuals and calculate the coefficient of correlation 
between the ordered residuals and their expected values under normality. Summarize your 
findings about the appropriateness of model (28.12) here. 

*28.15. Refer to Hardware sales Problem 28.14. Assume that latin square model (28.12) is 
appropriate. 

a. Prepare a main effects plot of the estimated treatment means. What does the plot suggest 
about the effects of the four price levels on sales? 

b. Test whether or not price level affects mean sales; use α = .05. State the alternatives, 
decision rule, and conclusion. What is the P-value of the test? 


c. Analyze the nature of the price effect on sales by making all pairwise comparisons among 
the treatment means. Use the Tukey procedure and a 90 percent family confidence 
coefficient. Summarize your findings. 

d. Does there appear to be a linear relationship between price level and mean sales? Could 
you formally test for linearity? Explain. 

*28.16. Refer to Hardware sales Problems 28.14 and 28.15. 

a. Calculate the three estimated efficiency measures in (28.26). 

b. Would a randomized complete block design have been adequate here? If so, which blocking 
variable would have been best? 

28.17. Summary reports. A management information systems consultant conducted a small-scale 
study of five different daily summary reports (A: greatest amount of detail; B; C; D; E: least 
amount of detail). Five sales executives were used in the study. Each was given one type of 
daily report for a month and then was asked to rate its helpfulness on a 25-point scale (0: no 
help; 25: extremely helpful). Over a five-month period, each executive received each type 
of report for one month according to the latin square design shown below. The helpfulness 
ratings follow. 

                                   Month (j) 
Executive i    March     April     May       June      July 
Harrison       21 (D)    8 (A)     17 (C)    9 (B)     16 (E) 
Smith          5 (A)     10 (E)    3 (B)     12 (C)    15 (D) 
Carmichael     20 (C)    10 (B)    15 (E)    22 (D)    12 (A) 
Loeb           4 (B)     17 (D)    3 (A)     9 (E)     10 (C) 
Munch          17 (E)    16 (C)    20 (D)    7 (A)     11 (B) 

Obtain the residuals for latin square model (28.12) and plot them against the fitted values. Also 
prepare a normal probability plot of the residuals and calculate the coefficient of correlation 
between the ordered residuals and their expected values under normality. Summarize your 
findings about the appropriateness of model (28.12) here. 

28.18. Refer to Summary reports Problem 28.17. Assume that latin square model (28.12) is 
appropriate. 

a. Prepare a main effects plot of the estimated treatment means. What does the plot suggest 
about the effects of the five types of reports? 

b. Test whether or not the five types of reports differ in mean helpfulness; use significance 
level α = .01. State the alternatives, decision rule, and conclusion. What is the P-value of 
the test? 

c. Analyze the effectiveness of the five types of reports by making all pairwise comparisons 
among the treatment means. Use the Tukey procedure and a 95 percent family confidence 
coefficient. Summarize your findings. 

28.19. Refer to Summary reports Problems 28.17 and 28.18. 

a. Calculate the three estimated efficiency measures in (28.26). 

b. How effective was the use of the latin square design here? 

*28.20. Refer to Hardware sales Problems 28.14 and 28.15. Assume that σ = .15. What is the power 
of the test for treatment effects in Problem 28.15b if $\tau_1 = -.4$, $\tau_2 = 0$, $\tau_3 = .1$, and 
$\tau_4 = .3$? 




28.21. Refer to Summary reports Problems 28.17 and 28.18. Assume that σ = 1.4. What is the 
power of the test for treatment effects in Problem 28.18b if $\tau_1 = -2$, $\tau_2 = -1$, $\tau_3 = 0$, 
$\tau_4 = 1.5$, $\tau_5 = 1.5$? 

28.22. Drugs interaction. A pilot study was undertaken on the interaction effects of two drugs to 
stimulate growth in girls who are of short stature because of a particular syndrome. Each drug 
was known to be modestly effective singly, but the combination of the two drugs had never 
been investigated. Blocking by both subject and time period was desired whereby repeated 
measures for different treatments applied to the same subject are obtained. A 4 by 4 latin square 
design, shown below, was utilized for four subjects, four time periods, and four treatments. 
The four time periods consisted of one month each, separated by an intervening month during 
which no treatment was given. The four treatments were A: no treatment (placebo); B: drug 
X alone; C: drug Y alone; D: both drugs X and Y. The response variable was the difference 
in the growth rates (in centimeters per month) during the treatment period and the base period 
before the experiment began. The results of the study follow. 

                        Time Period (j) 
Subject i    1           2           3            4 
1            .02 (A)     .15 (B)     .45 (D)      .18 (C) 
2            .27 (B)     .24 (C)     -.01 (A)     .58 (D) 
3            .11 (C)     .35 (D)     .14 (B)      -.03 (A) 
4            .48 (D)     .04 (A)     .18 (C)      .22 (B) 

Obtain the residuals in (28.16) for latin square model (28.12) and plot them against the fitted 
values. Also prepare a normal probability plot of the residuals and calculate the coefficient of 
correlation between the ordered residuals and their expected values under normality. 
Summarize your findings. 

28.23. Refer to Drugs interaction Problem 28.22. Assume that an appropriate model is latin square 
model (28.12), modified so that subjects have random effects and a factorial structure for the 
treatments is incorporated (factor A: drug X; factor B: drug Y). 

a. State the model to be employed. 

b. Test for interaction effects between the two drugs; use α = .10. State the alternatives, 
decision rule, and conclusion. What is the P-value of the test? 

c. Estimate the interaction contrast: 

-2 + Pee Bu Be 
L= (= : > : na) (us 2 2 = J = fog — Map Beck BH 

using a 90 percent confidence interval. Interpret your result. 

*28.24. Refer to Hardware sales Problem 28.14. 

a. Set up the regression model equivalent to latin square model (28.12) using 1, -1, 0 indicator 
variables. 

b. Test by means of the regression approach whether or not price level affects mean sales; 
use α = .05. State the alternatives, decision rule, and conclusion. 

c. Obtain a 95 percent confidence interval by the regression approach for $L = \tau_3 - \tau_4$. 
Interpret your interval estimate. 

d. Suppose that observation $Y_{232} = 1.6$ were missing. 

i. Use the regression approach to test whether price level affects mean sales; control the 
α risk at .05. State the alternatives, decision rule, and conclusion. 


ii. Use the regression approach to estimate L = c, — t; by means of a 95 percent confidence 
interval. 

28.25. Refer to Summary reports Problem 28.17. Suppose that observations $Y_{114} = 21$ and 
$Y_{453} = 10$ were missing. 

a. Use the regression approach to test whether the five types of reports differ in mean 
effectiveness; employ significance level α = .01. State the alternatives, decision rule, 
and conclusion. 

b. Use the regression approach to estimate L = т, — тү by means of a 99 percent confidence 
interval. 

28.26. TV commercials. A study was undertaken to determine whether the volume of sound of a 
television commercial affects recall and whether this effect varies by product. Thirty-two subjects 
were chosen, two each for 16 groups defined according to age (class 1: youngest; 2; 3; 4: oldest) 
and amount of education (class 1: lowest education level; 2; 3; 4: highest education level). 
Each subject was exposed to one of four television commercial showings (A: high volume, 
product X; B: low volume, product X; C: high volume, product Y; D: low volume, product Y) 
according to the latin square design shown below. Two different commercials were involved, 
one for each product. During the following week, the subjects were asked to mention 
everything they could remember about the advertisement. Scores were based on the number 
of learning points mentioned, suitably standardized. The results follow. 

Age Class i: 

Education Level 

j = 1:    56    60 (A) 
j = 2:    72    67 (D) 
j = 3:    63    67 (B) 
j = 4:    64    66 (C) 

Obtain the residuals for latin square model (28.27) and plot them against the fitted values. Also 
prepare a normal probability plot of the residuals and calculate the coefficient of correlation 
between the ordered residuals and their expected values under normality. Summarize your 
findings about the appropriateness of the model utilized here. 

28.27. Refer to TV commercials Problem 28.26. Assume that latin square model (28.27), modified 
to allow for factorial treatments (factor A: volume; factor B: product), is appropriate. 

a. State the model to be employed. 

b. Test for volume-product interaction effects; use α = .01. State the alternatives, decision 
rule, and conclusion. What is the P-value of the test? 

c. Test for volume main effects and product main effects. For each test, use α = .01 and state 
the alternatives, decision rule, and conclusion. What is the P-value of each test? 

d. To study the nature of the volume and product main effects, estimate the difference between 
the two factor level means for each factor. Use the Bonferroni procedure and a 95 percent 
family confidence coefficient. State your findings. 

28.28. Recall decay. In an experiment to study recall decay with three different questionnaires 
(A, B, C), nine subjects were questioned at three different times three months apart about the 
number of trips to a shopping center during the preceding three months. Each time a different 
questionnaire was used. The latin square design shown below was used to determine 
the questionnaire order for each subject, with three subjects assigned randomly to each of the 
three treatment order patterns. The data on number of shopping trips reported follow. 




                            Time Period (j) 
Pattern i    Subject      1          2          3 
1            m = 1        40 (C)     18 (A)     30 (B) 
             m = 2        35 (C)     25 (A)     37 (B) 
             m = 3        31 (C)     22 (A)     28 (B) 
2            m = 1        10 (B)     43 (C)     33 (A) 
             m = 2        18 (B)     49 (C)     37 (A) 
             m = 3        15 (B)     48 (C)     29 (A) 
3            m = 1        7 (A)      19 (B)     59 (C) 
             m = 2        11 (A)     24 (B)     51 (C) 
             m = 3        19 (A)     21 (B)     62 (C) 

Obtain the residuals for latin square crossover model (28.29) and plot them against the fitted 
values. Also prepare a normal probability plot of the residuals and calculate the coefficient of 
correlation between the ordered residuals and their expected values under normality. 
Summarize your findings about the appropriateness of model (28.29) here. 

28.29. Refer to Recall decay Problem 28.28. Assume that latin square crossover model (28.29) is 
appropriate. 

a. Test for the presence of treatment order pattern, time period, and questionnaire effects. For 
each test, use level of significance α = .05 and state the alternatives, decision rule, and 
conclusion. What is the P-value of each test? 

b. Analyze the questionnaire main effects by estimating all pairwise comparisons of 
treatment means. Use the Tukey procedure and a 90 percent family confidence coefficient. 
Summarize your findings. 


Chapter 29 


Exploratory Experiments: Two-Level Factorial and 
Fractional Factorial Designs 


Up to this point, much of our discussion of the design of experiments has focused on 
the planning of confirmatory experiments. Generally, confirmatory experiments employ a 
relatively small number of explanatory factors. The factors under investigation usually are 
suggested by existing theory or by previous experimental findings. Exploratory experimental 
studies are typically encountered during the early stages of a new research study, when little 
is known about the set of important or active explanatory factors. At this stage of the 
investigation, the experimenter often needs to consider a large number of factors in order 
to identify the factors that are the most important. One means of including a large number 
of factors in an experiment while keeping the total number of treatment combinations at a 
manageable level is to study each factor at only two levels. For example, in a four-factor 
experiment, one replication of a two-level factorial experiment consists of just $2^4 = 16$ 
treatment trials. In contrast, if each factor were studied at three levels, a single replication 
would require $3^4 = 81$ treatment trials, over five times that required by the two-level 
experiment. 
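
The growth in the number of treatment combinations is easy to tabulate. The short sketch below enumerates the treatments of a full factorial with `itertools.product`; the function name and the coding of levels as integers 0, 1, 2 are illustrative choices, not notation from the text.

```python
from itertools import product


def treatment_combinations(num_factors, num_levels):
    """List every treatment of a full factorial, one tuple of levels each."""
    return list(product(range(num_levels), repeat=num_factors))


two_level = treatment_combinations(4, 2)
three_level = treatment_combinations(4, 3)

print(len(two_level))                     # 16 treatment trials
print(len(three_level))                   # 81 treatment trials
print(len(three_level) / len(two_level))  # about 5.06 times as many
```

Each added two-level factor doubles the run count, whereas each added three-level factor triples it, which is why two-level designs are attractive for screening.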

Even when only two levels are employed for each factor, the size of the experiment can 
still become prohibitively large when a large number of factors are to be studied. In such 
cases, a carefully selected subset, or fraction, of the treatments can be used with little or 
no loss of information about the main effects and key low-order interactions. Fractional 
factorial designs permit the study of a large number of factors with relatively few 
experimental trials. 

Another means of keeping the number of trials small in exploratory experiments is to 
use a single replication or to employ replications for only one or a few of the treatments. 

In this chapter, we first discuss the use of two-level factorial experiments and then con- 
sider two-level experiments with only one replication. We then take up fractional factorial 
designs and their analysis, including designs for screening a large number of factors. In 
Section 29.5 we discuss briefly the use of blocking in two-level experiments. We conclude 
the chapter by introducing robust product and process design experiments and illustrate 
their use with a case study from the automotive industry. Unless explicitly stated otherwise,
we assume throughout the chapter that all treatment sample sizes are equal and all factor
effects are fixed. 


29.1 Two-Level Full Factorial Experiments


Design of Two-Level Studies 


Notation 


Experimental studies involving k factors, each at two levels, are often referred to as $2^k$ factorial
studies. The choice of the two levels for each factor in a two-level factorial experiment at
times is automatic. Some factors exist naturally at two levels. For instance, in a marketing
research study of the effects of including or excluding special features, such as antilock
brakes and automatic headlight dimmers in an automobile, the factors automatically have
two levels. At other times, a deliberate choice of the two levels must be made. For instance, 
in a study of a rubber extrusion process, curing time was one of the factors of interest. 
Economic and engineering considerations dictated that curing time be at least 30 minutes 
and not longer than 45 minutes. The two levels selected here were 30 and 45 minutes to 
provide information at the limits of the range of the factor. 

An example of a two-level factorial study involving three factors with three replications 
is the stress test study in Table 24.4. There, the gender levels were male and female, and 
subjects were classified as having low or high body fat and being light or heavy smokers. 

Since two-level factorial studies are a special case of the factorial studies discussed in 
earlier chapters, we already know how to analyze such studies. For our purposes here, 
however, we need to modify our earlier notation because it becomes awkward when there 
are many factors. Also, we shall see that some simplifications arise in the calculational 
formulas when all factors have two levels. 


Consider our usual formulation of the regression version of a three-factor ANOVA model 
for a balanced study where each factor has two levels: 


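$$
\begin{aligned}
Y_{ijkm} = {} & \mu_{...} + \alpha_1 X_{ijkm1} + \beta_1 X_{ijkm2} + \gamma_1 X_{ijkm3} \\
& + (\alpha\beta)_{11} X_{ijkm1} X_{ijkm2} + (\alpha\gamma)_{11} X_{ijkm1} X_{ijkm3} \\
& + (\beta\gamma)_{11} X_{ijkm2} X_{ijkm3} + (\alpha\beta\gamma)_{111} X_{ijkm1} X_{ijkm2} X_{ijkm3} + \varepsilon_{ijkm}
\end{aligned}
\tag{29.1}
$$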


where $X_1$, $X_2$, $X_3$ take on the values 1 and -1 for the two factor levels. Even though in a
two-level factorial study there is only one main effect term for each factor, one two-factor 
interaction for each pair of factors, and so on, it is evident that with more factors the notation 
used in model (29.1) will become very cumbersome. 

We therefore will change the notation as follows, using the conventions for polynomial 
regression in Section 8.1: 


1. The main effects will be represented by $\beta_1$, $\beta_2$, etc. The overall constant will be
represented by $\beta_0$.

2. The two-factor interaction effects will be represented by $\beta_{12}$, $\beta_{13}$, etc.

3. Three-factor and higher-order interaction effects will be represented correspondingly;
for instance, by $\beta_{123}$ and $\beta_{1234}$.

4. The index i will be used to denote the observation number, running from 1 to $n_T$.

5. Cross-product terms will be represented by a single X. For instance, $X_1 X_2$ will be
represented by $X_{12}$; $X_1 X_2 X_3$ will be represented by $X_{123}$; and so on. The value of $X_1 X_2$
for the ith observation will be represented by $X_{i12}$.

6. When the factor is quantitative, the low level will be the first level and will be coded 
—1, and the high level will be the second level and will be coded 1. This coding for 
quantitative factors is equivalent to standardizing the levels by subtracting the mean and 
dividing by half of the range. For a qualitative factor, the first level correspondingly will 
be coded —1 and the second level coded 1. Note that the —1, 1 coding here is the opposite 
of the convention followed earlier. 


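With these conventions, model (29.1) is now stated as follows, using $X_0 \equiv 1$ as the
dummy variable associated with $\beta_0$:

$$Y_i = \beta_0 X_{i0} + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_{12} X_{i12} + \beta_{13} X_{i13} + \beta_{23} X_{i23} + \beta_{123} X_{i123} + \varepsilon_i \tag{29.2}$$

where:

$\varepsilon_i$ are independent $N(0, \sigma^2)$

$$X_{i1} = \begin{cases} -1 & \text{if case from first level of factor 1} \\ \phantom{-}1 & \text{if case from second level of factor 1} \end{cases}$$

$$X_{i2} = \begin{cases} -1 & \text{if case from first level of factor 2} \\ \phantom{-}1 & \text{if case from second level of factor 2} \end{cases}$$

$$X_{i3} = \begin{cases} -1 & \text{if case from first level of factor 3} \\ \phantom{-}1 & \text{if case from second level of factor 3} \end{cases}$$

$\beta_0$ in model (29.2) corresponds to $\mu_{...}$ in model (29.1). Because the codes -1, 1 are now
reversed from our earlier convention, $\beta_1$ corresponds to $-\alpha_1 = \alpha_2$. Similarly, $\beta_2$ corresponds
to $-\beta_1 = \beta_2$ in the notation of model (29.1), and $\beta_3$ corresponds to $-\gamma_1 = \gamma_2$. The parameter
$\beta_{12}$ corresponds to $(\alpha\beta)_{11} = (\alpha\beta)_{22}$ because of two reversals in the signs of the indicator variables.

For k factors, model (29.2) is extended as follows:

$$Y_i = \beta_0 X_{i0} + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \beta_{12} X_{i12} + \cdots + \beta_{12 \cdots k} X_{i12 \cdots k} + \varepsilon_i \tag{29.2a}$$

where:

$$X_{ij} = \begin{cases} -1 & \text{if case } i \text{ from first level of factor } j \\ \phantom{-}1 & \text{if case } i \text{ from second level of factor } j \end{cases}$$

and $X_{i0}$ and $\varepsilon_i$ are defined as in (29.2).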

It is often helpful to list the treatments in a two-level factorial experiment in a standard 
order. We shall use as the standard order a listing of the treatments such that the level 
of factor 1, $X_1$, changes most frequently, the level of factor 2, $X_2$, changes with second
greatest frequency, and so on. In a three-factor study, for instance, the standard order of the
treatments is obtained by listing factor levels in the following sequence:

Treatment    X1    X2    X3
    1        -1    -1    -1
    2         1    -1    -1
    3        -1     1    -1
    4         1     1    -1
    5        -1    -1     1
    6         1    -1     1
    7        -1     1     1
    8         1     1     1

Note that treatment 1 consists of all three factors at their first levels, treatment 2 consists of
factor A at its second level and factors B and C at their first levels, and so on. The matrix
consisting of the $X_1$, $X_2$, and $X_3$ columns is called the design matrix because it identifies
the treatments in the experimental study.

A standard order for treatments is simply a convention for listing treatments in two-level 
factorial experiments; the actual ordering of the treatment trials in the experiment and the 
assignment of the treatments to experimental units are determined by randomization. 
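
The standard order described above can be generated mechanically. The following is a minimal Python sketch, not part of the original text; the function name `standard_order` is hypothetical and chosen only for illustration. It lists the $2^k$ treatments so that factor 1 changes most frequently.

```python
from itertools import product


def standard_order(k):
    """Return the 2^k treatments as tuples (X1, ..., Xk) in standard order.

    itertools.product varies its LAST position fastest, so each tuple is
    generated as (Xk, ..., X1) and then reversed to make X1 change fastest.
    """
    return [levels[::-1] for levels in product((-1, 1), repeat=k)]


design = standard_order(3)
for number, row in enumerate(design, start=1):
    print(number, row)
```

The listing is only a labeling convention; the run order of the trials would still be randomized.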


Estimation of Factor Effects 
When a balanced factorial experiment is carried out at two levels for each factor and a -1, 1
coding is employed, the X'X matrix is greatly simplified. Consider a two-factor study with
n = 1 replication. The X matrix, using the coding in (29.2), is as follows (treatments are in
standard order):

X0    X1    X2    X12
 1    -1    -1     1
 1     1    -1    -1
 1    -1     1    -1
 1     1     1     1

The simplifications in the X'X matrix arise because:

1. Any two columns of the X matrix are orthogonal; that is, $\mathbf{X}_q'\mathbf{X}_{q'} = 0$ for $q \neq q'$. In our simple
example, for instance:

$$\mathbf{X}_1'\mathbf{X}_2 = [-1 \quad 1 \quad -1 \quad 1] \begin{bmatrix} -1 \\ -1 \\ 1 \\ 1 \end{bmatrix} = 0$$

2. The sum of squares of the elements in each column, $\mathbf{X}_q'\mathbf{X}_q$, is always $n_T$. In our simple
example, for instance:

$$\mathbf{X}_1'\mathbf{X}_1 = [-1 \quad 1 \quad -1 \quad 1] \begin{bmatrix} -1 \\ 1 \\ -1 \\ 1 \end{bmatrix} = 4$$


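TABLE 29.1 Y and X Data Matrices in Standard Order: Stress Test Example of Table 24.4.

Y vector in the earlier notation (first seven and last three observations shown):
$Y_{1111}$, $Y_{1112}$, $Y_{1113}$, $Y_{2111}$, $Y_{2112}$, $Y_{2113}$, $Y_{1211}$, ..., $Y_{2221}$, $Y_{2222}$, $Y_{2223}$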


Consequently, the elements on the main diagonal of the X'X matrix are all $n_T$ and the
elements off the main diagonal are all zero so that X'X is a diagonal matrix:

$$\mathbf{X}'\mathbf{X} = n_T \mathbf{I} \tag{29.3}$$

The inverse of X'X therefore is a diagonal matrix with the diagonal elements being the
reciprocals of the elements in (29.3):

$$(\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{n_T}\mathbf{I} \tag{29.4}$$

The least squares and maximum likelihood estimators in (6.25) therefore become simple
in form:

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \frac{1}{n_T}\mathbf{X}'\mathbf{Y} \tag{29.5}$$

Letting $\mathbf{X}_q$ denote the column vector containing the qth column of the X matrix, the estimated
regression coefficient $b_q$ therefore is:

$$b_q = \frac{1}{n_T}\mathbf{X}_q'\mathbf{Y} \tag{29.6}$$

Since each column vector $\mathbf{X}_q$ contains only 1s and -1s, the estimated coefficients $b_q$ are
very simple linear combinations of the observations.
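
As an illustration of (29.3) through (29.6), the sketch below builds the X matrix for an unreplicated two-factor study, confirms that X'X equals $n_T$ times the identity, and computes each coefficient as $\mathbf{X}_q'\mathbf{Y}/n_T$. The response values are hypothetical numbers invented for this example, and the helper names `model_matrix_2x2` and `dot` are likewise illustrative assumptions rather than anything from the text.

```python
from itertools import product


def model_matrix_2x2():
    """Rows of the X matrix (columns X0, X1, X2, X12) in standard order."""
    rows = []
    for x2, x1 in product((-1, 1), repeat=2):  # x1 changes fastest
        rows.append([1, x1, x2, x1 * x2])
    return rows


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


X = model_matrix_2x2()
columns = list(zip(*X))
n_T = len(X)

# X'X should be diagonal with n_T on the diagonal, as in (29.3).
gram = [[dot(cq, cr) for cr in columns] for cq in columns]

# Hypothetical responses, listed in standard order.
Y = [10.0, 14.0, 11.0, 21.0]

# b_q = X_q'Y / n_T, as in (29.6).
b = [dot(cq, Y) / n_T for cq in columns]

# With one replication the model is saturated, so fitted values equal Y.
fitted = [dot(row, b) for row in X]
print(gram)
print(b)
```

No matrix inversion is needed here, which is the practical payoff of the orthogonal coding.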

We illustrate this in Table 29.1, which contains the Y vector and the X matrix for the
stress test example of Table 24.4, with the observations listed in standard order. The Y
observations are shown both in the earlier notation and the current notation to facilitate
recognition of the treatments involved. (Note that the coding of the factor levels in the X
matrix is the opposite of that in Table 24.7 and that the ordering of the observations also
differs.)

X matrix of Table 29.1 (first seven and last three observations shown):

         X0    X1    X2    X3    X12   X13   X23   X123
Y1        1    -1    -1    -1     1     1     1    -1
Y2        1    -1    -1    -1     1     1     1    -1
Y3        1    -1    -1    -1     1     1     1    -1
Y4        1     1    -1    -1    -1    -1     1     1
Y5        1     1    -1    -1    -1    -1     1     1
Y6        1     1    -1    -1    -1    -1     1     1
Y7        1    -1     1    -1    -1     1    -1     1
...
Y22       1     1     1     1     1     1     1     1
Y23       1     1     1     1     1     1     1     1
Y24       1     1     1     1     1     1     1     1

We see that the estimated coefficient $b_{12}$, for instance, is simply:

$$b_{12} = \frac{1}{24}[1 \quad 1 \quad 1 \quad -1 \quad \cdots \quad 1] \begin{bmatrix} 24.1 \\ 29.2 \\ 24.6 \\ 20.0 \\ \vdots \\ 6.1 \end{bmatrix} = .754 \tag{29.7}$$


The variance-covariance matrix of b in (6.46) is also greatly simplified:

$$\sigma^2\{\mathbf{b}\} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} = \frac{\sigma^2}{n_T}\mathbf{I} \tag{29.8}$$

Note from this matrix that the estimated regression coefficients here are uncorrelated and
have constant variance:

$$\sigma^2\{b_q\} = \frac{\sigma^2}{n_T} \tag{29.9}$$

The estimated variance-covariance matrix in (6.48) becomes:

$$\mathbf{s}^2\{\mathbf{b}\} = \frac{MSE}{n_T}\mathbf{I} \tag{29.10}$$

so that the estimated variance of $b_q$ is simply:

$$s^2\{b_q\} = \frac{MSE}{n_T} \tag{29.11}$$


Comments 


1. Some texts and software packages define the effect of a factor as an observed difference between
responses when that factor changes from its first level to its second level. For example, the estimated
main effect of factor 1 (factor A) is defined as:

$$\hat{A} = \left(\text{mean response for all trials in which } X_1 = 1\right) - \left(\text{mean response for all trials in which } X_1 = -1\right) \tag{29.12}$$

$\hat{A}$ is an estimate of $\alpha_2 - \alpha_1 = 2\alpha_2$; recall that $\alpha_1 = -\alpha_2$ when the factors are at two levels.
Consequently, the relation between $\hat{A}$ and our estimate $b_1$ (which now estimates $\alpha_2 = -\alpha_1$) is:

$$\hat{A} = 2b_1 \tag{29.13}$$

The relations for the other main effects and interaction effects are similar.

2. The -1, 1 coding used for the predictor variables in (29.2) is sometimes referred to as
an orthogonal coding because it leads for balanced two-level factorial designs to a diagonal X'X
matrix.
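
Relation (29.13) is easy to check numerically. The short sketch below uses hypothetical responses from an unreplicated two-factor study in standard order (the numbers are invented for illustration) and compares the difference-of-means definition in (29.12) with twice the coefficient from (29.6).

```python
from statistics import mean

# Coded levels of factor 1 and hypothetical responses, in standard order.
X1 = [-1, 1, -1, 1]
Y = [10.0, 14.0, 11.0, 21.0]

# Effect as defined in (29.12): mean at the second level minus mean at the first.
effect = mean(y for x, y in zip(X1, Y) if x == 1) - mean(
    y for x, y in zip(X1, Y) if x == -1
)

# Coefficient as defined in (29.6): b_1 = X_1'Y / n_T.
b1 = sum(x * y for x, y in zip(X1, Y)) / len(Y)

print(effect, b1)
```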


Inferences about Factor Effects 


As noted earlier, a main objective in two-level exploratory studies is usually the identification
of active effects. An effect is considered active if the corresponding factor effect coefficient
is nonzero. Since all estimated factor effects have the same variance for balanced studies, as
noted in (29.9), a normal probability plot can be made of all estimated main and interaction
effects to identify those that appear to be active. We shall illustrate this plot shortly.

Formal tests for a regression coefficient, with the alternatives $H_0$: $\beta_q = 0$, $H_a$: $\beta_q \neq 0$,
are carried out in the usual manner, based on either the $t^*$ statistic in (7.25) or the $F^*$
statistic in (7.24). In many instances, the testing procedure will be used for each of the
factor effects. The family level of significance then can be controlled at $\alpha$ by either the
Bonferroni inequality (4.4) or the Kimball inequality (19.53).

Example

Figure 29.1 contains the MINITAB FFactorial output for the stress test example of
Table 24.4. In this study, the effects of gender of subject (factor A), body fat of subject
(factor B), and smoking history of subject (factor C) on exercise tolerance were studied.
The MINITAB ANOVA output is based on the coding of the factor levels in Table 29.1. The
estimated factor effect coefficients $b_q$ are shown in the column marked "Coef." The column
marked "Effect" contains the alternative definition of effects in (29.12). Notice that when
each entry in this column is divided by 2, as shown in (29.13), the estimated coefficients $b_q$
are obtained. Also note that the estimated standard deviations in the column labeled "Std
Coef" are all the same, as required by (29.11):

$$s\{b_q\} = \left(\frac{MSE}{n_T}\right)^{1/2} = \left(\frac{9.335}{24}\right)^{1/2} = .6237$$

Using a significance level of .015 for each of the seven tests on the estimated factor effect
coefficients so as to assure a family level of significance of .10 by the Kimball inequality,
we see from the P-values in Figure 29.1 that the set of active factor effects consists of the
gender, body fat, and smoking main effects, and the body fat-smoking interaction.

FIGURE 29.1 MINITAB FFactorial Output: Stress Test Example of Table 24.4.

Estimated Effects and Coefficients for TOLERANCE


Term Effect Coef Std Coef t-value P 
Constant 16.271 0.6237 26.09 0.000 
GENDER —5.425 —2.713 0.6237 —4.35 0.000 
BODYFAT —6.358 —3.179 0.6237 —5.10 0.000 
SMOKING —3.425 —1.713 0.6237 —2.75 0.014 
GENDER*BODYFAT 1.508 0.754 0.6237 1.21 0.244 
GENDER*SMOKING —1.358 —0.679 0.6237 —1.09 0.292 
BODYFAT*SMOKING 3.475 1.737 0.6237 2.79 0.013 
GENDER*BODYFAT*SMOKING —0.558 —0.279 0.6237 —0.45 0.660 


Analysis of Variance for TOLERANCE 


Source DF Seq SS Adj SS Adj MS F P 
Main Effects 3 489.538 489.538 163.179 17.48 0.000 
2-Way Interactions 3 97.175 97.175 32.392 3.47 0.041 
3-Way Interactions 1 1.870 1.870 1.870 0.20 0.660 
Residual Error 16 149.367 149.367 9.335 
Pure Error 16 149.367 149.367 9.335 


Total 23 737.950 
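
The per-test significance level and the common standard error quoted in this example can be checked with a few lines of Python. This is a sketch under one stated assumption: that the Kimball inequality (19.53) bounds the family level by $1 - (1 - \alpha)^g$ for $g$ tests, a form that reproduces the .015 used above. The values of MSE, $n_T$, and the gender coefficient are read from Figure 29.1.

```python
from math import sqrt

family_alpha = 0.10
g = 7  # number of factor effect coefficients tested

# Assumed Kimball form: family level <= 1 - (1 - alpha)^g, solved for alpha.
alpha_kimball = 1 - (1 - family_alpha) ** (1 / g)

# Bonferroni alternative: split the family level equally across the g tests.
alpha_bonferroni = family_alpha / g

# Common estimated standard deviation of every coefficient, from (29.11).
mse, n_T = 9.335, 24
std_coef = sqrt(mse / n_T)

# t* statistic for the gender coefficient reported in Figure 29.1.
t_gender = -2.713 / std_coef

print(round(alpha_kimball, 4), round(alpha_bonferroni, 4))
print(round(std_coef, 4), round(t_gender, 2))
```

The Bonferroni level is slightly smaller than the Kimball level, so it is the more conservative of the two.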




29.2 Analysis of Unreplicated Two-Level Studies


Example 


In many applications of two-level factorial experiments, particularly when many factors are 
included, only a single replication can be run because of time, budgetary, or other resource 
limitations. As discussed in Chapter 20, no degrees of freedom are available for obtaining an 
estimate of the error variance $\sigma^2$ when only one replication is employed. Special procedures
instead must be used for statistical analysis.

We shall now describe three approaches for analyzing unreplicated experiments:

1. The pooling of higher-order interactions to obtain an estimate of the variance.

2. The use of graphical procedures for identifying active effects.

3. The use of replications at the center point to obtain a pure error estimate of the error
variance $\sigma^2$.

First, however, we shall describe an unreplicated $2^4$ factorial experiment that will be
used as an illustration.


The Pecos Foods Corporation initiated an experimental study to characterize the effects 
of process temperature (factor 1 or A), an antimicrobial agent or preservative (factor 2 or 
B), moisture level (factor 3 or C), and acidity (factor 4 or D) on the microbial growth in 
a fruit bar. Microbial growth is measured by counting microbes in a sample of the product 
following three months in storage. The four factors were studied at the following low and 
high levels: 


Factor                 Low Level    High Level
Process temperature    152          178
Preservative           0.0          .1
Moisture               .65          .85
Acidity                4.8          6.8

One replication of a $2^4$ factorial experiment was run. The X matrix for the standard $2^4$
factorial ANOVA model and the response vector are shown in Table 29.2, in standard
order. Note that the columns $X_1$, $X_2$, $X_3$, and $X_4$ constitute the design matrix, identifying
each of the treatments. The response, denoted for simplicity by Y, is the natural logarithm
of the microbial count. This transformation was chosen partly because the actual counts
ranged from 87 to 104,410, i.e., over several orders of magnitude. In addition, the Box-Cox
procedure (3.36) supported the use of the logarithmic transformation.

The regression model version of the four-factor ANOVA model was fitted, using the
X variables in Table 29.2. The MINITAB regression results for the full ANOVA model
($p = n_T = 16$) are presented in Figure 29.2. Because there are no degrees of freedom
available for error, no estimate of the error variance and no t statistics and P-values for
the estimated regression coefficients are shown. Note that three estimated factor effect
coefficients, $b_2 = -1.25$, $b_3 = 1.40$, and $b_{23} = -1.40$, are substantially larger in absolute
value than the next largest coefficient, $b_{134} = -.24$. Consequently, the preservative and
moisture factors (2 and 3) may be active. We shall now consider the use of pooling, Pareto
plots, dot plots, and normal probability plots in an effort to identify more definitively the
set of active effects.

TABLE 29.2 Y Vector and X Matrix: Pecos Foods Corporation Example.

 i      Y   X0  X1  X2  X3  X4  X12 X13 X14 X23 X24 X34 X123 X124 X134 X234 X1234
 1   5.55    1  -1  -1  -1  -1    1   1   1   1   1   1   -1   -1   -1   -1     1
 2   4.47    1   1  -1  -1  -1   -1  -1  -1   1   1   1    1    1    1   -1    -1
 3   5.19    1  -1   1  -1  -1   -1   1   1  -1  -1   1    1    1   -1    1    -1
 4   5.32    1   1   1  -1  -1    1  -1  -1  -1  -1   1   -1   -1    1    1     1
 5  10.54    1  -1  -1   1  -1    1  -1   1  -1   1  -1    1   -1    1    1    -1
 6  11.56    1   1  -1   1  -1   -1   1  -1  -1   1  -1   -1    1   -1    1     1
 7   5.08    1  -1   1   1  -1   -1  -1   1   1  -1  -1   -1    1    1   -1     1
 8   5.45    1   1   1   1  -1    1   1  -1   1  -1  -1    1   -1   -1   -1    -1
 9   5.12    1  -1  -1  -1   1    1   1  -1   1  -1  -1   -1    1    1    1    -1
10   5.63    1   1  -1  -1   1   -1  -1   1   1  -1  -1    1   -1   -1    1     1
11   6.18    1  -1   1  -1   1   -1   1  -1  -1   1  -1    1   -1    1   -1     1
12   5.24    1   1   1  -1   1    1  -1   1  -1   1  -1   -1    1   -1   -1    -1
13  10.73    1  -1  -1   1   1    1  -1  -1  -1  -1   1    1    1   -1   -1     1
14  10.33    1   1  -1   1   1   -1   1   1  -1  -1   1   -1   -1    1   -1    -1
15   6.53    1  -1   1   1   1   -1  -1  -1   1   1   1   -1   -1   -1    1    -1
16   4.93    1   1   1   1   1    1   1   1   1   1   1    1    1    1    1     1

FIGURE 29.2 MINITAB Regression Results for Full ANOVA Model: Pecos Foods Corporation Example.

The regression equation is
Inmicrob = 6.74 - 0.124 x1 - 1.25 x2 + 1.40 x3 + 0.0956 x4 - 0.131 x12
           + 0.0481 x13 - 0.179 x14 - 1.40 x23 + 0.134 x24 - 0.109 x34
           - 0.101 x123 - 0.201 x124 - 0.244 x134 + 0.112 x234 + 0.132 x1234

Predictor        Coef        Stdev    t-ratio    p
Constant      6.74062      0.00000       *       *
x1           -0.124375     0.000000      *       *
x2           -1.25063      0.00000       *       *
x3            1.40313      0.00000       *       *
x4            0.0956249    0.0000000     *       *
x12          -0.130625     0.000000      *       *
x13           0.0481250    0.0000000     *       *
x14          -0.179375     0.000000      *       *
x23          -1.39562      0.00000       *       *
x24           0.134375     0.000000      *       *
x34          -0.109375     0.000000      *       *
x123         -0.100625     0.000000      *       *
x124         -0.200625     0.000000      *       *
x134         -0.244375     0.000000      *       *
x234          0.111875     0.000000      *       *
x1234         0.131875     0.000000      *       *

s = *

Analysis of Variance

SOURCE        DF         SS         MS      F    p
Regression    15   91.62849    6.10857      *    *
Error          0          *          *
Total         15   91.62849
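
The coefficients reported in Figure 29.2 can be reproduced directly from the Y vector of Table 29.2 using $b_q = \mathbf{X}_q'\mathbf{Y}/n_T$. The sketch below is one possible way to do this in plain Python; the response values are as read from Table 29.2, and the names `design`, `terms`, and `column` are illustrative choices, not part of the original analysis.

```python
from itertools import combinations, product
from math import prod

# Responses (log microbial counts) in standard order, as read from Table 29.2.
Y = [5.55, 4.47, 5.19, 5.32, 10.54, 11.56, 5.08, 5.45,
     5.12, 5.63, 6.18, 5.24, 10.73, 10.33, 6.53, 4.93]

k = 4
n_T = len(Y)

# Design matrix in standard order: X1 changes fastest.
design = [levels[::-1] for levels in product((-1, 1), repeat=k)]

# One term per subset of factors; the empty tuple stands for the constant X0.
terms = [()] + [c for r in range(1, k + 1)
                for c in combinations(range(1, k + 1), r)]


def column(term):
    """Column of the X matrix for a term, as a product of factor columns."""
    return [prod(row[j - 1] for j in term) for row in design]


coef = {term: sum(x * y for x, y in zip(column(term), Y)) / n_T
        for term in terms}

for term, value in coef.items():
    label = "x" + "".join(str(j) for j in term) if term else "Constant"
    print(f"{label:>8} {value:10.6f}")
```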


Pooling of Interactions 


Example 


A common approach to analyzing unreplicated experiments is to assume that some higher-order
interactions are inactive. The extra sums of squares corresponding to these interaction
terms are then used to form an estimate of the error variance $\sigma^2$. For example, in a $2^4$
factorial experiment, it may be reasonable to assume that all three-factor and four-factor
interactions are small or negligible in relation to main effects and two-factor interactions.
This implies that $\beta_{123} = \beta_{124} = \beta_{134} = \beta_{234} = \beta_{1234} = 0$. By dropping the corresponding
terms from the model, five degrees of freedom will be available for an estimate of $\sigma^2$. For
balanced two-level experiments, it can be shown that the extra sum of squares for $X_q$ is:

$$SSR(X_q) = n_T b_q^2 \tag{29.14}$$

Since for balanced two-level factorial studies, the columns of the X matrix are orthogonal,
any extra sum of squares does not depend on the order of the variables and the extra sums
of squares are additive. Hence, the pooled estimator of $\sigma^2$ is as follows:

$$MSE = \frac{n_T \sum b_q^2}{\text{Number of pooled coefficients}} \tag{29.15}$$

where the sum is taken over the pooled estimated coefficients. Inferences can then be made
in customary fashion.

In the Pecos Foods Corporation example, it was decided that all three-factor and four-factor
interactions are unimportant. Using (29.15) and the results in Figure 29.2, an estimate of
the error variance based on five degrees of freedom is:

$$MSE = \frac{n_T\left(b_{123}^2 + b_{124}^2 + b_{134}^2 + b_{234}^2 + b_{1234}^2\right)}{5} = \frac{16}{5}\left[(-.101)^2 + (-.201)^2 + (-.244)^2 + (.112)^2 + (.132)^2\right] = .448$$

MINITAB regression results for the model based on main effects and two-factor interactions
are presented in Figure 29.3. Notice that MSE = .448, as just calculated. Residual
analysis (not shown) did not reveal any violations in assumptions.

The P-values in Figure 29.3 indicate that the main effects for preservative and moisture 
(factors 2 and 3) and the preservative-moisture interaction effect are statistically significant; 
each of the associated P-values is .001 or less. The active factors in the Pecos Foods Cor- 
poration example are therefore preservative and moisture. Figure 29.4 presents a MINITAB 
interaction plot of the estimated means $\bar{Y}_{\cdot jk \cdot}$ for the two active factors. We see that increasing
preservative at high levels of moisture decreases microbial growth. At low moisture
levels, however, preservative has little effect. Correspondingly, at low preservative levels, 
decreasing moisture decreases microbial growth while at high preservative levels, changing 
the moisture level has little effect on microbial growth. 
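
The pooled error estimate in (29.15) and the resulting t ratios can be sketched as follows. The responses are as read from Table 29.2; the helper `coefficient` and the choice to pool exactly the three-factor and four-factor terms mirror the example above, and the remaining names are illustrative assumptions.

```python
from itertools import combinations, product
from math import prod, sqrt

# Responses in standard order, as read from Table 29.2.
Y = [5.55, 4.47, 5.19, 5.32, 10.54, 11.56, 5.08, 5.45,
     5.12, 5.63, 6.18, 5.24, 10.73, 10.33, 6.53, 4.93]
n_T = len(Y)
design = [levels[::-1] for levels in product((-1, 1), repeat=4)]


def coefficient(term):
    """b_q = X_q'Y / n_T for the term given as a tuple of factor numbers."""
    column = [prod(row[j - 1] for j in term) for row in design]
    return sum(x * y for x, y in zip(column, Y)) / n_T


# Pool all three-factor and four-factor interactions, as in the example.
pooled_terms = [c for r in (3, 4) for c in combinations(range(1, 5), r)]

# (29.15): MSE = n_T * (sum of squared pooled coefficients) / (number pooled).
mse = n_T * sum(coefficient(t) ** 2 for t in pooled_terms) / len(pooled_terms)

# (29.11): every retained coefficient has the same estimated standard deviation.
std_error = sqrt(mse / n_T)
t_ratios = {term: coefficient(term) / std_error
            for term in [(2,), (3,), (2, 3)]}

print(round(mse, 3), round(std_error, 4))
print({term: round(t, 2) for term, t in t_ratios.items()})
```

The t ratios would be referred to a t distribution with five degrees of freedom, one for each pooled coefficient.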


FIGURE 29.3 MINITAB Regression Results for ANOVA Model without Higher-Order Interactions: Pecos Foods Corporation Example.

The regression equation is
Inmicrob = 6.74 - 0.124 x1 - 1.25 x2 + 1.40 x3 + 0.096 x4 - 0.131 x12
           + 0.048 x13 - 0.179 x14 - 1.40 x23 + 0.134 x24 - 0.109 x34

Predictor       Coef     Stdev    t-ratio        p
Constant      6.7406    0.1673      40.28    0.000
x1           -0.1244    0.1673      -0.74    0.491
x2           -1.2506    0.1673      -7.47    0.001
x3            1.4031    0.1673       8.39    0.000
x4            0.0956    0.1673       0.57    0.592
x12          -0.1306    0.1673      -0.78    0.470
x13           0.0481    0.1673       0.29    0.785
x14          -0.1794    0.1673      -1.07    0.333
x23          -1.3956    0.1673      -8.34    0.000
x24           0.1344    0.1673       0.80    0.458
x34          -0.1094    0.1673      -0.65    0.542

s = 0.6693     R-sq = 97.6%     R-sq(adj) = 92.7%

Analysis of Variance

SOURCE        DF        SS        MS        F        p
Regression    10   89.3885    8.9388    19.95    0.002
Error          5    2.2400    0.4480
Total         15   91.6285

FIGURE 29.4 MINITAB Interaction Plot: Pecos Foods Corporation Example.

[Interaction plot: LNCOUNT plotted against preservative level, with separate lines for the two levels (-1 and 1) of C (Moisture).]

Pareto Plot


The Pareto plot is a qualitative tool for visually identifying important effects in unreplicated
two-level studies. It shows the percentage of the total sum of squares SSTO that is associated
with each estimated effect in the full factorial model. (Remember that for an unreplicated
full factorial model, SSTO = SSR.) From (29.14), this percentage is:

$$\frac{n_T b_q^2}{SSTO}(100) \tag{29.16}$$

Large percentage contributions correspond to large (absolute) estimated coefficients, and
therefore to active factor effects. Pareto plots present the percent contributions to SSTO in
decreasing order, either as a bar plot, a cumulative line plot, or both.

Example

To calculate the percent contribution to the total sum of squares for each factor effect in the
Pecos Foods Corporation example, we use (29.16) and the regression results in Figure 29.2
for the full factorial model. For example, the percent contribution associated with $X_3$ is:

$$\frac{n_T b_3^2}{SSTO}(100) = \frac{16(1.40)^2}{91.63}(100) = 34.2\%$$

A JMP Pareto plot shown in Figure 29.5 contains both a bar plot and a cumulative line plot.
Notice that the effects $X_2$, $X_3$, and $X_2X_3$ account for nearly all of the total variation in the
data. Thus, the Pareto plot identifies the same factor effects as active as does pooling of
higher-order interactions.

FIGURE 29.5 JMP Pareto Plot: Pecos Foods Corporation Example.

[Pareto Analysis plot: percent contribution to SSTO (vertical axis) for each estimated effect, in decreasing order.]

Comments

1. Other forms of Pareto plots are also used. For example, some statistics packages provide a
Pareto plot of estimated effects. In these plots, the bars correspond to the absolute magnitudes of the
estimated effect coefficients. Such plots are sometimes referred to as scree plots.

2. While Pareto plots are useful for identifying active effects, they can be misused. For example, a
Pareto plot is sometimes used to identify the smallest effects for pooling to estimate $\sigma^2$. This approach
often will lead to an estimate of the error variance that is too small, making the Type I error rates for
tests for active effects larger than desired.

Dot Plot

Another graphic plot often used in the analysis of unreplicated factorial studies to help
identify active effects is a simple dot plot of estimated factor effect coefficients. This plot
will show whether any estimated coefficients are far outlying. We know from (29.9) that
the variances for all estimated effect coefficients are the same for unreplicated $2^k$ factorial
studies so that the estimated effect coefficients will follow the same normal distribution if no
effects are present. Inactive factors will tend to be clustered in the middle of the distribution.
A large departure from the middle of the distribution suggests that the factor may be active.

FIGURE 29.6 Dot Plot of Estimated Factor Effect Coefficients: Pecos Foods Corporation Example.

[Dot plot: estimated coefficients on a horizontal axis labeled Coeff running from -2 to 2; the points labeled B, C, and BC lie far from the cluster near zero.]

Example

A dot plot of the estimated factor effect coefficients for the Pecos Foods Corporation example
is presented in Figure 29.6. Note that most factor effects fall near zero; these presumably are
the inactive factor effects. The three outlying coefficients, for factors B and C, correspond
to the three factor effects identified already by the other techniques as the active effects.
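
The percent contributions in (29.16), and the ranking of effects that a Pareto plot or dot plot displays, can be computed with a short script. This is a sketch using the responses as read from Table 29.2; the names `percent` and `ranked` are illustrative. Because it uses the unrounded coefficient $b_3 = 1.403125$ rather than 1.40, the share for $X_3$ comes out at about 34.4 percent instead of the 34.2 percent obtained in the hand calculation.

```python
from itertools import combinations, product
from math import prod

# Responses in standard order, as read from Table 29.2.
Y = [5.55, 4.47, 5.19, 5.32, 10.54, 11.56, 5.08, 5.45,
     5.12, 5.63, 6.18, 5.24, 10.73, 10.33, 6.53, 4.93]
n_T = len(Y)
design = [levels[::-1] for levels in product((-1, 1), repeat=4)]
terms = [c for r in range(1, 5) for c in combinations(range(1, 5), r)]


def coefficient(term):
    """b_q = X_q'Y / n_T for the term given as a tuple of factor numbers."""
    column = [prod(row[j - 1] for j in term) for row in design]
    return sum(x * y for x, y in zip(column, Y)) / n_T


# Total sum of squares about the mean.
y_bar = sum(Y) / n_T
ssto = sum((y - y_bar) ** 2 for y in Y)

# (29.16): percent of SSTO associated with each estimated effect.
percent = {term: 100 * n_T * coefficient(term) ** 2 / ssto for term in terms}

# Decreasing order, as a Pareto plot would display the bars.
ranked = sorted(percent, key=percent.get, reverse=True)
cumulative = 0.0
for term in ranked:
    cumulative += percent[term]
    label = "x" + "".join(str(j) for j in term)
    print(f"{label:>6} {percent[term]:6.2f} {cumulative:7.2f}")
```

The cumulative column corresponds to the cumulative line in a Pareto plot and reaches 100 percent because SSTO = SSR for the unreplicated full factorial model.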


Normal Probability Plot 


A normal probability plot of the estimated factor effect coefficients in an unreplicated $2^k$
factorial study can be constructed in the same fashion as a normal probability plot of residuals,
as described on page 110. This is possible because the estimated factor effect coefficients
are independent with constant variance $\sigma^2/n_T$. Since no estimate of $\sigma^2$ is available, we set
MSE = 1 in (3.6). If no effects are present, all estimated coefficients follow the same normal
distribution $N(0, \sigma^2/n_T)$ and should fall along a straight line in the plot. Strong deviations
from a straight line are indicative of active effects, in which case all estimated coefficients do
not come from the same normal distribution. Typically, the middle points represent inactive
effects and fall along a straight line. If they do not, it may be an indication that the error
terms are not normally distributed.

Example

Figure 29.7 shows a normal probability plot of the estimated effect coefficients for the
Pecos Foods Corporation example. A line has been fitted judgmentally to the center points
that appear to represent inactive effects. Notice that the estimated effect coefficients for the
factor B and factor C main effects and for the BC interaction effect fall away from the line
fitted to the inactive effects.

FIGURE 29.7 Normal Probability Plot of Estimated Effect Coefficients: Pecos Foods Corporation Example.

[Normal probability plot: estimated coefficient (vertical axis, -1.5 to 1.5) against expected value (horizontal axis, -2 to 2); the points labeled B, C, and BC fall away from the fitted line.]
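
The plotting positions for such a normal probability plot can be sketched in Python. The code assumes that (3.6) is the usual approximation $\sqrt{MSE}\, z[(k - .375)/(n + .25)]$ for the expected value of the kth smallest of n observations, with MSE set to 1 as described above; that formula is not restated in this section, so treat it as an assumption. The coefficients are copied from Figure 29.2, and `normal_scores` is a hypothetical helper name.

```python
from statistics import NormalDist

# Estimated coefficients copied from Figure 29.2 (constant term excluded).
coef = {
    "x1": -0.124375, "x2": -1.25063, "x3": 1.40313, "x4": 0.0956249,
    "x12": -0.130625, "x13": 0.0481250, "x14": -0.179375, "x23": -1.39562,
    "x24": 0.134375, "x34": -0.109375, "x123": -0.100625, "x124": -0.200625,
    "x134": -0.244375, "x234": 0.111875, "x1234": 0.131875,
}


def normal_scores(n):
    """Approximate expected values of n ordered standard normal variables.

    Assumed form of (3.6) with MSE = 1: z[(k - .375) / (n + .25)].
    """
    z = NormalDist().inv_cdf
    return [z((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


# Pair each coefficient, in increasing order, with its expected value.
ordered = sorted(coef.items(), key=lambda item: item[1])
scores = normal_scores(len(ordered))
for (term, b), score in zip(ordered, scores):
    print(f"{term:>6} {b:9.4f} {score:7.3f}")
```

Plotting the second column against the third gives the points of the plot; the terms at the two ends of the listing are the candidates for active effects.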


Comments 

1. When many factor effects are active and only a few are inactive, it may be difficult to fit a line
to the few inactive effects at the center. Consequently, a normal probability plot with many active
factor effects is often difficult to interpret.

2. Half-normal probability plots, as described in Section 14.8, are often used in place of (full)
normal probability plots discussed here. One advantage of half-normal probability plots is that
identification of active effects is sometimes facilitated. This is because the active effects in a half-normal
plot all fall at the right upper end of the plot whereas in a (full) normal plot active effects may be at
both ends of the plot.

3. A normal probability plot containing all factor effects is also appropriate for $2^k$ factorial
experiments with replications provided that there are equal numbers of replications for each treatment.


Center Point Replications 


When all factors are quantitative, two-level experiments can be augmented by replications 
at the center point. A center point is a new treatment in which each of the factors is set at 
the midpoint of its range. For example, in the Pecos Foods Corporation example, the center 
point treatment levels are: 


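$$\text{Temperature} = \frac{152 + 178}{2} = 165$$

$$\text{Preservative} = \frac{0.0 + .1}{2} = .05$$

$$\text{Moisture} = \frac{.65 + .85}{2} = .75$$

$$\text{Acidity} = \frac{4.8 + 6.8}{2} = 5.8$$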


We shall use $n_0$ to denote the number of center point replicates. Two important advantages
stem from the inclusion of two or more such replicates:

1. A pure error estimate of $\sigma^2$ based on $n_0 - 1$ degrees of freedom can be obtained,
avoiding any bias that otherwise might be associated with inferential procedures based
on the pooling of what appear to be small higher-order effects.

2. With replications at the center point, it is possible to test whether or not the model is a
good fit.

Pure Error Estimate of $\sigma^2$. Let $Y_{0i}$ denote the response associated with the ith replicate
at the center point, and let $\bar{Y}_{0\cdot}$ denote the mean of the $n_0$ responses at the center point. A
pure error estimate of $\sigma^2$ is given by the sample variance of the center point replicates:

$$MSPE = \frac{\sum_{i=1}^{n_0} (Y_{0i} - \bar{Y}_{0\cdot})^2}{n_0 - 1} \tag{29.17}$$

Test for Lack of Fit. Once a pure error mean square has been obtained, the test for lack
of fit in (6.68) proceeds as usual. For a two-level factorial study with no replications that is
augmented by $n_0$ observations at the center point, one degree of freedom is associated with
SSLF and $n_0 - 1$ degrees of freedom with SSPE.

A conclusion of lack of fit indicates that curvature is present in one or more of the factor
effects, but it is not possible to attribute the curvature effect to a specific factor without further
experimentation. Methods for augmenting two-level factorial experiments for assessment
of curvature effects are discussed in Chapter 30.


Example

Suppose that four center point replicates had been included in the Pecos Foods Corporation
study and that these responses are:

$Y_{01} = 7.23$    $Y_{02} = 7.89$    $Y_{03} = 7.80$    $Y_{04} = 7.39$

We then find $\bar{Y}_{0\cdot} = 7.578$, SSPE = .303, and MSPE = .101. From the regression analysis
of the augmented data set (output not shown), we find that SSE = 2.544. Hence, using
(3.24), we obtain:

SSLF = SSE - SSPE = 2.544 - .303 = 2.241

Hence, test statistic (6.68b) here is:

$$F^* = \frac{2.241}{1} \div \frac{.303}{3} = 22.2$$

For $\alpha = .05$, we require F(.95; 1, 3) = 10.1. Since $F^* = 22.2 > 10.1$ we conclude $H_a$,
that curvature is present. The P-value of the test is .018.

We can obtain some information about the nature of the curvature by comparing the
average of the responses at the center point, $\bar{Y}_{0\cdot} = 7.578$, with the average of the responses
at the corner points, which is 6.74. Since the mean response is higher at the center point
than would be expected from a linear interpolation of the corner points, a mound-shaped
surface may be required to model the response adequately in the interior of the experimental
region.
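
The arithmetic of this example can be retraced in a few lines. The sketch below takes the four center point responses and the value SSE = 2.544 from the text (the augmented regression itself is not rerun), computes MSPE as in (29.17), and forms the lack of fit statistic with 1 and $n_0 - 1$ degrees of freedom. The critical value 10.1 is also taken from the text rather than computed.

```python
from statistics import mean, variance

# Center point responses from the example.
center = [7.23, 7.89, 7.80, 7.39]
n0 = len(center)

y0_bar = mean(center)
mspe = variance(center)      # sample variance, as in (29.17)
sspe = mspe * (n0 - 1)       # pure error sum of squares

sse = 2.544                  # error sum of squares quoted in the text
sslf = sse - sspe            # lack of fit sum of squares, 1 degree of freedom

f_star = (sslf / 1) / mspe
critical_value = 10.1        # F(.95; 1, 3), as quoted in the text

print(round(y0_bar, 4), round(sspe, 3), round(mspe, 3))
print(round(sslf, 3), round(f_star, 1), f_star > critical_value)
```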


Comment 


When a lack of fit test is conducted after the ANOVA model has been revised by dropping effects 
that appear to be unimportant, a conclusion of lack of fit does not necessarily imply the presence 
of curvature effects. Lack of fit could then also be due, for instance, to the absence of important 
interaction effects.


29.3 Two-Level Fractional Factorial Designs


Even when each factor is studied at only two levels, the number of treatments grows rapidly 
with the number of factors, as the following table demonstrates: 


Number of    Number of
Factors      Treatments
    2              4
    4             16
    6             64
    8            256
   10          1,024


1224 PartSix Specialized Study Designs 


The use of 1,024 experimental trials for just one replication to study 10 factors would be
prohibitive in most instances. In this situation, a subset of all factorial treatments can often 
be used with little loss of information. The use of fractional factorial designs is the subject 
of this section. 

A basic notion that underlies the use of fractional factorial designs is the sparsity of
effects principle. This principle states that in most systems, responses are driven largely 
by a limited number of main effects and lower-order interactions, and that higher-order 
interactions usually are relatively unimportant. For example, information concerning three- 
factor and higher-order interactions is often not important compared to main effects and 
two-factor interactions. Under these conditions, a full factorial design can be very wasteful 
when many factors are of interest. For instance, in the analysis of a full six-factor, two-level 
factorial experiment, the degrees of freedom associated with the various factor effects are 
as follows: 


                                          Degrees of
Model Terms                               Freedom
Intercept term                                 1
Main effect coefficients                       6
Two-factor interaction coefficients           15
Three-factor interaction coefficients         20
Four-factor interaction coefficients          15
Five-factor interaction coefficients           6
Six-factor interaction coefficients            1


Note that 42 degrees of freedom will be devoted to the study of three-factor and higher- 
order interactions. Thus, about 2/3 (42/64) of the degrees of freedom for the study of 
factor effects in this experiment will be used to estimate factor effects that are ordinarily 
of little interest. In contrast, in a fractional factorial design, a subset of the treatments is 
selected in such a way that most of the degrees of freedom for the study of factor effects 
are devoted to main effects and low-order interactions, with only some loss of information 
about higher-order interactions. 
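The degrees of freedom in the table above are binomial coefficients: a k-factor, two-level design has "k choose j" interactions of order j. A minimal Python sketch of the count follows; the variable names are ours and serve only as an illustration.

```python
from math import comb

k = 6  # number of two-level factors

# Degrees of freedom for effects of each order (1 = main effects, 2 = two-factor, ...).
df_by_order = {order: comb(k, order) for order in range(1, k + 1)}

# Degrees of freedom spent on three-factor and higher-order interactions.
high_order_df = sum(df for order, df in df_by_order.items() if order >= 3)
total_terms = 2 ** k  # intercept plus all factor effects

print(df_by_order)
print(high_order_df, total_terms, round(high_order_df / total_terms, 2))
```

Changing k shows how quickly the share of high-order terms grows as factors are added.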


Confounding 


A fractional factorial design achieves the efficiency of providing full information about
main effects and low-order interactions with fewer experimental trials by confounding
these effects with unimportant higher-order interactions. To understand the concept of
confounding, consider again the X matrix of the Pecos Foods Corporation example in
Table 29.2. A single replication of a $2^4$ full factorial design was employed here, requiring
16 experimental trials. Suppose that in advance of the experiment, it had been determined 
that only half of the 16 treatments could be used due to budgetary constraints. Which eight of 
the 16 treatments should be eliminated? Suppose that the experimenter considered dropping 
treatments 2, 3, 6, 7, 10, 11, 14, 15. The X matrix for the remaining eight treatments is 
given in Table 29.3a. 

This choice of treatments to be dropped involves a number of potentially serious problems.
Notice first that column vectors $X_1$ and $X_2$ are identical in the eight-run design of




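TABLE 29.3 X Matrices for Two Half-Fraction Designs of the $2^4$ Full Factorial Design in Table 29.2: Pecos
Foods Corporation Example.

(a) Treatments 2, 3, 6, 7, 10, 11, 14, 15 deleted

Treatment  X0  X1  X2  X3  X4  X12  X13  X14  X23  X24  X34  X123  X124  X134  X234  X1234
    1       1  -1  -1  -1  -1    1    1    1    1    1    1    -1    -1    -1    -1      1
    4       1   1   1  -1  -1    1   -1   -1   -1   -1    1    -1    -1     1     1      1
    5       1  -1  -1   1  -1    1   -1    1   -1    1   -1     1    -1     1     1     -1
    8       1   1   1   1  -1    1    1   -1    1   -1   -1     1    -1    -1    -1     -1
    9       1  -1  -1  -1   1    1    1   -1    1   -1   -1    -1     1     1     1     -1
   12       1   1   1  -1   1    1   -1    1   -1    1   -1    -1     1    -1    -1     -1
   13       1  -1  -1   1   1    1   -1   -1   -1   -1    1     1     1    -1    -1      1
   16       1   1   1   1   1    1    1    1    1    1    1     1     1     1     1      1

(b) Treatments 2, 3, 5, 8, 9, 12, 14, 15 deleted

Treatment  X0  X1  X2  X3  X4  X12  X13  X14  X23  X24  X34  X123  X124  X134  X234  X1234
    1       1  -1  -1  -1  -1    1    1    1    1    1    1    -1    -1    -1    -1      1
    4       1   1   1  -1  -1    1   -1   -1   -1   -1    1    -1    -1     1     1      1
    6       1   1  -1   1  -1   -1    1   -1   -1    1   -1    -1     1    -1     1      1
    7       1  -1   1   1  -1   -1   -1    1    1   -1   -1    -1     1     1    -1      1
   10       1   1  -1  -1   1   -1   -1    1    1   -1   -1     1    -1    -1     1      1
   11       1  -1   1  -1   1   -1    1   -1   -1    1   -1     1    -1     1    -1      1
   13       1  -1  -1   1   1    1   -1   -1   -1   -1    1     1     1    -1    -1      1
   16       1   1   1   1   1    1    1    1    1    1    1     1     1     1     1      1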


Table 29.3a; i.e., $X_1 = X_2$. Because the columns of this X matrix are linearly dependent,
the matrix X'X is singular and does not have an inverse. To be able to obtain least squares
and maximum likelihood estimates, we must remove the redundancy resulting from the
equality of the $X_1$ and $X_2$ column vectors. We do this by retaining only one of the two
column vectors. Suppose that we drop the $X_2$ column vector. In our original model, the
main effects for factors 1 and 2 were represented by:


$\beta_1 X_1 + \beta_2 X_2$    (29.18)

When $X_1 = X_2$, the model terms become:

$\beta_1 X_1 + \beta_2 X_1 = (\beta_1 + \beta_2) X_1$ when $X_1 = X_2$    (29.18a)


Thus, with the experimental design in Table 29.3a, we will not be able to estimate the 
factor 1 and factor 2 main effects separately but only their combined main effects. If the 
experimental results indicate that the effect associated with $X_1$ is active, we will not know
whether the result is due to the effect of factor 1, the effect of factor 2, or to a combination 
of the effects of these two factors. Factors 1 and 2 are said to be confounded or aliased in 
this experiment. 
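The aliasing of factors 1 and 2 can be checked numerically. The sketch below assumes that Table 29.2 lists the 16 treatments in standard order, with factor 1 changing fastest; this assumption is consistent with the treatment numbers quoted in the text. The helper `full_factorial` is ours.

```python
from itertools import product


def full_factorial(k):
    """Coded levels (-1, +1) in standard order: factor 1 changes fastest."""
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]


design = full_factorial(4)                   # treatments 1..16 of the 2^4 design
dropped = {2, 3, 6, 7, 10, 11, 14, 15}       # first proposal considered in the text
kept = [(t, row) for t, row in enumerate(design, start=1) if t not in dropped]

kept_treatments = [t for t, _ in kept]
x1 = [row[0] for _, row in kept]
x2 = [row[1] for _, row in kept]
x12 = [a * b for a, b in zip(x1, x2)]

print(kept_treatments)
print(x1 == x2)                    # identical columns: factors 1 and 2 are aliased
print(all(v == 1 for v in x12))    # the X12 column coincides with the X0 column
```

Because the two columns are identical, any regression routine given both would face a singular X'X matrix, which is the redundancy described above.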




Upon further inspection of Table 29.3a, we find seven more pairs of identical columns,
resulting in the following correspondences among the columns of X:


$X_1 = X_2 \qquad X_3 = X_{123} \qquad X_4 = X_{124} \qquad X_{13} = X_{23}$

$X_{14} = X_{24} \qquad X_{234} = X_{134} \qquad X_{1234} = X_{34} \qquad X_{12} = X_0$    (29.19)


Consequently, the two effects in each of the following pairs will be confounded with each 
other: 


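$\beta_1 + \beta_2 \qquad \beta_3 + \beta_{123} \qquad \beta_4 + \beta_{124} \qquad \beta_{13} + \beta_{23}$

$\beta_{14} + \beta_{24} \qquad \beta_{234} + \beta_{134} \qquad \beta_{1234} + \beta_{34} \qquad \beta_{12} + \beta_0$    (29.20)


Since $\beta_{12}$ is confounded with $\beta_0$, the overall mean, $\beta_{12}$ is sometimes said to be unmeasurable.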

The relations in either (29.19) or (29.20) define the complete confounding scheme for 
this fractional factorial design. We shall generally describe a confounding scheme in the 
form of (29.19) and, for simplicity, shall show the column correspondences by means of the 
subscripts of the column vectors. For our example in Table 29.3a, the confounding scheme 
is represented in this fashion as follows: 


1 = 2      3 = 123      4 = 124      13 = 23
14 = 24    234 = 134    1234 = 34    12 = 0


The subscript numbers are now shown in italics as a reminder that the equality sign applies 
not to the numbers shown but to the column vectors for which the numbers are the subscripts. 

The proposed eight-treatment design in Table 29.3a is clearly undesirable since main
effects are confounded with each other. Suppose instead that the investigator chose to 
eliminate treatments 2, 3, 5, 8, 9, 12, 14, 15. The resulting X matrix is given in Table 29.3b. 
Notice that the correspondences among the columns of the X matrix now are: 


1 = 234    2 = 134    3 = 124    4 = 123
12 = 34    13 = 24    14 = 23    0 = 1234    (29.21)


We see that main effects are now confounded only with three-factor interactions and that two- 
factor interactions are confounded with other two-factor interactions, while the four-factor 
interaction is confounded with the overall mean. If three-factor and four-factor interactions 
are negligible, this design could be quite useful. In that case, if $\beta_1 + \beta_{234}$ were found to be
statistically significant, we could safely conclude that the observed effect is due to factor 1 
and not to the three-factor interaction among factors 2, 3, and 4. 

A potential drawback of the design in Table 29.3b is that the two-factor interactions 
are confounded with other two-factor interactions. If any effects associated with two-factor 
interactions turn out to be active, additional experimental trials will be required to separate
these effects. 

An abbreviated ANOVA table for the fractional factorial design in Table 29.3b showing 
only source of variation and degrees of freedom is given in Table 29.4. Notice that only eight 
factor effect coefficients can be estimated, corresponding to the eight confounded pairs of 
effects. Since no degrees of freedom are available for estimation of $\sigma^2$, the tools described
in Section 29.2 for the analysis of unreplicated factorial studies need to be employed for 
the analysis of factor effects. 


TABLE 29.4 Abbreviated ANOVA Table for Fractional Factorial Design in Table 29.3b.

Source of Variation       df
$X_0 = X_{1234}$           1
$X_1 = X_{234}$            1
$X_2 = X_{134}$            1
$X_3 = X_{124}$            1
$X_4 = X_{123}$            1
$X_{12} = X_{34}$          1
$X_{13} = X_{24}$          1
$X_{14} = X_{23}$          1
Error                      0
Total                      8

Defining Relation 


In our explanation of confounding, we began with a full factorial design, arbitrarily dropped 
some treatments from the experiment, and then examined whether the choice of the dropped 
treatments was a good one by considering the confounding scheme of the resulting fractional 
factorial design. Finding an appropriate fractional factorial design is actually done in reverse 
order by first specifying an acceptable confounding scheme. In order to proceed from this 
specification to find the corresponding fractional factorial design, we need to utilize the 
defining relation of the confounding scheme. 

Consider again the fractional factorial design in Table 29.3b. The defining relation for 
this design is the correspondence in (29.21) involving the $X_0$ column:


0 = 1234 (29.22) 


Recall that (29.22) is a shorthand stating that the $X_0$ column equals the $X_{1234}$ column.
Hence, $X_{i0} = X_{i,1234}$ for all column entries. The confounding scheme for the design can
be determined from this defining relation by multiplying the column on each side of the
defining relation by successive columns of the X matrix, the multiplication being carried
out term by term.

Since all column entries for a two-level factorial design are either 1 or -1, some general
column multiplication results are useful.


1. When multiplying the $X_0$ column by the $X_0$ column (the resulting column entries
being $X_{i0}X_{i0}$), all entries remain 1 since $X_{i0} = 1$ and $(1)^2 = 1$. We state this in the following
fashion:


$0 \times 0 = 0^2 = 0$    (29.23)


2. Multiplying any column $X_q$ by $X_0$ (the resulting column entries being $X_{i0}X_{iq}$) leaves
the column entries unchanged because $X_{i0} = 1$:

$0 \times q = q$    (29.24)

3. Multiplying any column by itself (the resulting column entries being $X_{iq}X_{iq}$) yields
the $X_0$ column since $(1)^2 = (-1)^2 = 1$:


$q \times q = q^2 = 0$    (29.25)




Returning now to the defining relation in (29.22), let us multiply the columns on both
sides of the defining relation by the $X_1$ column. On the left side we obtain by (29.24):


$1 \times 0 = 1$    (29.26)

and on the right side we find:

$1 \times 1234 = 1^2 234 = 0\,234 = 234$    (29.27)

The result in (29.27) follows because we obtain for each column entry:

$X_{i1}X_{i,1234} = X_{i1}(X_{i1}X_{i2}X_{i3}X_{i4}) = X_{i1}^2 X_{i2}X_{i3}X_{i4} = X_{i2}X_{i3}X_{i4}$

Combining the results in (29.26) and (29.27), we have found:

$1 \times 0 = 1 = 1 \times 1234 = 234$    (1 = 234)    (29.28a)


Continuing the process of multiplying both sides of (29.22) by successive columns of 
the X matrix we find: 


$2 \times 0 = 2 \times 1234 = 12^2 34 = 134$    (2 = 134)
$3 \times 0 = 3 \times 1234 = 123^2 4 = 124$    (3 = 124)
$4 \times 0 = 4 \times 1234 = 1234^2 = 123$    (4 = 123)
$12 \times 0 = 12 \times 1234 = 1^2 2^2 34 = 34$    (12 = 34)
$13 \times 0 = 13 \times 1234 = 1^2 23^2 4 = 24$    (13 = 24)
$14 \times 0 = 14 \times 1234 = 1^2 234^2 = 23$    (14 = 23)
(29.28b)


We stop at this point because multiplication by succeeding columns will yield no new
confounding relations. 

Notice that the operations in (29.28a) and (29.28b) have reproduced the complete con- 
founding scheme in (29.21). The relation on which these operations were based, 0 = 1234, 
is called the defining relation. The defining relation is always the one that shows the equality 
with the $X_0$ column.
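The multiplication rules (29.23) through (29.25) amount to cancelling any factor that appears twice, so the product of two columns in subscript notation is the symmetric difference of their factor sets. The following Python sketch uses that observation to reproduce the confounding scheme from the defining relation 0 = 1234. The function name `multiply` and the use of strings for subscripts are our own conventions.

```python
from itertools import combinations


def multiply(a, b):
    """Column product in subscript notation; squared factors cancel (q x q = 0)."""
    factors = set(a.replace("0", "")) ^ set(b.replace("0", ""))
    return "".join(sorted(factors)) or "0"


defining_word = "1234"          # defining relation 0 = 1234
factor_labels = "1234"

# Multiply both sides of the defining relation by every main effect and interaction column.
aliases = {}
for order in range(1, len(factor_labels) + 1):
    for combo in combinations(factor_labels, order):
        effect = "".join(combo)
        aliases[effect] = multiply(effect, defining_word)

for effect, alias in aliases.items():
    print(f"{effect} = {alias}")
```

The first seven lines printed match (29.28a) and (29.28b); the remaining lines repeat the same pairs in reverse, which is why the text stops after the column for 14.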


Half-Fraction Designs 
Once the desired defining relation (and hence, the confounding scheme) is specified, the 
fractional factorial design corresponding to the desired confounding scheme can be con- 
structed in the following manner: 


Step 1. Construct the X matrix for the full factorial design.
Step 2. Choose those rows (treatments) for which the defining relation holds. 


To illustrate the use of this procedure, consider again the Pecos Foods Corporation
example in Table 29.2. The desired defining relation is that in (29.22), namely, 0 = 1234.
Hence, we need to select those treatments for which $X_{i,1234} = X_{i0}$. We see from Table 29.2
that $X_{i,1234} = 1$ for treatments 1, 4, 6, 7, 10, 11, 13, 16. This is the design in Table 29.3b.
It is called a $2^{4-1}$ fractional factorial design. As noted before, the 4 in the exponent refers
to the number of factors. The 1 indicates the level of fractionation; here the full factorial
design was fractionated one time. In general, we shall refer to a $2^{k-f}$ fractional factorial
design, where k denotes the number of factors and f the fraction.




An equally useful half-fraction design can be constructed from the eight treatments that
were omitted (2, 3, 5, 8, 9, 12, 14, 15). Notice from Table 29.2 that $X_{i0} = -X_{i,1234}$ for these
treatments. The defining relation for this alternate half-fraction design is therefore:


0 = -1234    (29.29)

It is easy to verify that the complete confounding scheme for this design is:

1 = -234    2 = -134    3 = -124    4 = -123
12 = -34    13 = -24    14 = -23    0 = -1234    (29.30)

We see that confounding scheme (29.30) for the omitted treatments is the same as that
of the retained treatments in (29.28) except that the sign of the second term has changed.
Statistically, both of these half-fractions provide similar information, and either one can be
used. The choice is sometimes based on the investigator's desire to include one or more
specific treatments in the experiment. For example, when the treatment consisting of all factors
at the first level (-1) is the control treatment, the investigator who wishes to include this
treatment would select the half-fraction corresponding to the defining relation 0 = 1234.
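Steps 1 and 2 of the construction procedure can be sketched in Python as follows. As before, the sketch assumes that the treatments of Table 29.2 are numbered in standard order with factor 1 changing fastest; the helper `full_factorial` is ours.

```python
from itertools import product
from math import prod


def full_factorial(k):
    """Step 1: coded levels of the full 2^k design in standard order."""
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]


design = full_factorial(4)

# Step 2: keep the rows for which the defining relation holds.
# For 0 = 1234 the product X1*X2*X3*X4 must equal +1 (the X0 entry).
half_plus = [t for t, row in enumerate(design, start=1) if prod(row) == 1]

# The alternate half-fraction, 0 = -1234, keeps the rows where the product is -1.
half_minus = [t for t, row in enumerate(design, start=1) if prod(row) == -1]

print(half_plus)
print(half_minus)
```

Treatment 1, with every factor at its low level, falls in the first list, which agrees with the remark about retaining a control treatment.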


Comment 

The identification of the treatments to be included in a $2^{k-f}$ fractional factorial design can be carried
out without first constructing the X matrix for the full $2^k$ factorial study. The use of design generators
permits the construction of a $2^{k-f}$ fractional factorial design by constructing the X matrix for a
full factorial study in only k - f factors and then augmenting this matrix. Details are provided in
Reference 29.1. ■


Quarter-Fraction and Smaller-Fraction Designs 

When the number of factors is large, the number of treatments in even a one-half fraction 
design may still be prohibitively large. In such cases, smaller fractions may be obtained 
by continuing the process of fractionation. For example, in the Pecos Foods Corporation 
example, a single replication of a full factorial study involves $2^4 = 16$ experimental trials
and a half-fraction design involves $2^{4-1} = 8$ trials. A single replication of a quarter-fraction
design will involve only $2^{4-2} = 4$ trials and an eighth-fraction design will consist of $2^{4-3} = 2$
trials. We shall now describe the construction and analysis of $2^{k-f}$ fractional factorial
designs. The number of treatments in such a design is $2^{k-f}$.

We shall illustrate how to obtain the confounding scheme for a quarter-fraction design by 
returning to the Pecos Foods Corporation example in Table 29.3b, where the half-fraction 
design is based on the defining relation 0 = 1234. Let us fractionate this design in half 
by using the defining relation 0 = 12. From Table 29.3b, we see that $X_{i0} = X_{i,12} = 1$
for treatments 1, 4, 13, 16. The X matrix for this new quarter-fraction design is given 
in Table 29.5. Notice that the confounding of effects has become quite severe. From an 
inspection of the columns of the X matrix, we find that the complete confounding scheme is: 


0 = 1234 = 12 = 34 (defining relation)
1 = 234 = 2 = 134
3 = 124 = 123 = 4
13 = 24 = 23 = 14




TABLE 29.5 Quarter-Fraction Design of $2^4$ Full Factorial Design in Table 29.2, Based on Defining Relation
0 = 1234 = 12 = 34: Pecos Foods Corporation Example.

Treatment  X0  X1  X2  X3  X4  X12  X13  X14  X23  X24  X34  X123  X124  X134  X234  X1234
    1       1  -1  -1  -1  -1    1    1    1    1    1    1    -1    -1    -1    -1      1
    4       1   1   1  -1  -1    1   -1   -1   -1   -1    1    -1    -1     1     1      1
   13       1  -1  -1   1   1    1   -1   -1   -1   -1    1     1     1    -1    -1      1
   16       1   1   1   1   1    1    1    1    1    1    1     1     1     1     1      1


Since main effects are confounded with each other (1 with 2, and 3 with 4) this design is 
clearly undesirable. 

As in the case of a half-fraction design, the confounding scheme for a quarter-fraction 
design can be determined directly without constructing the X matrix. We begin with the 
half-fraction defining relation: 


0 = 1234    (29.31a)

We then augment this with the defining relation for the second fractionation:

0 = 1234 = 12    (29.31b)


Finally, we need to add a term to recognize that the $X_{34}$ column is also equal to the $X_{1234}$
and $X_{12}$ columns:


0 = 1234 = 12 = 34    (29.31c)


34 is called the generalized interaction. It can be automatically identified by multiplying
the two interaction columns $X_{1234}$ and $X_{12}$ in the augmented defining relation in (29.31b):


$1234 \times 12 = 1^2 2^2 34 = 34$


In general, for a $2^{k-f}$ fractional factorial design, there are $2^f$ terms in the defining relation.
These consist of:


1. The constant term, 0.

2. The f interaction terms used to define the f successive fractionations.

3. The $2^f - f - 1$ generalized interactions, constructed from the cross products involving
pairs, triples, and so on, of the f interaction terms used to define the f successive
fractionations.

Since there are $2^f$ terms in the defining relation for a $2^{k-f}$ fractional
factorial design, we see that each factor effect is confounded with $2^f - 1$ other factor
effects.


Once the defining relation has been obtained for a $2^{k-f}$ design, the complete confounding
scheme can be found by multiplying all terms in the defining relation successively by the
main effect and interaction columns in the X matrix.
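The bookkeeping just described can be automated. The sketch below builds the full defining relation from the f fractionation terms by repeated doubling (each new term is multiplied into every word found so far, which generates the generalized interactions), and then groups all $2^k$ columns into sets of mutually confounded effects. The function names are ours, and the code ignores signs, so it applies as written only when every term in the defining relation is positive.

```python
from itertools import combinations


def multiply(a, b):
    """Column product in subscript notation; repeated factors cancel."""
    factors = set(a.replace("0", "")) ^ set(b.replace("0", ""))
    return "".join(sorted(factors)) or "0"


def defining_relation(generators):
    """All 2^f words: 0, the f fractionation terms, and their generalized interactions."""
    words = ["0"]
    for g in generators:
        words += [multiply(w, g) for w in words]
    return words


def alias_sets(k, generators):
    """Partition the 2^k columns into sets of mutually confounded effects."""
    relation = defining_relation(generators)
    labels = "123456789"[:k]
    effects = ["0"] + ["".join(c) for r in range(1, k + 1)
                       for c in combinations(labels, r)]
    seen, groups = set(), []
    for effect in effects:
        if effect not in seen:
            group = [multiply(effect, w) for w in relation]
            seen.update(group)
            groups.append(group)
    return groups


relation = defining_relation(["1234", "12"])
groups = alias_sets(4, ["1234", "12"])
print(" = ".join(relation))
for group in groups:
    print(" = ".join(group))
```

For the quarter-fraction of the Pecos Foods Corporation example this yields four sets of four columns each, in agreement with the scheme read from Table 29.5.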


Example

A two-level, five-factor experiment is to be fractionated, first on the basis of the relation:


0 = 124    (29.32a)

and a second time using:

0 = -135    (29.32b)

We shall now determine the complete confounding scheme for the experiment. Combining
(29.32a) and (29.32b), we obtain:

0 = 124 = -135

The generalized interaction is therefore:

$124 \times (-135) = -1^2 2345 = -2345$

The defining relation consequently is:

0 = 124 = -135 = -2345    (29.33)

The complete confounding scheme is determined by multiplying the terms in (29.33)
successively by each of the $2^5 - 1 = 31$ main effect and interaction columns. For example:

$1 \times 0 = 1 \times 124 = 1 \times (-135) = 1 \times (-2345)$    or    1 = 24 = -35 = -12345

In summary we find (omitting any redundant entries):

 0 = 124  = -135  = -2345
 1 = 24   = -35   = -12345
 2 = 14   = -1235 = -345
 3 = 1234 = -15   = -245
 4 = 12   = -1345 = -235
 5 = 1245 = -13   = -234
23 = 134  = -125  = -45
34 = 123  = -145  = -25


The eight treatments to be included in this fractional factorial design are those for which
$X_{i,124} = 1$, $X_{i,135} = -1$, and $X_{i,2345} = -1$ simultaneously.
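When some terms of the defining relation carry a minus sign, the sign must be tracked along with the subscripts. The following Python sketch does this for the example above by representing each term as a pair (sign, subscripts). The representation and the function names are ours; the empty string stands for the 0 column.

```python
from itertools import product


def multiply(a, b):
    """Product of two signed terms (sign, subscripts); repeated factors cancel."""
    sign_a, word_a = a
    sign_b, word_b = b
    return (sign_a * sign_b, "".join(sorted(set(word_a) ^ set(word_b))))


g1 = (1, "124")       # first fractionation:  0 = 124
g2 = (-1, "135")      # second fractionation: 0 = -135
relation = [(1, ""), g1, g2, multiply(g1, g2)]    # last term is the generalized interaction


def aliases(effect):
    """Multiply every term of the defining relation by the given effect column."""
    return [multiply((1, effect), term) for term in relation]


for effect in ["1", "2", "3", "4", "5", "23", "34"]:
    print(aliases(effect))

# Treatments of the 2^5 design satisfying X124 = 1 and X135 = -1.
runs = [x for x in product((-1, 1), repeat=5)
        if x[0] * x[1] * x[3] == 1 and x[0] * x[2] * x[4] == -1]
print(len(runs))
```

The third condition, $X_{i,2345} = -1$, does not need to be imposed separately, since it follows from the other two.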


Resolution


The resolution of a two-level fractional factorial design, denoted by R, is the number of
factors involved in the lowest-order effect in the defining relation, excluding the constant
term (0). This is a critical characteristic of a design because it indicates the severity of the
confounding scheme. For instance, recall that the defining relation of the $2^{4-1}$ fractional
factorial design of Table 29.3b is:


0 = 1234


The resolution of this half-fraction design is R = 4 because there are four factors involved
in the term 1234. The resolution R = 4 tells us that the most severe cases of confounding
will involve:


1. A main effect and a three-factor interaction (e.g., 1 = 234)
2. A two-factor interaction and another two-factor interaction (e.g., 12 = 34)




Roman numerals are commonly used to denote the resolution to avoid confusion with the
number of factors. We characterize the design in Table 29.3b as a $2_{IV}^{4-1}$ fractional factorial
design to indicate that it has resolution R = IV.

In general, the higher the resolution of a design, the less severe the degree of
confounding. The resolution should never be less than III. In a resolution II design, at
least one pair of main effects will be confounded together. For example, consider the $2_{II}^{5-2}$
quarter-fraction design with defining relation:


0 = 123 = 45 = 12345 (29.34) 


Since the lowest-order effect in this defining relation is 45, the design has resolution II. 
Here the factor 4 main effect is confounded with the factor 5 main effect (4 = 5), which is
clearly most undesirable. Fractional factorial designs of resolution III, IV, and V are most 
commonly used. The relationship between resolution and degree of confounding for these 
three classes of designs can be summarized as follows: 


Design
Resolution    Worst-Case Degree of Confounding

III           Some main effects are confounded with two-factor interactions.

IV            Some main effects are confounded with three-factor interactions.
              Some two-factor interactions are confounded with other two-factor
              interactions.

V             Some main effects are confounded with four-factor interactions.
              Some two-factor interactions are confounded with three-factor
              interactions.


Projection Property. A useful property of fractional factorial designs is that any design of 
resolution R contains complete factorial designs in any subset of R — 1 factors. For example, 
consider the resolution IV half-fraction design in Table 29.3b. Note that if we were to drop 
the fourth factor, for instance, a full factorial eight-run design would result for the first three 
factors. This has important design implications. Suppose that an experimenter expects that 
at most three of the five factors in a study will turn out to be active. By choosing a fractional 
design with resolution IV, the experimenter will be assured that once the inactive factors are 
identified and dropped from the analysis, the experimental design for the remaining active 
factors will be a full factorial design with no confounding. 
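The projection property can be verified directly for the resolution IV half-fraction with defining relation 0 = 1234. The sketch below projects the eight runs onto every subset of three factors and checks that each of the $2^3$ level combinations occurs equally often. The function `is_full_factorial` is our own helper.

```python
from collections import Counter
from itertools import combinations, product
from math import prod

# Half-fraction of the 2^4 design with defining relation 0 = 1234.
half_fraction = [row for row in product((-1, 1), repeat=4) if prod(row) == 1]


def is_full_factorial(rows, columns):
    """True if every level combination of the chosen columns occurs equally often."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return len(counts) == 2 ** len(columns) and len(set(counts.values())) == 1


# Drop one factor at a time: R - 1 = 3 factors remain.
projections = {cols: is_full_factorial(half_fraction, cols)
               for cols in combinations(range(4), 3)}
print(projections)

# All four factors together do not form a full factorial in eight runs.
print(is_full_factorial(half_fraction, (0, 1, 2, 3)))
```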


Selecting a Fraction of Highest Resolution


Clearly, it is desirable that a defining relation be chosen so that the resolution of the design 
is as large as possible. For half-fraction designs, this is easy: equate the highest-order 
interaction column with the $X_0$ column. For example, to provide the maximum resolution
(V) in a five-factor study, set the defining relation as follows:


0 = 12345


In general, the resulting resolution is equal to the number of factors in the study. 




For quarter replicates, eighth replicates, and so on, identifying the defining relation that
yields the maximum resolution is not so simple. For example, consider the choice of a
defining relation for a $2^{6-2}$ design. If we fractionate first on the basis of:


0 = 123456

the highest resolution possible will be III:

0 = 123456 = 123 = 456

However, an alternative defining relation leads to a resolution IV design:

0 = 1235 = 2346 = 1456


This is, in fact, the highest possible resolution in a $2^{6-2}$ fractional factorial design.
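Since the resolution is the length of the shortest term in the defining relation, it can be computed once the generalized interactions have been generated. The Python sketch below does so for the two candidate $2^{6-2}$ designs discussed above and for the resolution II design in (29.34). The function names are ours, and signs are ignored because they do not affect the resolution.

```python
from itertools import combinations


def multiply(a, b):
    """Column product in subscript notation; repeated factors cancel."""
    return "".join(sorted(set(a) ^ set(b)))


def defining_words(generators):
    """Fractionation terms plus all generalized interactions (constant term excluded)."""
    words = []
    for r in range(1, len(generators) + 1):
        for subset in combinations(generators, r):
            word = ""
            for g in subset:
                word = multiply(word, g)
            words.append(word)
    return words


def resolution(generators):
    """Number of factors in the lowest-order term of the defining relation."""
    return min(len(word) for word in defining_words(generators))


print(defining_words(["123456", "123"]), resolution(["123456", "123"]))
print(defining_words(["1235", "2346"]), resolution(["1235", "2346"]))
print(defining_words(["123", "45"]), resolution(["123", "45"]))
```

The same function can be used to compare any other pair of fractionation terms one might consider for a six-factor, 16-run design.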

The $2^{k-f}$ fractional factorial designs that have highest resolution have been identified
and catalogued for choices of k and f that are of general interest (Ref. 29.1). Table 29.6
lists the defining relations for these designs for $3 \le k \le 9$; the generalized interactions
have been omitted in this listing for the sake of brevity. A number of software packages 
also will construct fractional factorial designs with highest possible resolution for specified 
numbers of factors and experimental trials. Most of these packages construct fractional 
factorial designs employing the defining relations in Table 29.6. 


Example

The Iowa Aluminum Corporation manufactures sheet aluminum from recycled aluminum
beverage containers. The manufacturing process first casts molten aluminum onto a con- 
veyor belt in a continuous strip. The strip is then sprayed with a coolant comprised of a 
mixture of water and oil as it enters each of three mills. After the processing in the third mill, 
the strip is automatically coiled and packaged for shipping. The surface of the aluminum 
sheets must be sufficiently clean and free of defects or the product will not be shipped. 
Historically, the rejection rate was about 25 percent. 

In an effort to reduce the percentage of rejected coils, an experimental study was under- 
taken. Management committed two days of production to the experiment, which permitted 
about 20 experimental trials. Six factors that might affect the quality of the aluminum were 
identified: (1) temperature of the coolant; (2) percentage of oil in coolant; (3, 4, 5) volume 
of coolant applied to the strip at each of the three mills (as a percentage of full volume); and 
(6) strip speed. Low and high limits for each of the factors were identified for the two-level 
six-factor experiment. Since a $2^6$ experiment involves 64 factor level combinations or treatments
and since only about 20 experimental trials were feasible, a one-quarter fractional
factorial design was needed. 

Figure 29.8 contains a summary of the quarter-fraction design for the two-level six- 
factor experiment provided by the MINITAB Fractional Factorial procedure. We see that a 
resolution IV design is the highest-resolution design that can be attained in a 16-run, six- 
factor fractional factorial study. For this resolution, we know that all main effects are clear of 
other main effects and two-factor interactions and that some main effects will be confounded 
with three-factor interactions. Also, some two-factor interactions will be confounded with 
other two-factor interactions. The complete confounding scheme is shown in Figure 29.8, 
where the factors are denoted A through F (instead of 1 through 6) and the symbol I
is used (instead of 0) to denote the constant term. Also, MINITAB uses the format in 




TABLE 29.6 Two-Level Fractional Factorial Designs with Maximum Resolution for Three to Nine Factors.

Number of                 Number of    Defining Relation
Factors      Fraction     Runs         (omitting generalized interactions)
   3         $2^{3-1}$        4        0 = 123
   4         $2^{4-1}$        8        0 = 1234
   5         $2^{5-1}$       16        0 = 12345
             $2^{5-2}$        8        0 = 124 = 135
   6         $2^{6-1}$       32        0 = 123456
             $2^{6-2}$       16        0 = 1235 = 2346
             $2^{6-3}$        8        0 = 124 = 135 = 236
   7         $2^{7-1}$       64        0 = 1234567
             $2^{7-2}$       32        0 = 12346 = 12457
             $2^{7-3}$       16        0 = 1235 = 2346 = 1347
             $2^{7-4}$        8        0 = 124 = 135 = 236 = 1237
   8         $2^{8-2}$       64        0 = 12347 = 12568
             $2^{8-3}$       32        0 = 1236 = 1247 = 23458
             $2^{8-4}$       16        0 = 2345 = 1346 = 1237 = 1248
   9         $2^{9-2}$      128        0 = 134678 = 235679
             $2^{9-3}$       64        0 = 12347 = 13568 = 34569
             $2^{9-4}$       32        0 = 23456 = 13457 = 12458 = 12359
             $2^{9-5}$       16        0 = 1235 = 2346 = 1347 = 1248 = 12349

(29.20) to represent the confounding scheme. For example, the defining relation is listed by 
MINITAB as: 


I + ABCE + ADEF + BCDF

In our representation, the defining relation is expressed as follows:

0 = 1235 = 1456 = 2346


Management was willing to assume that all three-factor interactions would be quite small 
in relation to main effects and two-factor interactions. It also recognized that if important 
two-factor interactions are found to be present, it may be necessary to conduct additional 
experimental trials to separate confounded interaction effects. Management therefore de- 
cided to use the fractional factorial design in Figure 29.8, with four replications added at 
the center point to provide a rough estimate of the error variance and a test of the fit of the 
model. 
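The 16-run design can also be built from the design generators reported by MINITAB, E = ABC and F = BCD, by first writing a full factorial in factors A through D and then computing the two remaining columns as products. The Python sketch below follows that route and confirms that the columns for ABCE, BCDF, and ADEF consist entirely of +1 entries, as the defining relation requires. The standard run order and the helper `column` are our assumptions for illustration.

```python
from itertools import product

# Full 2^4 factorial in A, B, C, D (standard order, A changing fastest).
base = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=4)]

# Augment with the generated columns E = ABC and F = BCD.
design = [(a, b, c, d, a * b * c, b * c * d) for a, b, c, d in base]


def column(word):
    """Elementwise product of the design columns named by the letters in word."""
    index = {letter: i for i, letter in enumerate("ABCDEF")}
    values = []
    for row in design:
        value = 1
        for letter in word:
            value *= row[index[letter]]
        values.append(value)
    return values


for word in ("ABCE", "BCDF", "ADEF"):
    print(word, set(column(word)))          # each set contains only +1

print(column("AC") == column("BE"))         # AC and BE are aliased
```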


FIGURE 29.8 MINITAB Fractional Factorial Design Summary: Iowa Aluminum Corporation Example.


Fractional Factorial Design


Factors:   6     Design:        6, 16     Resolution:  IV
Runs:     16     Replicates:        1     Fraction:   1/4
Blocks: none     Center points:     0


Design Generators: E = ABC   F = BCD

Alias Structure


I + ABCE + ADEF + BCDF


A + BCE + DEF + ABCDF
B + ACE + CDF + ABDEF
C + ABE + BDF + ACDEF
D + AEF + BCF + ABCDE
E + ABC + ADF + BCDEF
F + ADE + BCD + ABCEF
AB + CE + ACDF + BDEF
AC + BE + ABDF + CDEF
AD + EF + ABCF + BCDE
AE + BC + DF + ABCDEF
AF + DE + ABCD + BCEF
BD + CF + ABEF + ACDE
BF + CD + ABDE + ACEF


ABD + ACF + BEF + CDE
ABF + ACD + BDE + CEF


Table 29.7 contains the design matrix listed in standard order for the MINITAB fractional 
factorial design augmented by four replications at the center point. In the right column are 
shown the results of the experiment. The response of interest is the surface impurity score, 
where surface impurities are rated on a 0-10 scale (0 = no impurity, 10 = high impurity).
The MINITAB output for an initial factorial ANOVA fit is shown in Figure 29.9. Because
four replications at the center point were made, an estimate of $\sigma^2$ is available. From an initial
inspection of the absolute size of the factor effect coefficients and their associated P-values, 
it appears that the active effects are main effects for oil percentage, coolant volume 3, 
and strip speed, and the two-factor interaction between coolant temperature and coolant 
volume 1 (which is confounded with the two-factor interaction between oil percentage and 
coolant volume 3). 
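The estimated effects discussed here can be recomputed from the data in Table 29.7. In a two-level design, the estimated effect of a column is the mean response where the column is +1 minus the mean response where it is -1, and the regression coefficient is half of that. The sketch below applies this to the 16 corner-point observations. It also computes the pure error and curvature sums of squares from the center points, using the usual single-degree-of-freedom contrast between the corner-point mean and the center-point mean; that formula is a standard result supplied by us, not one shown in this section. All names are illustrative.

```python
from itertools import product

# Corner points in standard order; X5 = X1*X2*X3 and X6 = X2*X3*X4.
base = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=4)]
design = [(x1, x2, x3, x4, x1 * x2 * x3, x2 * x3 * x4) for x1, x2, x3, x4 in base]

y = [4, 6, 7, 2, 3, 1, 5, 9, 3, 2, 8, 5, 4, 4, 4, 6]    # impurity scores, treatments 1-16
y_center = [3, 5, 4, 6]                                 # four center point replicates


def effect(*factors):
    """Mean response at +1 minus mean response at -1 for the product column."""
    total = 0
    for row, response in zip(design, y):
        sign = 1
        for j in factors:
            sign *= row[j - 1]
        total += sign * response
    return total / (len(y) / 2)


effects = {"X1": effect(1), "X2": effect(2), "X5": effect(5), "X6": effect(6),
           "X13": effect(1, 3), "X25": effect(2, 5)}
print(effects)

n_f, n_0 = len(y), len(y_center)
ybar_f, ybar_0 = sum(y) / n_f, sum(y_center) / n_0
ss_pure_error = sum((v - ybar_0) ** 2 for v in y_center)
ss_curvature = n_f * n_0 * (ybar_f - ybar_0) ** 2 / (n_f + n_0)
print(ss_pure_error, ss_curvature)
```

The X13 and X25 entries are necessarily equal because the two columns are identical in this design, which is the confounding noted in the text.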

Since this study was exploratory in nature, a new model was developed in which only the 
factor effects identified as active ($X_2$, $X_5$, $X_6$, $X_{13} = X_{25}$) are retained. An ANOVA model
containing the three main effects and one interaction effect was fitted. Residual analysis 
(not shown) did not reveal any serious departures from the model assumptions. Figure 29.10 
contains the MINITAB output for a regression fit of the revised ANOVA model. Note that 
the lack of fit statistic is shown, F* = MSLF/MSPE = .04, for which the P-value is .9958. 
Hence, the fit of the revised model appears to be good. We see from the ANOVA output that 
the statistical significance of the estimated factor effect coefficients $b_2$, $b_5$, $b_6$, and $b_{13} + b_{25}$
is confirmed. 

We turn now to the interpretation of the experimental results. Because the $\beta_{13}$ and $\beta_{25}$
interaction terms are confounded, the source of this effect cannot be determined on the basis 
of the experimental results. Notice, however, that both the factor 2 and factor 5 main effects 




TABLE 29.7 Experimental Design Matrix and Y Observations: Iowa Aluminum Corporation Example.

                                     Design Matrix
             Coolant       Oil          Coolant    Coolant    Coolant    Strip    Impurity
Treatment    Temperature   Percentage   Volume 1   Volume 2   Volume 3   Speed    Score
             X1            X2           X3         X4         X5         X6       Y
    1        -1            -1           -1         -1         -1         -1       4
    2         1            -1           -1         -1          1         -1       6
    3        -1             1           -1         -1          1          1       7
    4         1             1           -1         -1         -1          1       2
    5        -1            -1            1         -1          1          1       3
    6         1            -1            1         -1         -1          1       1
    7        -1             1            1         -1         -1         -1       5
    8         1             1            1         -1          1         -1       9
    9        -1            -1           -1          1         -1          1       3
   10         1            -1           -1          1          1          1       2
   11        -1             1           -1          1          1         -1       8
   12         1             1           -1          1         -1         -1       5
   13        -1            -1            1          1          1         -1       4
   14         1            -1            1          1         -1         -1       4
   15        -1             1            1          1         -1          1       4
   16         1             1            1          1          1          1       6
   17         0             0            0          0          0          0       3
   18         0             0            0          0          0          0       5
   19         0             0            0          0          0          0       4
   20         0             0            0          0          0          0       6


were identified as active and that neither the factor 1 nor the factor 3 main effects were 
statistically significant. These results suggest (but do not prove) that the observed effect is 
likely due to the $\beta_{25}$ interaction. To investigate this further, a small follow-up 2 × 2 factorial
experiment was run involving only factors 1 and 3. No $\beta_{13}$ interaction effect was found, and
it was therefore concluded that the $\beta_{13} + \beta_{25}$ confounded interaction effect in the original
experiment is due to the $\beta_{25}$ interaction effect.

The results of the experiment are summarized in Figure 29.11 by a main effects plot for 
factor 6 (strip speed) and an interactions plot for factors 2 (oil percentage) and 5 (coolant 
volume 3). The results can be qualitatively summarized as follows: 


1. Figure 29.11a shows that increasing strip speed decreases observed surface impurities. 
Strip speed should therefore be set at its high level ($X_6 = 1$).

2. Figure 29.11b shows that when oil percentage is at its high level, increasing coolant 
volume 3 increases surface impurities. When oil percentage is at its low level, increasing 
coolant volume 3 has relatively little effect on surface impurities. We also see that increasing 
the oil percentage increases surface impurities; the effect is particularly strong when coolant 
volume 3 is at its high level. Thus, both oil percentage and coolant volume 3 should be set 
at their low levels ($X_2 = -1$ and $X_5 = -1$).


FIGURE 29.9 MINITAB Fractional Factorial Output for Initial Model—Iowa Aluminum Corporation Example.



Estimated Effects and Coefficients for Defects 


Term                        Effect     Coef   StdCoef   t-value       P
Constant                              4.550    0.2503     18.18   0.000
Cooltemp                    -0.375   -0.187    0.2799     -0.67   0.540
Oilpct                       2.375    1.187    0.2799      4.24   0.013
Coolvol1                    -0.125   -0.062    0.2799     -0.22   0.834
Coolvol2                    -0.125   -0.062    0.2799     -0.22   0.834
Coolvol3                     2.125    1.062    0.2799      3.80   0.019
Stripspd                    -2.125   -1.062    0.2799     -3.80   0.019
Cooltemp*Oilpct             -0.125   -0.062    0.2799     -0.22   0.834
Cooltemp*Coolvol1            1.375    0.687    0.2799      2.46   0.070
Cooltemp*Coolvol2           -0.125   -0.062    0.2799     -0.22   0.834
Cooltemp*Coolvol3            0.625    0.312    0.2799      1.12   0.327
Cooltemp*Stripspd           -1.125   -0.563    0.2799     -2.01   0.115
Oilpct*Coolvol2              0.125    0.062    0.2799      0.22   0.834
Oilpct*Stripspd              0.125    0.062    0.2799      0.22   0.834
Cooltemp*Oilpct*Coolvol2     0.125    0.062    0.2799      0.22   0.834
Cooltemp*Oilpct*Stripspd     0.125    0.062    0.2799      0.22   0.834


Analysis of Variance for Defects 


Source DF Seq SS Adj SS Adj MS F P 
Main Effects 6 59.3750 59.3750 9.89583 7.90 0.033 
2-Way Interactions 7 14.4375 14.4375 2.06250 1.65 0.330 
3-Way Interactions 2 0.1250 0.1250 0.06250 0.05 0.952 
Residual Error 4 5.0125 5.0125 1.25312 

Curvature 1 0.0125 0.0125 0.01250 0.01 0.936 
Pure Error 3 5.0000 5.0000 1.66667 

Total 19 78.9500 


We can predict the mean impurity level produced by the process at the optimum (coded) 
settings of the control variables: 


$$X_2 = \text{Oil percentage} = -1$$
$$X_5 = \text{Coolant volume 3} = -1 \qquad (29.35)$$
$$X_6 = \text{Strip speed} = 1$$

by using the fitted regression model equivalent to the final ANOVA model in Figure 29.10:

$$\hat{Y} = 4.5500 + 1.1875 X_2 + 1.0625 X_5 - 1.0625 X_6 + .6875 X_{25} \qquad (29.36)$$

The estimated impurity response for process setting (29.35) is:

$$\hat{Y}_h = 4.5500 + 1.1875(-1) + 1.0625(-1) - 1.0625(1) + .6875(-1)(-1) = 1.925$$
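
The fitted function (29.36) can be checked against the data in Table 29.7. The following is a minimal sketch, assuming NumPy is available and assuming the generators X5 = X1·X2·X3 and X6 = X2·X3·X4, which are inferred here from the printed design matrix rather than stated in the text. It fits the reduced model by ordinary least squares and evaluates it at setting (29.35).

```python
import numpy as np

# Factorial runs of Table 29.7 in standard order (X1 changes fastest).
levels = (-1, 1)
base = np.array([(x1, x2, x3, x4)
                 for x4 in levels for x3 in levels
                 for x2 in levels for x1 in levels], dtype=float)
x1, x2, x3, x4 = base.T
x5 = x1 * x2 * x3          # assumed generator for coolant volume 3
x6 = x2 * x3 * x4          # assumed generator for strip speed
y_factorial = [4, 6, 7, 2, 3, 1, 5, 9, 3, 2, 8, 5, 4, 4, 4, 6]
y_center = [3, 5, 4, 6]

# Append the four center-point runs (all coded levels equal to 0).
x2_all = np.concatenate([x2, np.zeros(4)])
x5_all = np.concatenate([x5, np.zeros(4)])
x6_all = np.concatenate([x6, np.zeros(4)])
y = np.array(y_factorial + y_center, dtype=float)

# Columns: intercept, X2, X5, X6 and the X2*X5 interaction, as in (29.36).
X = np.column_stack([np.ones(20), x2_all, x5_all, x6_all, x2_all * x5_all])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated mean response at X2 = -1, X5 = -1, X6 = 1.
x_h = np.array([1.0, -1.0, -1.0, 1.0, (-1.0) * (-1.0)])
y_hat_h = float(x_h @ b)
print(np.round(b, 4), round(y_hat_h, 3))
```

Because the design columns are mutually orthogonal, each slope is simply the sum of the products of the column and Y over the 16 factorial runs, divided by 16.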


A confirmation run at the optimum setting can be carried out to assess the validity of the 
estimated regression function. The validity is supported if the new response falls inside the 
$1 - \alpha$ prediction limits (6.63). The 95 percent limits turn out to be (see Figure 29.10):


$$-.312 \le Y_{h(\text{new})} \le 4.162$$




FIGURE 29.10 MINITAB Output for Revised Model—Iowa Aluminum Corporation Example.

The regression equation is
Defects = 4.55 + 1.19 Oilpct + 1.06 Coolvol3 - 1.06 Stripspd + 0.687 Tmp*vol1

Predictor       Coef    Stdev   t-ratio       p
Constant      4.5500   0.2058     22.11   0.000
Oilpct        1.1875   0.2300      5.16   0.000
Coolvol3      1.0625   0.2300      4.62   0.000
Stripspd     -1.0625   0.2300     -4.62   0.000
Tmp*vol1      0.6875   0.2300      2.99   0.009

s = 0.9201    R-sq = 83.9%    R-sq(adj) = 79.6%

Analysis of Variance

SOURCE        DF       SS       MS       F       p
Regression     4   66.250   16.562   19.56   0.000
Error         15   12.700    0.847
Total         19   78.950

SOURCE        DF   SEQ SS
Oilpct         1   22.562
Coolvol3       1   18.062
Stripspd       1   18.062
Tmp*vol1       1    7.563

  Fit   Stdev. Fit       95.0% C.I.         95.0% P.I.
1.925        0.504   (0.851, 2.999)   (-0.312, 4.162)

Pure error test: F = 0.04   P = 0.9958   DF (pure error) = 11


FIGURE 29.11 Main Effect and Interaction Plots—Iowa Aluminum Corporation Example.

(a) Strip Speed Main Effect Plot: Defects plotted against Stripspeed (coded scale from -1.0 to 1.0).
(b) Oil Percentage—Coolant Volume 3 Interaction Plot: Defects plotted against Coolvol3 (coded scale
from -1.0 to 1.0), with separate lines for Oil Percent - High and Oil Percent - Low.


Since the impurity response cannot be negative, the prediction limits should be modified as 
follows: 
$$0 \le Y_{h(\text{new})} \le 4.162$$


A new response at the optimum levels less than 4.162 will be consistent with the model's
prediction. 
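
The prediction limits can be reproduced approximately from the quantities printed in Figure 29.10. This is a sketch only: the critical value t(.975; 15) = 2.131 is an assumed table value, and the standard deviation of a new observation is taken as the square root of MSE plus the squared standard deviation of the fit.

```python
import math

y_hat_h = 1.925        # fitted value at setting (29.35)
mse = 12.700 / 15      # error mean square from Figure 29.10
se_fit = 0.504         # "Stdev. Fit" from Figure 29.10
t_crit = 2.131         # t(.975; 15), assumed value taken from a t table

# Standard deviation for predicting one new observation.
s_pred = math.sqrt(mse + se_fit ** 2)
lower = y_hat_h - t_crit * s_pred
upper = y_hat_h + t_crit * s_pred

# Impurity scores cannot be negative, so the lower limit is truncated at zero.
lower_truncated = max(0.0, lower)
print(round(lower, 3), round(upper, 3), lower_truncated)
```

The computed limits agree with the MINITAB limits to within rounding of the inputs.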


29.4 Screening Experiments 


In the early stages of an investigation, it is not uncommon for investigators to identify a 
large number of potential explanatory variables. Unfortunately, the number of model terms 
required for a large number of factors becomes enormous. For example, in a manufacturing 
process optimization study, a brainstorming session involving manufacturing engineers, 
product development scientists, and line operators resulted in the identification of 28
potentially important factors. In addition to 28 parameters for main effects, there would be
28(27)/2 = 378 parameters for two-factor interactions, [28(27)(26)]/[2(3)] = 3,276
parameters for three-factor interactions, and there would be many additional parameters for
higher-order interactions. Even an investigation of just the main effects and two-factor
interactions for 28 factors by use of a resolution IV or a resolution V fractional factorial design
would be impossible here. 
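
The parameter counts quoted above are binomial coefficients, and a short sketch makes the growth explicit. The variable names are illustrative only.

```python
from math import comb

k = 28  # number of candidate factors in the brainstorming example

main_effects = comb(k, 1)      # one parameter per factor
two_factor = comb(k, 2)        # pairs of factors
three_factor = comb(k, 3)      # triples of factors
# All factorial effects of every order, excluding the constant term.
all_effects = 2 ** k - 1
print(main_effects, two_factor, three_factor, all_effects)
```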

For these circumstances, screening designs are useful. With these designs, the objective 
is simply to identify the set of active factors. No information about interactions or curvature 
is typically obtained. In this section, we shall discuss the use of resolution III fractional 
factorial designs and Plackett-Burman designs for the purpose of screening large numbers 
of factors. 


$2^{k-f}_{III}$ Fractional Factorial Designs
Recall that in a resolution III fractional factorial design, main effects are confounded with 
two-factor interactions. If it can be assumed that first-order interactions are small relative 
to the main effects, then a resolution III design can be used to identify the active factors. 
As a simple example, consider a study of three factors, each at two levels, to be
conducted with four experimental trials. A half-fraction of highest resolution is obtained by
fractionating the $2^3$ factorial on the basis of the defining relation:


0 = 123

The confounding scheme is therefore:

1 = 23
2 = 13
3 = 12


If it can be safely assumed that the two-factor interactions $\beta_{12}$, $\beta_{13}$, and $\beta_{23}$ are small in
relation to the main effects $\beta_1$, $\beta_2$, and $\beta_3$, then this half-fraction design can be used for
identifying the set of active factors.
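
The confounding scheme can be verified directly by constructing the half-fraction. In this sketch the fraction with X1·X2·X3 = +1 is used; taking the +1 half rather than the -1 half is an assumption made for illustration.

```python
from itertools import product

# Full 2^3 factorial in coded units.
full = list(product((-1, 1), repeat=3))

# Half-fraction defined by 0 = 123: keep the runs with X1*X2*X3 = +1.
half = [run for run in full if run[0] * run[1] * run[2] == 1]

# Within the fraction, each main-effect column equals a two-factor
# interaction column: 1 = 23, 2 = 13, 3 = 12.
aliases = {
    "1 = 23": all(x1 == x2 * x3 for x1, x2, x3 in half),
    "2 = 13": all(x2 == x1 * x3 for x1, x2, x3 in half),
    "3 = 12": all(x3 == x1 * x2 for x1, x2, x3 in half),
}
print(half)
print(aliases)
```

Since the columns are identical within the fraction, the data cannot separate a main effect from its aliased two-factor interaction.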

The use of resolution III designs for initial screening is typically followed by one or 
more experiments involving those factors that are identified as important. For example, a 


10-factor, resolution III experiment ($2^{10-6}_{III}$) involving 16 experimental trials was used to
study the effects of six process variables and four ingredient variables on the extent of
crystallization in ice cream. Three factors were identified as important. The interactions
among these three factors were then studied in a follow-up $2^3$ factorial experiment.


Comment 

Any resolution III fractional factorial design can be augmented by a second fraction of the same size 
to yield a new design of resolution IV or higher. The design matrix for the second fraction is obtained 
from that for the first fraction by simply reversing all signs. This process is sometimes called folding 
over the first fraction, and the resulting combined design is sometimes referred to as a foldover
design.
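
The effect of folding over can be illustrated numerically. The sketch below assumes NumPy and uses a 2^(7-4) resolution III fraction with the generators 4 = 12, 5 = 13, 6 = 23, 7 = 123; this particular design is an illustrative choice, not one taken from the text. The helper `max_main_by_2fi` is a hypothetical name.

```python
import numpy as np
from itertools import combinations, product

# Eight runs in factors 1-3; factors 4-7 are generated from interactions.
base = np.array(list(product((-1, 1), repeat=3)))
x1, x2, x3 = base.T
first = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])

# Fold over: reverse every sign and stack the two fractions.
combined = np.vstack([first, -first])

def max_main_by_2fi(design):
    """Largest |inner product| between a main-effect column and a
    two-factor interaction column; zero means no such confounding."""
    worst = 0
    n_factors = design.shape[1]
    for j in range(n_factors):
        for a, b in combinations(range(n_factors), 2):
            inner = int(design[:, j] @ (design[:, a] * design[:, b]))
            worst = max(worst, abs(inner))
    return worst

print(max_main_by_2fi(first), max_main_by_2fi(combined))
```

In the first fraction some main-effect columns coincide with two-factor interaction columns; after folding over, every such inner product cancels, so main effects are clear of two-factor interactions.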


Plackett-Burman Designs 

One limitation of resolution III fractional factorial designs is the requirement that the number
of treatment combinations be a power of 2. The total experimental trials must therefore be
4, 8, 16, 32, 64, and so on. Plackett-Burman designs are two-level, resolution III designs
that can be used for studying up to $n_T - 1$ factors in $n_T$ experimental trials, where $n_T$ is
a multiple of 4. Valid run sizes for Plackett-Burman designs are therefore 4, 8, 12, 16, 20,
and so on. Plackett-Burman designs for $n_T \le 100$ are given (with the exception $n_T = 92$)
in Reference 29.2. When $n_T$ is a power of 2, the Plackett-Burman designs correspond to
the resolution III fractional factorial designs already discussed. When $n_T$ is not a power of
2, the confounding structure of the Plackett-Burman designs is very complicated.
Plackett-Burman designs are available in many statistical software packages that provide
capabilities for the design of experiments.

The analysis of Plackett-Burman designs is carried out in the same manner as for
fractional factorial designs. Since these designs are usually run in a single replication, the various
graphical procedures discussed in Section 29.2 can be used to identify active effects. Center
point replications can also be added to provide an estimate of the error variance $\sigma^2$ and a
test for lack of fit.
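
As an illustration of a run size that is not a power of 2, the sketch below builds a 12-run Plackett-Burman design for 11 factors. It assumes NumPy and uses the commonly tabulated generating row (+ + - + + + - - - + -); the construction by cyclic shifts plus a final row of all -1 is the usual textbook recipe and is stated here as an assumption, not as the content of Reference 29.2.

```python
import numpy as np

# Commonly tabulated generating row for the 12-run design (11 factors).
generator = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])

# Rows 1-11 are cyclic shifts of the generator; row 12 is all -1.
rows = [np.roll(generator, shift) for shift in range(11)]
rows.append(-np.ones(11, dtype=int))
design = np.array(rows)

# Orthogonality check: X'X should equal 12 times the identity matrix,
# which also implies that the main-effect columns are uncorrelated.
gram = design.T @ design
is_orthogonal = np.array_equal(gram, 12 * np.eye(11, dtype=int))
print(design.shape, is_orthogonal)
```

Each column contains six +1s and six -1s, so all 11 main effects can be estimated from 12 runs when interactions are negligible.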


29.5 Incomplete Block Designs for Two-Level Factorial Experiments


When we considered randomized complete block designs in Chapter 21 and incomplete 
block designs in Chapter 28, we noted that blocks are chosen so that the experimental units 
within a block are homogeneous while they differ from block to block. When the number 
of treatments is large, it may be difficult to find blocks of sufficient size to permit the use 
of a complete block design. For example, if a block is a mold of four plastic parts, an 
experiment with eight treatments cannot be run using a mold as a complete block. However, 
an incomplete block design can be used here, with one-half of the treatments placed in 
one mold and the other four treatments in a second mold. Incomplete block designs are 
frequently required in factorial studies with a large number of factors. In this section, we 
discuss the use of incomplete block designs in two-level factorial experiments. The only 




restriction is that the incomplete block size must be a power of 2. We shall start with an 
example for purposes of illustration. 


Steichen Bakeries was developing a partially baked French bread for national distribution. 
A study was undertaken to investigate the effects of proofing time, proofing temperature, 
baking time, and baking temperature on the volume and texture of the final product. A 
two-level, four-factor experiment was under consideration, involving 16 treatments. The 
production facility could produce from 8 to 10 batches of bread in a given day. Since 
ambient temperature and humidity in the plant can change significantly from day to day, 
blocking by day was considered to be important. Hence, an incomplete block design was 
required such that the 16 treatments are placed into two blocks of size eight. We will now 
consider how to place the 16 treatments into two blocks. 


Example 


Assignment of Treatments to Blocks 


The design matrix for the $2^4$ full factorial study in the Steichen Bakeries example is shown in
Table 29.8a. Suppose that the treatments are allocated to blocks in accordance with the level
of the 1234 interaction column ($X_{1234}$). That is, all treatments for which $X_{1234} = -1$ are
allocated to block 1 (day 1), and all treatments for which $X_{1234} = 1$ are assigned to block 2
(day 2). With this arrangement, it can be seen that the block effect (i.e., the day effect) will
be completely confounded with the four-factor interaction effect. We thus forfeit the ability
to obtain an estimate of the four-factor interaction effect $\beta_{1234}$ that is free of block (day)
effects. However, estimates of all main effects, two-factor interactions, and three-factor
interactions will be independent of the block effect.

The blocking arrangement chosen by confounding the block effect with the 1234 inter- 
action effect is displayed in Table 29.8a. Notice that each of the four factors appears four 
times at its low level and four times at its high level within each block. Thus, if ambient 
temperature is exceptionally high on day 1, causing the loaves of bread baked on that day 
to have volumes that are larger than usual, this effect will not bias the estimates of any 
of the main effects. It can be verified that the same balance of high and low levels (1s
and -1s) within each block is also present for all interaction columns except for the $X_{1234}$
column.

The analysis of the experiment is identical to that of a full $2^4$ factorial study. The only
difference concerns the interpretation of results, where it must be remembered that the 
four-factor interaction effect is confounded with the block (day) effect. 
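
The two-block arrangement can be generated and checked with a few lines of code. This sketch uses only the standard library; the dictionary layout is an illustrative choice.

```python
from itertools import product

# 2^4 full factorial; each run is (X1, X2, X3, X4) in coded units.
runs = list(product((-1, 1), repeat=4))

# Block on the sign of the four-factor interaction column X1234.
blocks = {1: [], 2: []}
for run in runs:
    x1234 = run[0] * run[1] * run[2] * run[3]
    blocks[1 if x1234 == -1 else 2].append(run)

# Balance check: within each block, every factor column sums to zero
# (four runs at the low level and four at the high level).
balanced = all(
    sum(run[j] for run in block) == 0
    for block in blocks.values()
    for j in range(4)
)
print(len(blocks[1]), len(blocks[2]), balanced)
```

Because every main-effect column is balanced within each day, a constant shift in the response on one day cancels out of each main-effect contrast.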

In general, blocking of factorial and fractional factorial designs is accomplished by 
confounding block effects with carefully chosen, high-order interaction effects. The division 
of treatments into blocks is performed in three steps: 


1. Identify the high-order interaction effects to be confounded with the block effects. If the
number of desired blocks is $b = 2^v$, $v$ interaction effects need to be identified.

2. Construct the $v$ columns of the X matrix that correspond to the interaction effects chosen.
The patterns of 1s and -1s in these columns are used to identify the blocks.

3. The $v$ interaction effects chosen, along with their generalized interactions, are
confounded with the block effects. In all, $b - 1$ effects are so confounded.




TABLE 29.8 Blocking Arrangements—Steichen Bakeries Example.

(a) 2^4 Experiment in Two Blocks

                      Proofing   Proofing      Baking   Baking
Block                 Time       Temperature   Time     Temperature
(Day)    Treatment    X1         X2            X3       X4
  1          1         1         -1            -1       -1
  1          2        -1          1            -1       -1
  1          3        -1         -1             1       -1
  1          4         1          1             1       -1
  1          5        -1         -1            -1        1
  1          6         1          1            -1        1
  1          7         1         -1             1        1
  1          8        -1          1             1        1
  2          9        -1         -1            -1       -1
  2         10         1          1            -1       -1
  2         11         1         -1             1       -1
  2         12        -1          1             1       -1
  2         13         1         -1            -1        1
  2         14        -1          1            -1        1
  2         15        -1         -1             1        1
  2         16         1          1             1        1

(b) 2^4 Experiment in Four Blocks

                      Proofing   Proofing      Baking   Baking
Block                 Time       Temperature   Time     Temperature
(Day)    Treatment    X1         X2            X3       X4
  1          1         1          1            -1       -1
  1          2        -1         -1             1       -1
  1          3        -1          1            -1        1
  1          4         1         -1             1        1
  2          5        -1         -1            -1       -1
  2          6         1          1             1       -1
  2          7         1         -1            -1        1
  2          8        -1          1             1        1
  3          9        -1          1            -1       -1
  3         10         1         -1             1       -1
  3         11         1          1            -1        1
  3         12        -1         -1             1        1
  4         13         1         -1            -1       -1
  4         14        -1          1             1       -1
  4         15        -1         -1            -1        1
  4         16         1          1             1        1




In effect, this procedure fractionates the chosen design $v$ times, and the $2^v$ resulting
fractions define the divisions of treatments into blocks. This will result in $2^v$ blocks of size
$2^{k-v}$ in the case of a full factorial study, or $2^v$ blocks of size $2^{k-f-v}$ in the case of a $2^{k-f}$
fractional factorial study.

As when constructing fractional factorial designs, the v interactions selected to define the 
blocks must be carefully chosen so that, to the greatest extent possible, low-order effects re- 
main clear of block effects. Useful blocking arrangements have been catalogued (Ref. 29.1). 
They are also usually provided by statistical software packages that have capabilities for 
the design of experiments. 


Example

In the Steichen Bakeries example, the investigator wished to run the $2^4$ factorial study in
four blocks. Here, the number of blocks is $b = 4 = 2^v$, so that $v = 2$. Thus, two higher-
order interaction effects that are to be confounded with block effects need to be chosen for
identifying the treatments assigned to the blocks. The investigator chose interactions 23 and
124. The treatments were then assigned to blocks in the following fashion:


Value      Value       Treatment
of X23     of X124     Assigned to
  -1         -1        Block 1
   1         -1        Block 2
  -1          1        Block 3
   1          1        Block 4


Since there are b = 4 blocks, b — 1 = 3 factor effects are confounded with block effects. 
These are the 23 interaction, the 124 interaction, and their generalized interaction: 


$$23 \times 124 = 12^{2}34 = 134$$


The resulting design is shown in Table 29.8b. Notice again the balance of levels within 
each block: each factor appears twice at its high level and twice at its low level. This will 
also be true for all interaction columns except $X_{23}$, $X_{124}$, and $X_{134}$; these columns will be
constant within each block. An abbreviated ANOVA table is shown in Table 29.9. Note 
that this table shows the confounding of the three interaction effects, 23, 124, and 134, 
with blocks. 
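
The four-block assignment and the role of the generalized interaction can be checked in code. This sketch follows the block labeling in the small table above; the variable names are illustrative.

```python
from itertools import product

runs = list(product((-1, 1), repeat=4))

# Block labels keyed by the pair (X23, X124).
label = {(-1, -1): 1, (1, -1): 2, (-1, 1): 3, (1, 1): 4}
blocks = {b: [] for b in (1, 2, 3, 4)}
for x1, x2, x3, x4 in runs:
    blocks[label[(x2 * x3, x1 * x2 * x4)]].append((x1, x2, x3, x4))

# The generalized interaction 134 is constant within every block, so it is
# confounded with blocks along with 23 and 124.
x134_constant = all(
    len({x1 * x3 * x4 for x1, _, x3, x4 in block}) == 1
    for block in blocks.values()
)

# Each factor is still balanced (two high, two low) inside each block.
balanced = all(
    sum(run[j] for run in block) == 0
    for block in blocks.values()
    for j in range(4)
)
print({b: len(rs) for b, rs in blocks.items()}, x134_constant, balanced)
```

The same check applied to any other interaction column, such as 12 or 123, would show both levels occurring within each block.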


Use of Center Point Replications 


We noted earlier that two or more replications are often added at the center point when the 
factors are quantitative to provide an estimate of the error variance $\sigma^2$ and a test for lack of
fit. When blocking is used, center point replications must be placed within the same block 
to obtain a valid measure of pure error. Otherwise, differences in responses will be due to 
both experimental error and block-to-block differences. Use of an equal number of center 
point replications in each block leads to all estimated factor (and block) effect coefficients 
being uncorrelated. 




TABLE 29.9 Abbreviated ANOVA Table—Steichen Bakeries Example.

Source of Variation                              df
Each factor effect not confounded with blocks     1
Blocks (confounded with X23, X124, X134)          3
Error
Total


29.6 Robust Product and Process Design 


In recent years, the importance of reducing variation in products and processes has been 
widely recognized. Uncontrolled variation leads to waste, disruption, duplication of ef- 
fort, decreased consumer satisfaction, and/or the need for inspection and rework. Thus, 
experimental studies are often designed to identify process or product designs that exhibit 
low levels of variation. Such product designs are called robust, because they produce a 
desired result in a consistent, repeatable fashion. The basic framework for using designed 
experimentation to develop robust products and processes was popularized by Dr. Genichi 
Taguchi, a Japanese quality consultant, in the 1980s. It is sometimes referred to generally 
as the “Taguchi Method” (Ref. 29.3). 

For instance, in the manufacture of color television sets, an important performance
characteristic or outcome measurement is the color density. We will assume that there is a
best, or target, color density T. Ideally, all televisions would be produced with color density
T. However, due to natural variations in materials, equipment, operators, or other aspects of
the manufacturing process, the actual color densities Y will deviate from the target. While
any television with a color density within ±5 units of T was considered acceptable, the
manufacturer found that any deviation from target decreased customer satisfaction. For this
reason the manufacturer concluded that manufacturing televisions within specification was
not sufficient. Customer satisfaction would be maximized if the absolute deviations of
actual color density from target, $\text{Dev} = |Y - T|$, or the squared deviations $\text{Dev}^2 = (Y - T)^2$,
were consistently small.

Taguchi observed that the average squared deviation from target is given by the mean 
squared error: 


$$E\{\text{Dev}^2\} = E\{(Y - T)^2\} \qquad (29.37)$$


FIGURE 29.12 Process Distributions Before and After Product Design Experiment—Color Television Example.

[Two normal process distributions plotted against color density. The horizontal axis marks the
Lower Specification Limit T - 5, the target T, T + 2, and the Upper Specification Limit T + 5.
After Improvement: $\mu = T$, $\sigma^2 = 1/9$. Before Improvement: $\mu = T + 2$, $\sigma^2 = 1$.]

We encountered the mean squared error in Chapter 9 in connection with Mallows' $C_p$ and
again in Chapter 10 in connection with ridge regression. It can be shown [as we did earlier
in (9.6)] that the mean squared error can be written as a sum of the variance of Y and the
square of the off-target distance or bias, $(\mu - T)^2$:


$$E\{(Y - T)^2\} = \sigma^2\{Y\} + (E\{Y\} - T)^2 = \sigma^2 + (\mu - T)^2 = \text{Variance} + (\text{Off-Target Distance})^2 \qquad (29.38)$$


Figure 29.12 shows two process distributions for television color density. The distribution
on the right is the process distribution of color density before a robust product design
experiment was performed. In this case, color density, Y, follows a normal distribution with
mean $\mu = T + 2$ and variance $\sigma^2 = 1$. The distribution on the left shows the process
distribution following the experiment. Here, color density follows a normal distribution
with mean $\mu = T$ and variance $\sigma^2 = 1/9$. Note that both distributions fall largely within
the product specification limits $T \pm 5$; however, prior to experimentation, the mean squared
error was:


$$E\{\text{Dev}^2\} = \sigma^2 + (\mu - T)^2 = 1 + 2^2 = 5$$

After the product design experiment, the mean squared error was reduced to:

$$E\{\text{Dev}^2\} = \sigma^2 + (\mu - T)^2 = \frac{1}{9} + 0^2 = \frac{1}{9}$$

Thus on average, the color densities of television sets for the robust design are much closer 
to target than those based on the previous design. 
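
The two mean squared errors follow directly from decomposition (29.38). The short sketch below uses exact fractions; the function name is illustrative.

```python
from fractions import Fraction

def mean_squared_error(variance, mean_offset):
    """E{(Y - T)^2} = variance + (mu - T)^2, with mean_offset = mu - T."""
    return variance + mean_offset ** 2

# Before the experiment: mu = T + 2 and sigma^2 = 1.
before = mean_squared_error(Fraction(1), Fraction(2))
# After the experiment: mu = T and sigma^2 = 1/9.
after = mean_squared_error(Fraction(1, 9), Fraction(0))
print(before, after, before / after)
```

In this example most of the original mean squared error came from the off-target mean rather than from the variance.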




The implication of (29.38) for designed experimentation is as follows. In any test of
alternative process or product designs, a "best" treatment combination will lead to a treatment
mean that is close to target with minimal variance. Experiments are therefore conducted in
such a way that two linear statistical models, one for the mean and one for the variance of
the response, can be estimated. These estimated models are then used to identify robust
factor-level settings: those that lead to a process mean $\mu$ that is close to target, with small
process variance $\sigma^2$.

In this section, we first introduce a strategy for developing models for both the mean 
and the variance of the response. We then consider the use of special nuisance factors, 
called noise factors, in the construction of robust product design experiments. Noise factors 
are used to develop products and processes that are robust to specific, known sources of 
variation. 


Location and Dispersion Modeling 
As already noted, in a robust product design experiment, a "best" factor-level combination
leads to a response distribution with a small variance and a mean that is close to target. We
shall assume that k-factor model (29.2a) is applicable, except that we will no longer assume
that the error variance is constant. In addition, because one of our objectives is to model
the variance response, we will assume that $n > 1$ complete replicates of the experiment
have been conducted. Let $Y_{ij}$ denote the response of the jth replicate for the ith treatment
combination, for $i = 1, \ldots, r$ and $j = 1, \ldots, n$. Our model is now:

$$Y_{ij} = \beta_0 X_{i0} + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \beta_{12} X_{i12} + \cdots + \beta_{12 \cdots k} X_{i12 \cdots k} + \varepsilon_{ij} \qquad (29.39)$$

where:

$$X_{il} = \begin{cases} -1 & \text{if case } i \text{ from first level of factor } l \\ \phantom{-}1 & \text{if case } i \text{ from second level of factor } l \end{cases}$$

$$X_{i12 \cdots k} = X_{i1} X_{i2} \cdots X_{ik}$$

and $\varepsilon_{ij}$ are independent $N(0, \sigma_i^2)$.
Denote the sample variance obtained for the ith treatment combination by $s_i^2$:

$$s_i^2 = \frac{\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\cdot})^2}{n - 1} \qquad (29.40)$$


The sample variance is the response to be modeled in the dispersion model. The raw 
responses, $Y_{ij}$, are modeled directly using (29.39). We refer here to (29.39) as the location
model because it provides for estimates of the mean response as a function of the control- 
factor-level settings. We now consider the development of these models, beginning with the 
dispersion model. 


Dispersion Model. The dispersion model is based on (29.39), where the response $Y_{ij}$ is
replaced by the logarithm of the ith sample variance. We also attach the superscript D to
the regression parameters and to the error terms as a reminder that these quantities pertain
only to the dispersion model:

$$\log_e s_i^2 = \beta_0^D X_{i0} + \beta_1^D X_{i1} + \cdots + \beta_k^D X_{ik} + \beta_{12}^D X_{i12} + \cdots + \beta_{12 \cdots k}^D X_{i12 \cdots k} + \varepsilon_i^D \qquad (29.41)$$




The regression parameters $\beta^D$ are referred to as the dispersion effects. The reason that
we use $\log_e s_i^2$ as the response rather than $s_i^2$ is that the latter do not follow a normal
distribution with constant variance. Since the $\varepsilon_{ij}$ are normally distributed with zero mean
and variance $\sigma_i^2$, it follows from (A.70) that $(n - 1)s_i^2/\sigma_i^2$ is distributed as $\chi^2$ with $(n - 1)$
degrees of freedom. It can be shown that $\log_e s_i^2$ is approximately normally distributed with
mean $\log_e \sigma_i^2$ and constant variance $2/(n - 1)$ (see, e.g., Reference 29.4). Thus, the $\varepsilon_i^D$ are
approximately independent and normally distributed with constant variance. Model (29.41)
can then be estimated using ordinary least squares and the methods discussed in Section 29.2
for the analysis of unreplicated two-level studies.
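
The variance-stabilizing effect of the logarithm can be seen in a small simulation. This is a sketch assuming NumPy; the replicate count n = 30, the two variance levels, and the seed are arbitrary choices made for illustration. The approximation $2/(n - 1)$ is a large-sample result, so agreement is rougher for small n.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30              # replicates per treatment (chosen large for illustration)
n_sim = 100_000     # simulated treatment combinations per variance level

def log_variance_summary(sigma2):
    """Simulate n normal replicates many times and return the mean and
    variance of log_e(s^2) across the simulated samples."""
    y = rng.normal(0.0, np.sqrt(sigma2), size=(n_sim, n))
    log_s2 = np.log(y.var(axis=1, ddof=1))
    return log_s2.mean(), log_s2.var()

for sigma2 in (1.0, 25.0):
    m, v = log_variance_summary(sigma2)
    # Report the bias of the mean, the simulated variance, and 2/(n - 1).
    print(sigma2, round(m - np.log(sigma2), 3), round(v, 4), round(2 / (n - 1), 4))
```

The simulated variance of $\log_e s^2$ should be nearly the same at both variance levels, whereas the variance of $s^2$ itself grows with $\sigma^4$.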


Location Model. The location model is given by (29.39). However, because the variance 
is not constant, the parameters are most efficiently estimated using weighted least squares 
as described in Section 11.1. Specifically, we obtain an estimate of the variance for each
factor-level combination using:

$$\hat{v}_i = \exp\left(\widehat{\log_e s_i^2}\right) \qquad (29.42)$$

where $\widehat{\log_e s_i^2}$ is obtained from the estimated dispersion model (29.41). Then the weights
are given by (11.16b) on page 425:

$$w_i = \frac{1}{\hat{v}_i} \qquad (29.43)$$

Alternatively, an approximate analysis can be conducted based on ordinary least squares. 


Strategy for Analysis. We suggest the following strategy for analyzing the location and 
dispersion models: 


1. Fit dispersion model (29.41) and determine whether or not dispersion effects are present. 
This can be done using methods discussed in Section 29.2 for the analysis of unreplicated 
two-level factorials, or the Breusch-Pagan test (3.11) for constancy of error variance. 

2. If the variance is constant, there is no need to fit the dispersion model (29.41). The 
location model (29.39) can then be analyzed using ordinary least squares and the methods 
described in previous sections. 

3. If dispersion effects are present, fit location model (29.39) using weighted least squares, 
or conduct an approximate unweighted analysis. 

4. Use the resulting models based on the active location and dispersion effects to identify 
factor-level combinations that move the predicted mean close to target while minimizing 
the predicted variance. If no dispersion effects are present, only the location model is 
employed. Similarly, if no location effects are present, only the dispersion model is
employed. 


In Step 4, if a factor is active—either through its main effect or through interactions 
involving the factor—in only one of the two models, the selection of optimal level setting 
can be conducted according to the model in which the factor is active. If a factor is active in 
both models, it might not be possible to find a factor-level combination that simultaneously 
produces an optimal mean and an optimal variance. In this case, a compromise setting is
identified that leads to “good” (but not necessarily optimal) results for both the mean and 
the variance. 




Example 


We illustrate the use of the modeling strategy with an adaptation of an example due to 
Taguchi. 


A food company investigated alternative recipes for a type of caramel. The performance 
characteristic of interest was the plasticity of the caramel. When subjected to sufficient 
shearing stress, any given caramel will be deformed. If, after the stress is removed, there is 
no recovery, the caramel is completely plastic. On the other hand, if recovery is complete 
and instantaneous, the caramel is completely elastic. A proper balance between these two 
factors is required. In the experiment, the plasticity was measured on a scale of 1 to 100, 
where 100 implies the complete plasticity. The target value of the caramel was 70. 

Three ingredients were thought to be potentially important: brown sugar ($X_1$), sweetened
condensed milk ($X_2$), and light corn syrup ($X_3$). The first three columns in Table 29.10 list
the coded treatment combinations for the $2^3$ full factorial design in standard order, and
columns 4 through 7 provide the levels of the interaction columns $X_{12}$, $X_{13}$, $X_{23}$, and
$X_{123}$. Four replicates of the experiment were obtained, and the four $Y_{ij}$ responses for each
treatment combination are listed in columns 8-11. Also listed in Table 29.10 in columns 12
and 13 are the sample variances $s_i^2$ and their logarithms $\log_e s_i^2$.

The first step in the analysis was to fit dispersion model (29.41). Results obtained from
a regression of column 13 in Table 29.10 on columns 1-7 are shown in Figure 29.13a.
Since there are no replicates for dispersion model (29.41), t-values and P-values cannot
be obtained. Figure 29.13b provides a normal probability plot of the estimated dispersion
effect coefficients. The plot clearly suggests the presence of one nonzero dispersion effect,
namely $\beta_{13}^D$. Ignoring inactive effects, the estimated dispersion model is:


$$\widehat{\log_e s_i^2} = 4.0098 + .5748 X_{i13} \qquad (29.44)$$
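
The sample variances and the dispersion effect estimates can be recomputed from the replicate responses in Table 29.10. This sketch assumes NumPy and the standard-order design stated in the text. Because the results here are computed from unrounded logarithms, they may differ from the printed values in the fourth decimal place.

```python
import numpy as np

# 2^3 design in standard order (X1 changes fastest), as in Table 29.10.
x = np.array([(x1, x2, x3) for x3 in (-1, 1) for x2 in (-1, 1) for x1 in (-1, 1)],
             dtype=float)
x1, x2, x3 = x.T
X = np.column_stack([np.ones(8), x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
names = ["Constant", "X1", "X2", "X3", "X12", "X13", "X23", "X123"]

# Four replicate plasticity responses per treatment (columns 8-11).
y = np.array([[42, 65, 70, 73], [50, 52, 55, 63], [61, 70, 78, 79], [48, 51, 55, 60],
              [65, 74, 74, 77], [40, 59, 63, 66], [70, 72, 77, 84], [48, 49, 56, 63]],
             dtype=float)

s2 = y.var(axis=1, ddof=1)     # sample variances, as in (29.40)
log_s2 = np.log(s2)            # response for dispersion model (29.41)

# The design is orthogonal, so least squares reduces to X'y / 8.
coef = X.T @ log_s2 / 8
for name, c in zip(names, coef):
    print(f"{name:8s} {c: .4f}")
```

The X13 coefficient stands out in magnitude from the others, which is what the normal probability plot displays graphically.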


Since dispersion effects are present, we move to Step 3 of the strategy for analysis, which 
calls for the use of weighted least squares (or an approximate analysis using ordinary least 
squares) to estimate the parameters in the location effects model. We will illustrate the use 
of weighted least squares using an estimated variance function, as described in Section 11.1.

A model-based estimate of the variance for the ith treatment combination is, from (29.44):


$$\hat{v}_i = \exp\left(\widehat{\log_e s_i^2}\right) = \exp(4.0098 + .5748 X_{i13})$$

TABLE 29.10 Experimental Design Matrix and Y Observations—Caramel Example.

     (1)   (2)   (3)   (4)   (5)   (6)   (7)    (8)   (9)   (10)  (11)    (12)       (13)       (14)
                  Design Matrix                       Replicates
 i   X1    X2    X3    X12   X13   X23   X123   Yi1   Yi2   Yi3   Yi4    s_i^2   log_e s_i^2    w_i
 1   -1    -1    -1     1     1     1    -1     42    65    70    73    197.67     5.287       .0102
 2    1    -1    -1    -1    -1     1     1     50    52    55    63     32.67     3.486       .0322
 3   -1     1    -1    -1     1    -1     1     61    70    78    79     70.00     4.248       .0102
 4    1     1    -1     1    -1    -1    -1     48    51    55    60     27.00     3.296       .0322
 5   -1    -1     1     1    -1    -1     1     65    74    74    77     27.00     3.296       .0322
 6    1    -1     1    -1     1    -1    -1     40    59    63    66    136.67     4.918       .0102
 7   -1     1     1    -1    -1     1    -1     70    72    77    84     38.92     3.662       .0322
 8    1     1     1     1     1     1     1     48    49    56    63     48.67     3.885       .0102




FIGURE 29.13 MINITAB Regression Output and Normal Probability Plot of Estimated
Effect Coefficients for Dispersion Model—Caramel Example.

(a) Regression Coefficients

Term          Effect      Coef
Constant                4.0098
X1           -0.2270   -0.1135
X2           -0.4740   -0.2370
X3           -0.1390   -0.0695
X1*X2        -0.1375   -0.0688
X1*X3         1.1495    0.5748
X2*X3         0.1405    0.0702
X1*X2*X3     -0.5620   -0.2810

(b) Normal Probability Plot (horizontal axis: Expected Value)


FIGURE 29.14 MINITAB Regression Output and Normal Probability Plot of Estimated Effect Coefficients for
Location Model—Caramel Example.

(a) Regression Output

Predictor      Coef   SE Coef       T       P
Constant     62.781     1.478   42.48   0.000
x1           -7.906     1.478   -5.35   0.000
x2            1.031     1.478    0.70   0.492
x3            2.031     1.478    1.37   0.182
x12          -2.156     1.478   -1.46   0.158
x13          -1.406     1.478   -0.95   0.351
x23          -0.969     1.478   -0.66   0.518
x123          0.594     1.478    0.40   0.691

S = 1.041    R-Sq = 69.9%    R-Sq(adj) = 61.1%

Analysis of Variance

Source            DF       SS      MS      F       P
Regression         7   60.341   8.620   7.96   0.000
Residual Error    24   25.993   1.083
Total             31   86.334

(b) Normal Probability Plot (horizontal axis: Expected Value, from -1 to 1)


From (11.16b), the ith estimated weight is $w_i = 1/\hat{v}_i$. For example, for the first treatment
combination in Table 29.10, we obtain:


$$\hat{v}_1 = \exp(4.0098 + .5748 X_{1,13}) = \exp(4.0098 + .5748(1)) = 97.96$$


from which we obtain the first weight: $w_1 = 1/97.96 = .0102$.
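
The weight calculation takes only two values in this example, one for each level of X13. A minimal sketch, with the function name and default arguments chosen for illustration:

```python
import math

def estimated_weight(x1, x3, b0=4.0098, b13=0.5748):
    """Return (v_hat, weight), where v_hat = exp(b0 + b13 * X1 * X3) is the
    variance estimate from fitted dispersion model (29.44)."""
    v_hat = math.exp(b0 + b13 * x1 * x3)
    return v_hat, 1.0 / v_hat

# Treatment 1 has X1 = -1 and X3 = -1, so X13 = +1.
v1, w1 = estimated_weight(-1, -1)
# Treatment 2 has X1 = +1 and X3 = -1, so X13 = -1.
v2, w2 = estimated_weight(1, -1)
print(round(v1, 2), round(w1, 4), round(v2, 2), round(w2, 4))
```

Treatments with X13 = -1 have the smaller estimated variance and therefore receive roughly three times the weight of treatments with X13 = +1.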

Use of the estimated weights listed in column 14 of Table 29.10 in a regression of the
$Y_{ij}$ responses in columns 8-11 on the predictors in columns 1-7 led to the weighted least
squares location effects estimates summarized in the regression output in Figure 29.14a.
Note that the P-value for $b_1$ is 0+, while the P-values for the remaining effects are all greater
than 0.1. The normal probability plot of the estimated location effects in Figure 29.14b also
suggests that $\beta_1$ is nonzero. Using weighted least squares to estimate the reduced location




model, we obtain (output not shown):

$$\hat{Y} = 63.511 - 8.960 X_1 \qquad (29.45)$$


With the estimated dispersion and location models in hand, we now turn to Step 4 in the
strategy for analysis: the identification of robust factor-level combinations. From (29.44),
two possible optimal settings that minimize the dispersion effect are $(X_1, X_3) = (+1, -1)$
and $(X_1, X_3) = (-1, +1)$. However, the result from the location model in (29.45) shows
that, in order to move the estimated mean response to $T = 70$, the optimal setting for $X_1$
is $-1$. Thus, the optimal setting in the caramel example is $(X_1, X_3) = (-1, +1)$. These
settings lead to the following estimated mean and variance of caramel plasticity:

    $\hat{Y} = 63.511 - 8.960(-1) = 72.5$
    $\log_e \hat{s}^2 = 4.0098 + .5748(-1)(+1) = 3.435$

Thus the estimated mean has been moved to within 2.5 of the target $T = 70$. The estimated
variance for this setting is $\exp(3.435) = 31.03$.
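
The weight and optimal-setting arithmetic above can be checked with a few lines of code. The following Python sketch is illustrative only: the function names `predicted_variance` and `predicted_mean` are hypothetical, and the coefficients are simply those of the fitted models (29.44) and (29.45) as quoted in the text.

```python
import math

# Coefficients quoted in the text for the caramel example.
B0_DISP, B13_DISP = 4.0098, 0.5748   # dispersion model (29.44)
B0_LOC, B1_LOC = 63.511, -8.960      # location model (29.45)

def predicted_variance(x1, x3):
    """Model-based variance estimate exp(log_e s^2) for coded levels x1, x3."""
    return math.exp(B0_DISP + B13_DISP * x1 * x3)

def predicted_mean(x1):
    """Estimated mean plasticity from the reduced location model."""
    return B0_LOC + B1_LOC * x1

# Weight for the first treatment combination (X1 = -1, X3 = -1).
v1 = predicted_variance(-1, -1)
w1 = 1.0 / v1
print(f"v1 = {v1:.2f}, w1 = {w1:.4f}")

# Tabulate predicted mean and variance at all four (X1, X3) settings.
for x1 in (-1, 1):
    for x3 in (-1, 1):
        print(x1, x3, round(predicted_mean(x1), 1),
              round(predicted_variance(x1, x3), 2))
```

Listing all four settings makes the trade-off visible: both settings with $X_1 X_3 = -1$ give the same small predicted variance, and only the one with $X_1 = -1$ also brings the mean close to the target.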


Comments 


1. In some cases, there are factors that are active only in the location model and not in the dispersion
model. These factors are called adjustment factors. A common strategy is to select optimal settings 
according to the dispersion model, and then use the adjustment factors to bring the location to the 
target. Of course, there is no guarantee that adjustment factors exist. 

2. The location model can be classified into three groups with respect to the target value: the- 
smaller-the-better, the-larger-the-better, and the-nominal-the-better. For instance, an automotive com- 
pany conducted an experiment to study the effect of four factors on the braking distance in different 
driving conditions. Since the braking distance should be minimized, it is an example of the-smaller- 
the-better case. In another study, the response was the pull strength of truck seat belts following a 
crimping operation. The pull strength needs to be maximized to ensure that the seat belt does not 
break in an accident. Thus, it is an example of the-larger-the-better case. The procedures of the anal- 
ysis in these two cases are the same as those shown in the caramel example, which is the-nominal- 
the-better. 

3. The approach to weighted least squares described here for fitting the location model used a
model-based estimate of the variance, $\hat{v}_i$, to obtain weights. A simple alternative is to use the sample
variances $s_i^2$, in which case the weights are $w_i = 1/s_i^2$. This approach is discussed in Section 11.1.


Incorporating Noise Factors 


As we have seen, dual-response modeling can be a powerful tool for identifying product or 
process designs that have low levels of variation. Recall from our discussion of blocking 
that variation is often caused by changes in background nuisance factors that cannot be 
controlled. In the caramel example, plasticity is affected by the ambient temperature. If 
the temperature changes during the course of the experiment, this would likely contribute
to the variation in plasticity observed for each factor-level combination. In a manufactur- 
ing process control experiment, if different operators are responsible for different parts of 
the experiment, they may contribute to variation in the quality of the parts or products 
produced. In robust product design experiments, the investigator often is interested in
reducing variation attributable to one or more specific nuisance factors. In simple terms, this is
accomplished by deliberately changing the levels of the nuisance factors during the course
of the experiment, and then identifying settings of the experimental factors for which the 
response is relatively unaffected by changes to the nuisance factors. 

In robust product design terminology, a nuisance factor that is deliberately varied during 
the experiment is called a noise factor. The standard (non-nuisance) experimental factors are 
termed control factors. Generally, control factors are variables that are easy or inexpensive 
to control in the design of the product or process. Noise factors are variables that are hard 
or expensive to control during manufacturing or during product use. 

Consider again the caramel example. Suppose that the investigator was concerned specifically
with the effect of temperature on the plasticity of the product when used by the consumer.
Suppose also that the investigator was interested in four temperature levels, namely,
60°F, 70°F, 80°F, and 90°F. In this case, temperature would simply be added as a fourth
(four-level) factor ($X_4$) in the experiment. The three control factors would be brown sugar
($X_1$), sweetened condensed milk ($X_2$), and light corn syrup ($X_3$). In the experiment, each
factor-level combination of the control factors ($X_1$, $X_2$, $X_3$) is tested at the four levels of
temperature. This is accomplished by crossing the levels of the control factors with the levels
of the noise factors, leading here to the use of a $2^3 \times 4$ full factorial design. For purposes
of analysis, the four responses obtained for each combination of the control factors are
treated simply as replicates, and a dual-response analysis, as already described, is carried
out. Control factor settings that lead to a small variance $s^2$ are unaffected by, and therefore
robust to, the changes to levels of the noise factor.
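
The crossing of control-factor combinations with noise-factor levels can be sketched in Python as follows. This is a minimal illustration rather than a prescribed procedure; the variable names are hypothetical, and listing the control runs in standard order (with $X_1$ changing fastest) is an assumption made for readability.

```python
from itertools import product

control_levels = (-1, 1)             # coded low/high levels of X1, X2, X3
noise_levels_f = (60, 70, 80, 90)    # temperature in deg F (noise factor X4)

# Full 2^3 factorial in the control factors. product() varies its last
# element fastest, so each tuple is reversed to make X1 change fastest.
control_runs = [tuple(reversed(r)) for r in product(control_levels, repeat=3)]

# Cross every control-factor combination with every noise-factor level.
crossed = [(x1, x2, x3, t)
           for (x1, x2, x3) in control_runs
           for t in noise_levels_f]

print(len(control_runs), len(crossed))   # prints: 8 32
for row in crossed[:4]:
    print(row)
```

In an actual experiment the 32 treatment combinations would still be assigned to experimental units in random order; the sketch only enumerates them.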

Noise factors can arise during the manufacturing process or when the product is in 
use. Internal noise refers to variations that occur during the production process. Examples 
include raw material variation, manufacturing variation, unit-to-unit variation, and so on. 
Making product performance insensitive to these variations can improve the quality of the 
product while lowering the cost of production. 

External noise refers to variations that occur when the product is used by the cus- 
tomer. Examples include the environment in which a product works, the load to which it 
is subjected, and natural deterioration. For instance, a reliable automobile should perform 
consistently whether it is used in Florida in the summer or Minnesota in the winter. A good
washer should be robust to the laundry load. Making product performance insensitive to 
external variations will improve the reliability of the product and increase the customer 
satisfaction. 

In summary, the basic procedure for incorporating noise factors into a robust product 
design experiment is as follows. 


1. Identify the experimental layout for the control factors. This may be a full factorial or a 
fractional factorial, blocked or unblocked, depending on the experimenter's objectives, 
as discussed in Sections 29.1—29.6. 

2. Identify the noise factors and associated noise-factor levels to be included in the exper- 
iment. If there is more than one noise factor, identify the factor-level combinations of 
the noise factors to be included. Generally these are obtained from a full factorial layout 
among the noise factors. However, fractional factorial arrangements of the noise factors 
are sometimes employed if many noise factors are present. 

3. The full experimental design is obtained by crossing the control-factor-level combinations
   with the noise-factor-level combinations. As always, the resulting treatment
   combinations are randomly assigned to the experimental units. Note that if there are
   $n_c$ control-factor-level combinations and there are $n_n$ noise-factor-level combinations,
   there will be $n_c n_n$ treatment combinations in all.

4. The analysis is conducted using the dual-response-optimization strategy outlined on page
   1247. The $n_n$ responses obtained for each control-factor-level combination are treated
   as replicates.


TABLE 29.11 Layout of the Experimental Design with Noise Factor: Caramel Example.

                             Noise Factor
Run    X1    X2    X3    60°F   70°F   80°F   90°F       s^2    log_e s^2
 1     -1    -1    -1     42     65     70     73     197.67      5.287
 2      1    -1    -1     50     52     55     63      32.67      3.486
 3     -1     1    -1     61     70     78     79      70.00      4.248
 4      1     1    -1     48     51     55     60      27.00      3.296
 5     -1    -1     1     65     74     74     77      27.00      3.296
 6      1    -1     1     40     59     63     66     136.67      4.918
 7     -1     1     1     70     72     77     84      38.92      3.662
 8      1     1     1     48     49     56     63      48.67      3.885


We illustrate the use of a single noise factor by continuing our discussion of the caramel 
example. We then move on to a more extensive case study from the automotive industry, 
which employed five control factors and two noise factors. 


Caramel Example. In the caramel example, the four responses at a given control-factor- 
level combination were actually obtained at the four temperatures: 60°F, 70°F, 80°F, and 
90°F. Note that we cannot control the temperature in the field, but by controlling it during 
the experiment, we can identify the settings of the control factors that lead to the desired 
plasticity across all levels of temperature—that is, with small variance. 

The layout of the experimental design matrix is shown in Table 29.11. This is essentially 
the same design layout as the one shown in Table 29.10. The only difference is that the 
replications of each control-factor-level combination are conducted deliberately at different 
levels of the temperature. The steps in the analysis are identical to those shown previously, 
leading to (29.44) for the dispersion model, and (29.45) for the location model. Thus the 
setting with brown sugar at the low level ($X_1 = -1$) and light corn syrup at the high level
($X_3 = 1$) leads to a product that has the desired mean plasticity and is relatively unaffected
by or robust to changes in temperature. 
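
As a check on the dispersion analysis, the sample variances and two of the dispersion coefficients can be recomputed from the responses in Table 29.11. The Python sketch below is illustrative: it assumes the eight runs are listed in standard order, and it uses the fact that for an orthogonal two-level design each least squares coefficient equals the sum of (coded column times response) divided by the number of runs.

```python
import math
from statistics import mean, variance

# Control-factor settings (X1, X2, X3), assumed to be in standard order.
design = [(-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
          (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1)]

# Plasticity responses at 60, 70, 80, and 90 deg F for each run.
responses = [(42, 65, 70, 73), (50, 52, 55, 63), (61, 70, 78, 79),
             (48, 51, 55, 60), (65, 74, 74, 77), (40, 59, 63, 66),
             (70, 72, 77, 84), (48, 49, 56, 63)]

# The four noise-level responses are treated as replicates.
s2 = [variance(y) for y in responses]
log_s2 = [math.log(v) for v in s2]

# Orthogonal design: coefficient = sum(x * y) / n.
n = len(design)
b0 = mean(log_s2)
b13 = sum(x1 * x3 * y for (x1, _, x3), y in zip(design, log_s2)) / n

for run, (v, lv) in enumerate(zip(s2, log_s2), start=1):
    print(run, round(v, 2), round(lv, 3))
print(round(b0, 3), round(b13, 3))
```

The constant and the $X_1 X_3$ coefficient obtained this way should agree with the values 4.0098 and .5748 used in the text up to rounding of the tabled logarithms.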

We now turn to a discussion of a robust product design experiment from the automotive 
industry. 


Case Study: Clutch Slave Cylinder Experiment. A research project in a major automotive
company was conducted to develop a design for a clutch slave cylinder that would minimize
fluid leakage. Five two-level control factors and two two-level noise factors were identified.
The five control factors are body inner diameter ($X_1$), body outer diameter ($X_2$), seal inner
diameter ($X_3$), seal outer diameter ($X_4$), and seal design ($X_5 = -1$: lip seal; $X_5 = 1$: quads
seal). The two noise factors are temperature ($X_6$) and load ($X_7 = -1$: light; $X_7 = 1$: heavy).
The response is leakage, which is to be minimized.


TABLE 29.12 Experimental Design and Responses: Clutch Slave Cylinder Example.

       (1)  (2)  (3)  (4)  (5)     (6)      (7)      (8)      (9)       (10)      (11)
          Control Factors               Noise Factors (X6, X7)
Run    X1   X2   X3   X4   X5   (-1,-1)  (+1,-1)  (-1,+1)  (+1,+1)   log_e s^2     w
 1     -1   -1   -1   -1   -1      .8       .4      0        0        -1.920     1.245
 2      1   -1   -1    1   -1     3.2      0        0        0          .940      .195
 3     -1    1   -1    1   -1     0        0        0        2.4        .365     1.245
 4      1    1   -1   -1   -1     5.8      0        0        2.8       2.036      .195
 5     -1   -1    1    1   -1     0        3.0      0        2.4        .912     1.245
 6      1   -1    1   -1   -1     0        1.2      0        4.0       1.270      .195
 7     -1    1    1   -1   -1     0        2.6      0        1.2        .425     1.245
 8      1    1    1    1   -1     1.0      2.3      5.2      0         1.627      .195
 9     -1   -1   -1   -1    1     9.8      2.5     13.8      2.0       3.500      .009
10      1   -1   -1    1    1     6.4      3.0     13.0      0         3.440      .058
11     -1    1   -1    1    1     8.8      2.0     31.0       .4       5.294      .009
12      1    1   -1   -1    1     1.8      3.4      6.9      0         2.152      .058
13     -1   -1    1    1    1     6.8      2.4     26.4      0         4.970      .009
14      1   -1    1   -1    1     4.0      2.2     12.6      3.4       3.120      .058
15     -1    1    1   -1    1    10.2      1.8     38.8      3.2       5.697      .009
16      1    1    1    1    1     7.8      1.4      6.4      5.6       2.026      .058


The experimental plan is shown in Table 29.12. For the five control factors, a resolution IV
design was used, in which the defining relation is 0 = 1234. For each control factor setting,
four responses were obtained, corresponding to the $2^2 = 4$ noise-factor-level settings. When
the control-factor-level combinations are crossed with the noise-factor-level combinations,
we obtain a $2^{5-1} \times 2^2$ robust product design experiment.
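
The structure of this crossed design can be sketched in Python. The construction below is an illustration under stated assumptions: it takes $X_1$, $X_2$, $X_3$, $X_5$ as the basic factors of a full $2^4$ factorial and generates $X_4 = X_1 X_2 X_3$, which is one way to satisfy the defining relation 0 = 1234; the variable names are hypothetical.

```python
from itertools import product

# Basic factors X1, X2, X3, X5 form a full 2^4 factorial; X4 is generated
# from the defining relation 0 = 1234, that is, X4 = X1 * X2 * X3.
control_runs = []
for x5, x3, x2, x1 in product((-1, 1), repeat=4):   # X1 changes fastest
    x4 = x1 * x2 * x3
    control_runs.append((x1, x2, x3, x4, x5))

# Full 2^2 factorial in the noise factors (X6, X7), with X6 changing fastest.
noise_runs = [(x6, x7) for x7, x6 in product((-1, 1), repeat=2)]

# Cross control-factor combinations with noise-factor combinations.
crossed = [c + z for c in control_runs for z in noise_runs]

print(len(control_runs), len(noise_runs), len(crossed))   # prints: 16 4 64
```

Every control run satisfies $X_1 X_2 X_3 X_4 = +1$, which is what the defining relation requires.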

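Again following the dual-response modeling strategy on page 1247, we first estimate
the dispersion model. Because the design in the control factors is a resolution IV fractional
factorial design based on the defining relation 0 = 1234, the following dispersion effects
are confounded:

$$
\begin{array}{llll}
\beta_0 + \beta_{1234} & \beta_1 + \beta_{234} & \beta_2 + \beta_{134} & \beta_3 + \beta_{124} \\
\beta_4 + \beta_{123} & \beta_5 + \beta_{12345} & \beta_{12} + \beta_{34} & \beta_{13} + \beta_{24} \\
\beta_{14} + \beta_{23} & \beta_{15} + \beta_{2345} & \beta_{25} + \beta_{1345} & \beta_{35} + \beta_{1245} \\
\beta_{45} + \beta_{1235} & \beta_{125} + \beta_{345} & \beta_{135} + \beta_{245} & \beta_{145} + \beta_{235}
\end{array}
\qquad (29.46)
$$

We will form dispersion model (29.41) here by choosing the first dispersion effect from
each of the 16 pairs in (29.46):

$$
\log_e s_i^2 = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{145} X_{i145} + \varepsilon_i \qquad (29.47)
$$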
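
The 16 confounded pairs follow mechanically from the defining relation: multiplying any effect by the word 1234 and cancelling squared factors gives its alias. A minimal Python sketch of that bookkeeping is shown below; representing effects as sets of factor indices, and the helper names `label` and `alias_of`, are choices made here for illustration.

```python
from itertools import combinations

DEFINING_WORD = frozenset({1, 2, 3, 4})   # defining relation 0 = 1234
FACTORS = (1, 2, 3, 4, 5)

def label(effect):
    """Render an effect such as {1, 5} as '15'; the empty set is the constant '0'."""
    return "".join(str(f) for f in sorted(effect)) or "0"

def alias_of(effect):
    """Multiply an effect by the defining word; squared factors cancel."""
    return frozenset(effect) ^ DEFINING_WORD

def order_key(effect):
    """Sort effects by order (number of factors), then by label."""
    return (len(effect), label(effect))

# Enumerate all 2^5 = 32 effects and group each one with its alias.
effects = [frozenset(c) for k in range(len(FACTORS) + 1)
           for c in combinations(FACTORS, k)]
pairs = sorted({frozenset({e, alias_of(e)}) for e in effects},
               key=lambda p: sorted(order_key(e) for e in p))

for p in pairs:
    first, second = sorted(p, key=order_key)
    print(f"beta_{label(first)} + beta_{label(second)}")
```

The first member printed for each pair is the lower-order effect, which corresponds to the convention of retaining the first effect of each pair in the dispersion model.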


Regressing the $\log_e s_i^2$ values in column 10 of Table 29.12 on the predictors indicated
by (29.47), we obtain the estimated dispersion effects shown in Figure 29.15a. A normal
probability plot of the estimated dispersion effects is shown in Figure 29.15b. It can be
seen that the main dispersion effect of factor $X_5$ and the two-factor interaction $X_{15}$ appear to
be active. Eliminating the inactive effects leads to the estimated subset dispersion model:

    $\log_e \hat{s}^2 = 2.241 + 1.534 X_5 - .926 X_{15}$                        (29.48)

from which we obtain the model-based variance estimates: $\hat{v}_i = \exp(\log_e \hat{s}_i^2)$.


FIGURE 29.15 MINITAB Regression Output and Normal Probability Plot of Estimated Effect Coefficients for
Dispersion Model: Clutch Slave Cylinder Example.

(a) Regression Output

Term          Effect      Coef
Constant                2.2409
X1           -0.3290   -0.1645
X2            0.4237    0.2119
X3            0.5300    0.2650
X4            0.4117    0.2059
X5            3.0680    1.5340
X1*X2        -0.6560   -0.3280
X1*X3        -0.6612   -0.3306
X1*X4        -0.5480   -0.2740
X1*X5        -1.8517   -0.9259
X2*X5        -0.3890   -0.1945
X3*X5         0.1733    0.0866
X4*X5        -0.0965   -0.0482
X1*X2*X5     -0.5698   -0.2849
X1*X3*X5      0.0815    0.0407
X1*X4*X5      0.3298    0.1649

(b) Normal Probability Plot of the Effects

[Plot not reproduced; horizontal axis: Expected Value.]
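
The two active dispersion coefficients and the resulting weights can be reproduced from column 10 of Table 29.12. The Python sketch below is illustrative and rests on an assumption about run order: $X_1$ alternates between $-1$ and $+1$ from run to run, and $X_5 = -1$ for runs 1-8 and $+1$ for runs 9-16, which is consistent with the weights listed in column 11.

```python
import math

# log_e s^2 values from column 10 of Table 29.12, runs 1-16.
log_s2 = [-1.920, 0.940, 0.365, 2.036, 0.912, 1.270, 0.425, 1.627,
          3.500, 3.440, 5.294, 2.152, 4.970, 3.120, 5.697, 2.026]

# Assumed coded columns (see the lead-in): X1 alternates, X5 changes slowest.
x1 = [(-1) ** (i + 1) for i in range(16)]   # -1, +1, -1, +1, ...
x5 = [-1] * 8 + [1] * 8

# Orthogonal two-level design: coefficient = sum(x * y) / n.
n = len(log_s2)
b0 = sum(log_s2) / n
b5 = sum(c * y for c, y in zip(x5, log_s2)) / n
b15 = sum(a * c * y for a, c, y in zip(x1, x5, log_s2)) / n
print(round(b0, 4), round(b5, 4), round(b15, 4))

# Model-based variance estimates from (29.48) and the corresponding weights.
v_hat = [math.exp(2.241 + 1.534 * c - 0.926 * a * c) for a, c in zip(x1, x5)]
weights = [1.0 / v for v in v_hat]
print([round(w, 3) for w in weights[:2] + weights[8:10]])
```

Only four distinct weights arise because the subset model involves $X_1$ and $X_5$ alone.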

Since significant dispersion effects are present, we turn now to the estimation of the
location model using weighted least squares. The estimated weights, which as before are
the inverses of the estimated variances in (29.43), are shown in column 11 of Table 29.12.
Use of these estimated weights in a regression of the $Y_{ij}$ responses in columns 6-9 on the
predictors indicated in (29.47) leads to the weighted least squares location effects estimates
summarized in the regression output in Figure 29.16a. The output indicates that only one
estimated location effect, $b_5$, is significant at the $\alpha = .05$ level of significance. The normal
probability plot of the estimated location effects in Figure 29.16b also clearly suggests that
$\beta_5$ is the only active location effect. Using weighted least squares to estimate the reduced
location model, we obtain:

    $\hat{Y} = 3.232 + 2.235 X_5$                                        (29.49)


FIGURE 29.16 MINITAB Regression Output and Normal Probability Plot of Estimated Effect Coefficients for
Location Model: Clutch Slave Cylinder Example.

(a) Regression Results

Predictor       Coef   SE Coef        T       P
Constant      4.3141    0.8256     5.23   0.000
X1           -1.0828    0.8256    -1.31   0.196
X2            0.4609    0.8256     0.56   0.579
X3            0.5578    0.8256     0.68   0.502
X4            0.0891    0.8256     0.11   0.915
X5            3.1172    0.8256     3.78   0.000
X12          -0.5422    0.8256    -0.66   0.514
X13          -0.2203    0.8256    -0.27   0.791
X14           0.1359    0.8256     0.16   0.870
X15          -1.4797    0.8256    -1.79   0.079
X25           0.2016    0.8256     0.24   0.808
X35           0.3234    0.8256     0.39   0.697
X45           0.0672    0.8256     0.08   0.935
X125         -0.8266    0.8256    -1.00   0.322
X135         -0.1047    0.8256    -0.13   0.900
X145          0.2891    0.8256     0.35   0.728

(b) Normal Probability Plot

[Plot not reproduced; vertical axis: Location Effects; horizontal axis: Expected Value, ranging from -2 to 2.]


We now turn to Step 4 in the analysis strategy: the identification of robust control-factor-level
combinations. Note that factor $X_5$ enters both the dispersion model (29.48) and
location model (29.49) as a main effect with a positive coefficient. The predicted dispersion
and location are both to be minimized in this example; we therefore set $X_5 = -1$. For
dispersion model (29.48), $X_5$ also enters through the interaction term $X_1 X_5$. Since the
estimated dispersion interaction effect is $b_{15} = -.926$, minimization is accomplished by
setting $X_1 X_5 = 1$. With $X_5 = -1$ from the location model, we have $X_1(-1) = 1$, implying
$X_1 = -1$. These settings lead to predicted mean fluid leakage:

    $\hat{Y} = 3.232 + 2.235(-1) = .997$

with predicted variance:

    $\hat{v} = \exp[2.241 + 1.534(-1) - .926(-1)(-1)] = .803$


Note that prediction intervals for these quantities can be obtained in the usual way. Often, a 
confirmation test is carried out at the suggested factor-level combination as a check on the 
validity of the model. The model is said to be confirmed if the results of the confirmation 
run fall within the calculated prediction limits. 
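
The choice of settings can also be confirmed by brute force, since only four coded combinations of ($X_1$, $X_5$) are possible. The Python sketch below simply evaluates the fitted models (29.48) and (29.49) as quoted in the text; the function names are hypothetical.

```python
import math
from itertools import product

def log_variance(x1, x5):
    """Estimated subset dispersion model (29.48)."""
    return 2.241 + 1.534 * x5 - 0.926 * x1 * x5

def mean_leakage(x5):
    """Reduced location model (29.49)."""
    return 3.232 + 2.235 * x5

# Enumerate the four coded settings of (X1, X5) and rank by predicted variance.
candidates = sorted(product((-1, 1), repeat=2), key=lambda s: log_variance(*s))
best_x1, best_x5 = candidates[0]

for x1, x5 in candidates:
    print(x1, x5, round(mean_leakage(x5), 3),
          round(math.exp(log_variance(x1, x5)), 3))
```

In this example the setting with the smallest predicted variance also has the smallest predicted mean, so no trade-off between location and dispersion arises.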


Comments 


1. An alternative approach to the dual-response optimization approach discussed here, called 
the response modeling approach, was proposed by Welch et al. (Ref. 29.5) and Shoemaker et al. 
(Ref. 29.6). This approach advocates, as a first step, the usual analysis of the experiment, making no 
distinction between noise and control factors. If significant interactions exist that involve both noise and
control factors, these interactions are analyzed through graphical or other means to determine which 
control-factor-level combinations lead to the desired mean responses and are relatively unaffected by 
changes to the noise factors. 

2. In the framework proposed by Taguchi, the analysis of a robust design model involves the 
signal-to-noise ratio, which is a transformation based on $\bar{Y}_{i\cdot}$ and $s_i^2$ (Ref. 29.3). Since then, many
other analysis methods have been proposed, but the location-dispersion modeling and the response- 
modeling approaches are often preferred by statisticians. For a more detailed discussion, see Refer- 
ence 29.4. 

3. The control factor layout chosen by the engineer in the clutch slave cylinder example was a 
resolution IV design. Table 29.6 indicates that a design with higher resolution was available, namely 
the $2^{5-1}$ design based on the defining relation 0 = 12345.


Cited References

29.1. Box, G. E. P., W. G. Hunter, and J. S. Hunter. Statistics for Experimenters. New York: John
      Wiley & Sons, 1978.
29.2. Plackett, R. L., and J. P. Burman. "The Design of Optimum Multifactorial Experiments,"
      Biometrika 33 (1946), pp. 305-25.
29.3. Taguchi, G. Introduction to Quality Engineering. Tokyo, Japan: Asian Productivity
      Organization, 1986.
29.4. Wu, C. F. J., and M. Hamada. Experiments: Planning, Analysis, and Parameter Design
      Optimization. New York: John Wiley & Sons, 2000.
29.5. Welch, W. J., T. K. Yu, S. M. Kang, and J. Sacks. "Computer Experiments for Quality Control
      by Parameter Designs," Journal of Quality Technology 40 (1990), pp. 62-71.
29.6. Shoemaker, A. C., K. L. Tsui, and C. F. J. Wu. "Economical Experimentation Methods for
      Robust Design," Technometrics 33 (1991), pp. 415-27.


Problems

29.1. A plant manager used a $2^4$ factorial design with two replicates for each treatment to study
      the effects of four process variables ($X_1, \ldots, X_4$) on product quality (Y). State the response
      model in the form of (29.2a). How many two-factor interaction terms are there? How many
      three-factor interaction terms? How many four-factor interaction terms?

29.2. A scientist observed: "Two-level factorial designs are useful if the number of factors is small.
      But I am concerned when there are 10 or more factors; the number of trials required for a $2^{10}$
      experiment is simply too large." Discuss.


*29.3. Reaction yield. A chemical engineer decided to employ a single replicate of a 2° factorial
       design to study the effects of the process variables on the yield of a chemical reaction.

       a. How many factors are involved? How many levels are there for each factor? How many
          experimental trials will be required for the single replicate of the experiment?

       b. Can a test for lack of fit be obtained here?

29.4. A biologist considered studying the effects of various environmental pollutants on the health
      of mice by using a 2’~* fractional factorial design.

      a. How many factors are involved? How many levels are there for each factor? How many
         trials will be required for a single replicate of the experiment? Can a test for lack of fit be
         obtained?

      b. The biologist decided to augment the design with six center-point replicates. Can a test
         for lack of fit now be obtained? If so, can the biologist determine which factors caused a
         curvature effect?


29.5. State the X matrix (including all main effects and interaction columns) for a single replicate 


of a 2? factorial design, with the rows listed in standard order. Show numerically that (29.3) 
holds for your X matrix. 


*29.6. Refer to Reaction yield Problem 29.3. Past experience indicates that the standard deviation
       of reaction yield is $\sigma = 5$.

       a. Find the variance of the estimated main effect coefficient $b_1$. Is the variance of the interaction
          effect coefficient $b_{12}$ the same? Should it be?

       b. How many replicates of the experiment are required in order to estimate factor effect
          coefficient $b_1$ within $\pm .5$ with 95 percent confidence?


*29.7. Pilot training. An unreplicated $2^5$ full factorial design was used to investigate the effects of
       five factors on the learning rates of flight trainees when using flight simulators. The factors
       were display type ($X_1 = -1$: symbolic; $X_1 = 1$: pictorial), display orientation ($X_2 = -1$:
       outside in; $X_2 = 1$: inside out), crosswind ($X_3 = -1$: no wind present; $X_3 = 1$: crosswind
       present), command guidance ($X_4 = -1$: constant guidance; $X_4 = 1$: guidance only when
       trainee strays far from best flight path), and flight path prediction ($X_5 = -1$: no prediction;
       $X_5 = 1$: constant prediction). The response Y is the average squared distance from the optimal
       flight path for 12 landing attempts by the trainee. The smaller Y is, the better is the trainee's
       performance. Thirty-two subjects (trainees) were selected at random from a large group of
       trainees with no prior flying experience. The design matrix for the experiment and the observed
       trainee flight scores (Y) follow.


          Y     X1    X2    X3    X4    X5
        8.69    -1    -1    -1    -1    -1
        7.71     1    -1    -1    -1    -1
        9.03    -1     1    -1    -1    -1
         ...
        6.67     1    -1     1     1     1
        2.78    -1     1     1     1     1
        7.45     1     1     1     1     1


Adapted in part from L. Lintern et al., "Display Principles, Control 
Dynamics, and Environmental Factors in Pilot Training and Transfer,“ 
Human Factors 32 (1990), pp. 64—69. 


a. State the regression model in the form (29.2a). Fit this model and obtain the estimated 
factor effect coefficients. Does it appear from the magnitudes of the estimated coefficients 
that some factors may be active here? 


b. Prepare a dot plot of the estimated factor effect coefficients. Which effects appear to be 
active? 

c. Obtain a normal probability plot of the estimated factor effect coefficients. Which effects
appear to be active? Do the estimated factor effects appear to be normally distributed? How 
do your results compare with those in parts (a) and (b)? 


*29.8. Refer to Pilot training Problem 29.7. The regression model was revised by dropping all
       three-factor and higher-order interactions.

a. State the revised regression model. Fit the revised regression model and prepare a plot of
the residuals against the fitted values. Do the standard regression assumptions appear to be
satisfied?

b. Obtain a normal probability plot of the residuals. Also conduct the correlation test for
normality; use $\alpha = .05$. Does the assumption of normality appear to be reasonable here?

c. Using the P-values for the estimated factor effect coefficients, test for the significance of
each factor effect. Control the family level of significance at $\alpha = .05$ using the Kimball
inequality. Which effects appear to be active?

d. Summarize the results of the experiment with an appropriate set of plots of main effects 
and interactions. Interpret the results. 


29.9. Computer monitors. A single replicate of a $2^4$ full factorial design, augmented by three
      replicates at the center point, was used to determine the most reliable design of a computer
      monitor base. Factors of interest were clearance under the base ($X_1$), interface board height
      ($X_2$), side vent size ($X_3$), and interface board angle ($X_4$). All factors are quantitative and
      are coded with $X_i = -1$ for the low level of the factor and $X_i = 1$ for the high level. The
      response (Y) is the failure rate of the interface board, with lower failure rates representing
      higher product quality. The design matrix for the experiment and the observed design failure
      rates (Y) follow.


          Y     X1    X2    X3    X4
        3.88    -1    -1    -1    -1
        3.17     1    -1    -1    -1
        4.07    -1     1    -1    -1
         ...
        3.80     0     0     0     0
        3.99     0     0     0     0
        4.16     0     0     0     0


a. State the regression model in the form (29.2a). Fit this model and obtain the estimated
factor effect coefficients. Does it appear from the magnitudes of the estimated coefficients
that some factors may be active here?

b. Prepare a dot plot of the estimated factor effect coefficients. Which effects appear to be
active?

c. Obtain a normal probability plot of the estimated factor effect coefficients. Which effects
appear to be active? Do the estimated factor effects appear to be normally distributed? How
do your results compare with those in parts (a) and (b)?

d. Obtain MSPE using the three center-point replicates and (29.17). Use this estimate to
determine the P-value for each estimated factor effect coefficient. Determine which effects
are active; use $\alpha = .05$ for each test.


29.10. Refer to Computer monitors Problem 29.9. The regression model was revised by including
       only the main effects of factors 1, 3, and 4 and the 34 interaction.

a. Fit the revised model and prepare a plot of the residuals against the fitted values. Do the
standard regression assumptions appear to be satisfied?

b. Obtain a normal probability plot of the residuals. Also conduct the correlation test for
normality; use $\alpha = .05$. Does the assumption of normality appear to be a reasonable one
here?

c. Using the P-values for the estimated factor effect coefficients, test for the significance of
each effect; use $\alpha = .01$ for each test. Which effects are active?

d. Conduct a test for lack of fit; use $\alpha = .05$. State the decision rule and conclusion.

e. Summarize the results of the experiment with an appropriate set of plots of main effects
and interactions. Interpret the results. How should the monitor base be designed to achieve
a minimum failure rate?


29.11. Refer to the X matrix for a $2^4$ full factorial design in Table 29.2.

a. Identify the defining relation for the fractional design obtained by dropping treatments 3
to 6, 9, 10, 15, and 16. What is the resolution of the fractional design so obtained?

b. Give the complete confounding scheme for the fractional design obtained in part (a).

29.12. a. Construct a design for four two-level factors with eight experimental trials that has the
highest possible resolution. What is the resolution of this design?

b. Verify the projection property for the design constructed in part (a) that any subset of three
(or fewer) factors yields a full factorial design in those factors.

29.13. Is it possible to construct a resolution III design for four two-level factors with four
experimental trials? If so, construct such a design. If not, indicate why this is not possible.

29.14. Construct a $2^{4-1}$ design using the defining relation 0 = 123. Is there an alternative eight-run
design of higher resolution?


*29.15. Obtain the complete defining relation and the confounding scheme for the eight-run,
        five-factor design that is fractionated on the basis of the relation 0 = 123 = 245. What is the
        resolution of this design? Is there an alternative design with higher resolution?

29.16. The following design matrix was used in an eight-run, five-factor experiment:


X1 X2 X3 X4 X5


1 1 1 1 1 
-1 -1 —1 1 

= 1 —1 1 1 
1 -1 1 —1 

= —1 1 1 1 
-1 1 1 -1 

= 1 1 —1 -1 
1 1 -1 1 


Obtain the defining relation and the complete confounding scheme for this design. What is the
resolution of this design? Can an alternative five-factor, eight-run design with higher resolution
be constructed?

29.17. Construct a 29^ fractional factorial design of highest resolution using Table 29.6. What is the
defining relation for this design? What is its resolution?

*29.18. Peanut solids. A food scientist conducted a single replicate of a $2^{7-3}$ fractional factorial
design in an effort to identify factors that affect the extraction of food solids from peanuts
using water. Factors of interest were the pH level of the water ($X_1 = -1$: 6.95; $X_1 = 1$: 8.00),
water temperature ($X_2 = -1$: 20°C; $X_2 = 1$: 60°C), extraction time ($X_3 = -1$: 15 minutes;
$X_3 = 1$: 40 minutes), water-to-peanuts ratio ($X_4 = -1$: 5; $X_4 = 1$: 9), agitation speed
($X_5 = -1$: 5,000 rpm; $X_5 = 1$: 10,000 rpm), hydrolysis ($X_6 = -1$: unhydrolyzed; $X_6 = 1$:
hydrolyzed), and presoaking level ($X_7 = -1$: dry; $X_7 = 1$: soaked). The experimental units
were 16 randomly selected batches of peanuts. The response (Y) is the percentage of the total
solids removed from each batch. The defining relation used to construct the $2^{7-3}$ fractional
design (excluding generalized interactions) is 0 = 1235 = 2346 = 1247. The design matrix
for the experiment and the observed percentage extractions (Y) follow.


    Y     X1    X2    X3    X4    X5    X6    X7
  10.82   -1    -1    -1    -1    -1    -1    -1
  10.59    1    -1    -1    -1     1    -1     1
   8.19   -1     1    -1    -1     1     1     1
   ...
   5.12    1    -1     1     1    -1    -1    -1
   5.60   -1     1     1     1    -1     1    -1
   5.73    1     1     1     1     1     1     1


Adapted from 1. Y. S. Rustorn et al., “A Study of Factors Affecting 
Extraction of Peanut (Arachis hypogaea L.) Solids with Water," Food 
Chemistry 42 (1991), pp. 153-65. 


a. Obtain the generalized interactions and the complete defining relation. What is the resolu- 
tion of the design? Could a design of higher resolution have been used here? 

b. Using the defining relation in part (a), determine the confounding pattern for all main
effects and two-factor interactions. 


c. State the regression model in the form (29.2a). Remember that confounded effects must not
be included in your model. Fit this model and obtain the estimated factor effect coefficients.
Prepare a dot plot of the estimated factor effect coefficients. Which effects appear to be
active?

d. Obtain a normal probability plot of the estimated factor effect coefficients. Which effects
appear to be active? Do the estimated effects appear to be normally distributed? How do
your results compare with those in part (c)?

e. Test whether all two-factor interaction effects can be dropped from the model; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion.


*29.19. Refer to Peanut solids Problem 29.18. The regression model was revised by dropping all
interaction effects.

a. Fit the revised model and prepare a plot of the residuals against the fitted values. Do the
standard regression assumptions appear to be satisfied?

b. Cases 3 and 14 have fairly large absolute residuals. Conduct the Bonferroni outlier test for
each of these cases; use $\alpha = .05$ for each test. What do you conclude?

c. Obtain a normal probability plot of the residuals. Also conduct the correlation test for
normality; use $\alpha = .025$. Does the assumption of normality appear to be reasonable here?

d. Using the P-values of the estimated factor effect coefficients, test for the significance of
each effect; use $\alpha = .02$ for each test. Which effects are active?

e. Summarize the results of the experiment with an appropriate set of plots of main effects.
Interpret the results. How should maximum food solids extraction be achieved?


29.20. Fiber optics. A chemist conducted a screening experiment to identify factors that affect the 
viscosity of a gel used in the manufacture of fiber optic cabling. To minimize the loss of 
telephone signal, the inner glass fibers must be allowed to move freely within the cabling for 
a range of temperatures. A lubricant (gel) is used to promote this movement. The viscosity of 
the gel must be sufficiently low to allow such movement; yet it must not be so low as to lead 
to dripping (leakage) from the ends. A single replicate of a $2^{9-5}$ fractional factorial design 
was conducted. The factors of interest were silica particle size (X1 = -1: 200; X1 = 1: 380), 
silica weight (X2 = -1: low; X2 = 1: high), oil ratio (X3 = -1: low; X3 = 1: high), oil 
temperature (X4 = -1: low; X4 = 1: high), stabilizer level (X5 = -1: low; X5 = 1: high), 
premix time (X6 = -1: short; X6 = 1: long), postmix time (X7 = -1: short; X7 = 1: 
long), postmix vacuum (X8 = -1: no; X8 = 1: yes), and filter mesh size (X9 = -1: 
small; X9 = 1: large). The response of interest is gel viscosity (Y); management feels that an 
optimal (target) gel viscosity is 74.5. The design matrix for the experiment and the observed 
viscosities (Y) follow. 


Y X1 X2 X3 X4 X5 X6 X7 X8 X9 
101.2 1 1 1 1 -1 1 —1 -1 1 
92.9 1 1 1 1 1 1 1 
129.9 1 1 1 1 1 1 1 1 1 
73.4 1 1 1 1 1 1 „—1 —1 =] 
31.6 1 1 —1 —1 -1 -1 1 -1 =l 
121.6 1 —1 1 -1 1 -1 -1 1 1 


Adapted from T. 1. Reed, "Quality Improvement of Silica-Based Polysiloxane Gel Used in Fiber Optic 
Cabling by Process Optimization via Taguchi Methods," Fifth Symposium on Taguchi Methods, Detroit: 
ASI Press (1987), pp. 555-71. 


29.21. 


29.22. 


29.23. 




a. State the regression model containing only factor main effects in the form (29.2a). 
Fit this model and obtain the estimated factor effect coefficients. Does it appear 
from the magnitudes of the estimated coefficients that some factors may be active 
here? 

b. Prepare a Pareto plot of the estimated factor effect coefficients. Which effects appear to be 
active? 

c. Obtain a normal probability plot of the estimated factor effect coefficients. Which effects 
appear to be active? Do the estimated factor effects appear to be normally distributed? How 
do your results compare with those in part (b)? 

d. Using the P-values of the estimated factor effect coefficients, test for the significance of 
each effect term; use α = .10 for each test. Which effects are active? 


29.21. Refer to Fiber optics Problem 29.20. The regression model was revised to include only the 
main effects for factors 1, 5, and 7. 


a. Fit the revised regression model and prepare a plot of the residuals against the fitted values. 
Do the standard regression assumptions appear to be satisfied? 

b. Obtain a normal probability plot of the residuals. Also conduct the correlation test for 
normality; use α = .05. Does the assumption of normality appear to be reasonable 
here? 

c. Conduct a lack of fit test for the revised regression model; use α = .05. State the 
alternatives, decision rule, and conclusion. What does your conclusion suggest about the possible 
presence of interactions? 


29.22. Refer to Fiber optics Problems 29.20 and 29.21. Since the experimental design consists of 
two complete replicates of a $2^3$ factorial in the three active factors 1, 5, and 7, consider now 
a revised model containing the main effects of factors 1, 5, and 7 and all interactions among 
these three factors. 


a. State the revised regression model and fit it. Using the P-values of the estimated factor 
effect coefficients, test for the significance of each factor effect; use α = .01 for each test. 
Which effects are active? 


b. Obtain a normal probability plot of the residuals. Compare this plot to that obtained in 
Problem 29.21b. What do you conclude? 

c. Summarize the experimental results with an appropriate set of plots of the main effects 
and interactions. Interpret the results. 

d. How might you proceed to determine the levels of factors 1, 5, and 7 so that the expected 
viscosity of the resulting gel would be on target at 74.5? 


29.23. Windshield molding manufacture. An experimental study was undertaken in an effort to 
reduce the occurrence of dents in a windshield molding manufacturing process. The dents 
are caused by pieces of metal or plastic that are carried into the dies during stamping and 
forming operations. Four factors were identified for use in an eight-run experiment: poly-film 
thickness, used to protect the metal strip during manufacturing to reduce surface blemishes 
(X1 = -1: .00175; X1 = 1: .0025), oil mixture ratio for surface lubrication (X2 = -1: 
.05; X2 = 1: .10), operator glove type (X3 = -1: cotton; X3 = 1: nylon), and underside oil 
coating (X4 = -1: no coating; X4 = 1: coating). During each run of the experiment, 1,000 
moldings were fabricated; the response (Y) is the number of defect-free moldings produced. 
The design matrix for the experiment and the observed numbers of defect-free moldings 
produced (Y) follow. 




29.24. 


29.25. 


*29.26. 


  Y    X1   X2   X3   X4 
338     1   -1   -1   -1 
826     1   -1    1    1 
350     1    1   -1   -1 
647     1    1    1    1 
917    -1   -1   -1    1 
977    -1   -1    1   -1 
953    -1    1   -1    1 
972    -1    1    1   -1 


Adapted from G. Adel, "Minimize Slugging by Optimizing 
Controllable Factors on Topaz Windshield Molding," Fifth 
Symposium on Taguchi Methods, Detroit; ASI Press (1987), 
pp. 519-26. 


a. Determine the defining relation and the complete confounding scheme used in the experi- 
ment. Could a design of higher resolution have been used? 


b. State the regression model in the form (29.2a). Remember that confounded factor effects 
must not be included in your model. Fit this model and obtain the estimated factor effect 
coefficients. 


c. Prepare a dot plot of the estimated factor effect coefficients. Which effects appear to be 
active? 

d. Obtain a normal probability plot of the estimated factor effect coefficients. Which effects 
appear to be active? How do your results compare with those in part (c)? Do the estimated 
factor effects appear to be normally distributed? 


29.24. Refer to Windshield molding manufacture Problem 29.23. The regression model was revised 
to include only the main effects for the four factors. 


a. Fit the revised regression model. Using the P-values for the estimated factor effect 
coefficients, test for the significance of each effect; use α = .05 in each case. Which effects are 
active? 


b. Summarize the results of the experiment with an appropriate set of plots of main effects. 
Interpret the results. Identify the settings of the experimental factors within the operating 
range that lead to the maximum number of defect-free moldings. 


29.25. Construct a $2^{5-2}$ design in two blocks of size four such that main effects are not confounded 
with the block effect. 


*29.26. Team effectiveness. A researcher employed a single replicate of a $2^6$ full factorial design, with 
eight blocks containing eight treatments each, to study the effects of team member's ability 
level and motivation level on the performance of three-person military teams consisting of an 
operator, a loader, and a mover. The factors studied were operator's ability (X1), operator's 
motivation (X2), loader's ability (X3), loader's motivation (X4), mover's ability (X5), and 
mover's motivation (X6). All factors are quantitative and are coded with $X_i$ = -1 referring 
to the low level of the factor and $X_i$ = 1 referring to its high level. The 64 teams were formed 
by assigning persons to teams in accordance with the $2^6$ full factorial design. 

The team ratings (Y) were assigned by unit commanders following two months of military 
activity. Because unit commanders could observe at most 10 teams, and because it was expected 
that some scoring biases might result, the teams were assigned to commanders in blocks 
of size eight. Levels of the interaction terms X135, X145, and X245 were used to determine 
the blocks. The observed team ratings, the design matrix, and the blocking arrangement 
follow. 


*29.27. 


*29.28. 


29.29. 




Y Block X1 X2 X3 X4 X5 X6 
43 1 1 1 1 1 1 —1 
61 1 1 1 1 1 —1 —1 
60 1 —1 1 1 = -1 
66 8 1 —1 —1 1 -1 1 
64 8 1 1 1 1 1 1 
91 8 1 1 1 1 1 1 


Adapted in part from A. E. Tziner, "Effects of Team Composition on Ranked Team Effectiveness," 
Small Group Behavior 19 (1988), pp. 363-78. 


a. Obtain a scatter plot of team ratings against block number. Does it appear that blocking 
was effective here? 

b. Identify the complete confounding scheme for blocks. Are any main effects confounded 
with blocks? Any two-factor interactions? 

c. State the regression model in the form (29.2a). Fit this model and obtain the estimated 
factor effect coefficients. Prepare a dot plot of the estimated factor effect coefficients. 
Which effects appear to be active? 


d. Obtain a normal probability plot of the estimated factor effect coefficients. Which effects 
appear to be active? How do your findings compare with those in part (c)? Do the estimated 
factor effects appear to be normally distributed? 


*29.27. Refer to Team effectiveness Problem 29.26. The regression model was revised to include 
only the factor main effects, two-factor interactions, and block main effects. 


a. Fit the revised model and prepare a plot of the residuals against the fitted values. Do the 
standard regression assumptions appear to be satisfied? 


b. Obtain a normal probability plot of the residuals. Also conduct the correlation test for 
normality; use α = .05. Does the assumption of normality appear to be reasonable here? 

c. Using the P-values for the estimated factor effect coefficients, test for the significance of 
each factor effect; use α = .01 for each test. Which effects are active? 


*29.28. Refer to Team effectiveness Problems 29.26 and 29.27. The finally revised regression model 
consists of all block main effects and all factor main effects only. 


a. Fit the finally revised regression model. 

b. Summarize the results of the experiment with an appropriate set of plots of the factor main 
effects. Interpret the results. How is maximum team effectiveness achieved? 

c. Obtain a 95 percent prediction interval for the team performance for a single new team 
formed as described in part (b); assume that the rater (block) effect is zero in making your 
prediction. 


29.29. Whipped topping. Food scientists had developed a prototype soybean-based whipped 
topping, but the product suffered in that the volume of the whipped product did not meet 
expectations. In an effort to maximize the topping volume, a $2^{5-1}$ fractional factorial design of highest 
resolution was used in an experiment in two blocks of size eight each, with three center-point 
replicates in each block. The design confounded the block effect with the 45 interaction. The 
factors studied were soybean solids level (X1), fat level (X2), emulsifier level (X3), and the 
levels of two stabilizers: methocel (X4), and avicel (X5). All factors are quantitative and 
are coded with $X_i$ = -1 referring to the low level of the factor and $X_i$ = 1 referring to its 
high level. The response (Y) is the percent increase in volume of the product due to whipping; 
large increases are desirable. The observed responses, the design matrix, and the blocking 




29.30. 


29.31. 


arrangement follow. 


Y Block X1 X2 X3 X4 X5 
124 1 1 1 1 1 1 
144 1 1 —1 —1 1 
144 1 1 — 1 —1 1 
121 2 0 0 0 0 0 
127 2 0 0 0 0 0 
115 2 0 0 0 0 0 


a. What is the defining relation for this design? What is the resolution, ignoring blocks? 

b. State the regression model in the form (29.2a). Remember that confounded factor effects 
must not be included in your model. Fit this regression model and obtain the estimated 
factor effect coefficients. Prepare a dot plot of the estimated factor effect coefficients. 
Which effects appear to be active? 

c. Obtain a normal probability plot of the estimated factor effect coefficients. Which effects 
appear to be active? How do your results compare with those in part (b)? Do the estimated 
factor effects appear to be normally distributed? 

d. Test for the presence of block effects; use α = .05. State the alternatives, decision rule, 
and conclusion. 

e. Fit a revised regression model, omitting the block effect term. Obtain a pure error estimate 
of the error variance using the six center-point replicates and (29.17) and conduct a test 
for lack of fit; use α = .05. State the decision rule and conclusion. Does your test indicate 
the presence of curvature? 

f. Using the P-values for the estimated factor effect coefficients obtained in part (e) based on 
the pure error estimate MSPE, test for the significance of the factor effects; use α = .025 
for each test. Which factors are active? 


29.30. Refer to Whipped topping Problem 29.29. The model has been finally revised to include 
only the main effects for factors 1, 2, and 5 and the 12 interaction term. 


a. Fit the revised model and prepare a plot of the residuals against the fitted values. Do the 
standard regression assumptions appear to be satisfied? 

b. Obtain a normal probability plot of the residuals. Also conduct the correlation test for 
normality; use α = .05. Does the assumption of normality appear to be reasonable here? 

c. Summarize the results of the experiment with an appropriate set of plots of the main effects 
and interactions. Interpret the results. How is maximum whippability achieved? 

d. Obtain a 95 percent confidence interval for the expected percent volume increase for the 
whipped topping product when formulated as recommended in part (c). 


29.31. Refer to Computer monitors Problem 29.9. Suppose two more replicates were conducted 
for the $2^4$ full factorial design. Ignoring the center points, the design matrix for the new 
experiment with three replicate responses Yi1, Yi2, and Yi3 follows. Assume that the target 
failure rate is T = 0. 


  i   X1   X2   X3   X4    Yi1    Yi2    Yi3 
  1   -1   -1   -1   -1   3.68   3.10   5.30 
  2    1   -1   -1   -1   3.17   2.75   4.90 
...  ...  ...  ...  ...    ...    ...    ... 
 16    1    1    1    1   3.11   1.82   3.95 


*29.32. 




a. Obtain the sample variances and the logarithms of the sample variances for each of the 
control-factor-level combinations. Does the variance appear to be constant? 

b. Fit the dispersion model (29.41) using the logarithm of the sample variances obtained in 
part (a). Prepare a Pareto plot of the estimated factor effect coefficients. Which dispersion 
effects appear to be active? 

c. Using the subset dispersion model based on the estimates of the active dispersion effects, 
provide estimates of the variance of the response for each control-factor-level combination. 
Are your estimates consistent with the sample variances obtained in part (a)? 

d. Fit the location model (29.39) using weighted least squares. Obtain a normal probability 
plot of the estimated control-factor-effect coefficients. Which effects appear to be active? 
Use α = .05. 

e. Using the subset dispersion and location models based on the active dispersion and location 
effects identified in parts (b) and (d), determine the control factor settings that minimize 
failure rate with minimum variance. 

f. Give 95 percent confidence limits for the predicted variance for the optimal settings 
identified in part (e). How would these limits be used in a confirmation run? 

g. Estimate the mean squared error in (29.38) for the optimal control-factor-level settings 
determined in part (e). 


*29.32. Leaf springs. An engineer conducted an experiment to identify factors that affect the height 
of an unloaded spring to improve a heat treatment process on truck leaf springs. The target 
value of the height (Y) is T = 8 inches. The heat treatment forms the camber (curvature) in 
leaf springs, and was conducted by heating in a high temperature furnace, processing by a 
forming machine, and quenching in an oil bath. The factors of interest were furnace 
temperature (X1 = -1: 1840°F; X1 = 1: 1880°F), heating time (X2 = -1: 23 minutes; X2 = 1: 
25 minutes), transfer time (X3 = -1: short; X3 = 1: long), and hold-down time (X4 = -1: 
short; X4 = 1: long). The defining relation used to construct the $2^{4-1}$ design is 0 = 1234. 
The design matrix for the experiment and the observed heights with 6 replicates (Y) follow. 


i X1 X2 X3 X4 Yi1 Yi2 ... Yi6 
1 1 1 1 1 7.56 7.62 ee 7.25 
2 1 —1 —1 1 7.56 7.81 ee 7.59 
3 —1 1 —1 1 7.84 7.70 eus 7.20 
4 1 —1 —1 7.69 8.09 ee 7.20 
5 — —1 1 1 7.50 7.56 e 7.50 
6 —1 1 —1 7.59 7.56 ee 7.56 
7 —1 1 1 —1 7.78 7.83 ee 7.12 
8 1 1 1 8.15 8.10 ee 7.25 


Adapted in part from J. J. Pignatiello and J. S. Ramburg, "Discussion of ‘Off-Line Quality Control, 
Parameter Design, and the Taguchi Method’ by Kackar, R. N.,” Journal of Quality Technology, 17, 
pp. 198-206. 


a. Obtain the sample variances and the logarithms of the sample variances for each of the 
control-factor-level combinations. Does the variance appear to be constant? 

b. Fit the dispersion model (29.41) using the logarithms of the sample variances obtained in 
part (a). Prepare a Pareto plot of the estimated factor effect coefficients. Which dispersion 
effects appear to be active? 

c. Using the subset dispersion model based on the estimates of the active dispersion effects, 
provide estimates of the variance of the response for each control-factor-level combination. 
Are your estimates consistent with the sample variances obtained in part (a)? 




d. Fit the location model (29.39) using weighted least squares. Obtain a normal probability 
plot of the estimated control-factor-effect coefficients. Which effects appear to be active? 
Use α = .05. 


e. Using the subset dispersion and location models based on the active dispersion and location 
effects identified in parts (b) and (d), determine the control factor settings that lead to a 
predicted mean height near T = 8 with minimal variance. 

f. Give simultaneous 95 percent confidence limits for the predicted variance for the optimal 
settings identified in part (e). How would these limits be used in a confirmation run? 


g. Estimate the mean squared error in (29.38) for the optimal control-factor-level settings 
determined in part (e). 


Exercises 


29.33. 


29.34. 


29.33. Show that (29.14) holds for balanced two-level experiments; use (2.51) and the additivity of 
the extra sums of squares in this situation. 


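29.34. Suppose that the true (full) regression model in matrix form is: 

$$\mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$$

However, the analyst assumes that the (reduced) model: 

$$\mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \boldsymbol{\varepsilon}$$

is correct and uses it for purposes of estimation. For example, the X matrix for the reduced 
model ($\mathbf{X}_1$) might include only an intercept column and columns for first-order terms, while 
the true model involves first-order terms ($\mathbf{X}_1$) and some two-factor interaction terms ($\mathbf{X}_2$). 


a. Show that: 

$$E\{\mathbf{b}_1\} = \boldsymbol{\beta}_1 + \mathbf{A}\boldsymbol{\beta}_2$$

where $\mathbf{A} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2$ is called the alias matrix. 

b. Let $\mathbf{X}_1$ be the X matrix (based on the intercept and first-order terms only) for the $2^{3-1}$ 
design constructed from the defining relation 0 = 123. Let $\mathbf{X}_2$ consist of the columns $X_{12}$, 
$X_{13}$, and $X_{23}$, corresponding to the omitted two-factor interaction effects $\beta_{12}$, $\beta_{13}$, and $\beta_{23}$. 
Use the result in part (a) and $\mathbf{b}_1 = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{Y} = \mathbf{X}_1'\mathbf{Y}/4$ to show that $E\{b_1\} = \beta_1 + \beta_{23}$, 
$E\{b_2\} = \beta_2 + \beta_{13}$, and $E\{b_3\} = \beta_3 + \beta_{12}$. Thus, for this design we have: 1 = 23, 2 = 13, 
and 3 = 12. 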
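
The alias matrix in part (b) can also be checked numerically. The following is only a minimal sketch and not a substitute for the derivation requested above; it assumes that the defining relation 0 = 123 means X3 = X1X2, and the helper name `cross_product` is hypothetical. Because the columns of the reduced-model X matrix are orthogonal here, no general matrix inverse is needed.

```python
from itertools import product

# Runs of the 2^(3-1) design, assuming 0 = 123 means X3 = X1 * X2
runs = [(x1, x2, x1 * x2) for x2, x1 in product((-1, 1), repeat=2)]

# X1: intercept and first-order terms; X2: omitted interactions 12, 13, 23
X1 = [[1, a, b, c] for a, b, c in runs]
X2 = [[a * b, a * c, b * c] for a, b, c in runs]


def cross_product(left, right):
    """Return left' * right for matrices stored as lists of rows."""
    return [[sum(u[i] * v[j] for u, v in zip(left, right))
             for j in range(len(right[0]))]
            for i in range(len(left[0]))]


n = len(runs)
# The columns of X1 are orthogonal, so X1'X1 = n * I and its inverse is I / n
assert cross_product(X1, X1) == [[n if i == j else 0 for j in range(4)]
                                 for i in range(4)]

# Alias matrix A = (X1'X1)^{-1} X1'X2 = X1'X2 / n
alias_matrix = [[value / n for value in row] for row in cross_product(X1, X2)]
for label, row in zip(("b0", "b1", "b2", "b3"), alias_matrix):
    print(label, row)
```

Each row of the printed matrix shows which omitted interaction coefficients (columns 12, 13, 23) bias the corresponding estimated coefficient.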


Chapter 30 


Response Surface Methodology 


Chapter 29 was devoted to a discussion of the design of two-level factorial experiments. With 
these designs, main effects and two-factor interactions can often be studied with relatively 
few experimental trials. One limitation of two-level designs for factorial studies where 
the factors are quantitative is that they cannot identify curvatures in the response surface. 
Modeling curvature effects can be very important when the objective of the experiment is 
to identify the combination of levels of the quantitative factors that leads to an optimum 
response. Response surface experiments can be used for this purpose. In this chapter, we 
discuss the design and analysis of response surface experiments for studies where the 
factors are quantitative. Response surface designs are generally used in the latter stages of 
an investigation, when five or fewer factors are under investigation. 


30.1 Response Surface Experiments 


When a factorial study involves quantitative factors and the shape of the response surface 
is of interest, the response surface is usually approximated by a second-order regression 
model. The rationale is that the main effects and second-order effects will generally cap- 
ture the essence of the response function since third-order and higher effects are usually 
unimportant. 

The second-order response function for three quantitative factors was given in (8.10). 
We shall generalize it now for k quantitative factors. We continue to use the special coding 
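employed in Chapter 29 for the level $X_j$ of the jth quantitative factor: 

$$E\{Y\} = \beta_0 + \beta_1X_1 + \cdots + \beta_kX_k + \beta_{11}X_1^2 + \cdots + \beta_{kk}X_k^2 + \beta_{12}X_1X_2 + \cdots + \beta_{k-1,k}X_{k-1}X_k \qquad (30.1)$$

where the level $X_j$ of the jth factor is coded as follows: 

$$X_j = \frac{\text{Actual Level} - \dfrac{\text{High Level} + \text{Low Level}}{2}}{\dfrac{\text{High Level} - \text{Low Level}}{2}} \qquad (30.2)$$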




This coding scheme results in a coded value of -1 for the low level of a factor, a coded 
value of 1 for the high level, a coded value of 0 for the midlevel, and so on. For instance, 
if the temperature levels of factor j in a study range from 75° to 85°, the following coded 
values $X_j$ will be used: 


Temperature    Coded Value 
Level          X_j 
75             -1 
78             -.4 
80             0 
85             1 


Occasionally, the experimental design will be supplemented by treatments consisting of 
factor levels outside the original range. This will lead to coded values below -1 or above 1. 
For instance, if a supplemental treatment in our example involves factor j at temperature 
level 70°, the coded value will be $X_j$ = (70 - 80)/5 = -2. 

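
The coding in (30.2) is easy to check by direct computation. The short sketch below reproduces the temperature example; the function name `code_level` is a hypothetical helper introduced only for illustration.

```python
def code_level(actual, low, high):
    """Coded value of a factor level: offset from the midpoint in half-range units."""
    midpoint = (high + low) / 2
    half_range = (high - low) / 2
    return (actual - midpoint) / half_range


# Temperature example: low level 75, high level 85, plus the supplemental level 70
for temperature in (75, 78, 80, 85, 70):
    print(temperature, code_level(temperature, low=75, high=85))
```

Levels inside the original range map to values between -1 and 1, while the supplemental level of 70° falls outside that interval.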
As before, the coefficients $\beta_1, \ldots, \beta_k$ in regression model (30.1) are the linear main effect 
coefficients, the coefficients $\beta_{11}, \ldots, \beta_{kk}$ are the quadratic main effect coefficients, and the 
coefficients $\beta_{12}, \beta_{13}, \ldots, \beta_{k-1,k}$ are the interaction effect coefficients. Note that regression 
model (30.1) involves p = 1 + k + k + k(k - 1)/2 = (k + 1)(k + 2)/2 parameters. 

When designing a response surface study, a minimal requirement is that the design must 
be capable of providing estimates of the p = (k + 1)(k + 2)/2 parameters in model (30.1). 
Any design of resolution V or higher for a two-level factorial study will provide estimates 
of linear main effects and all two-factor interaction effects that are confounded only with 
higher-order effects. However, at least three levels of each factor must be present to obtain 
estimates of the k quadratic main effects. 

One type of design that provides estimates of all parameters in model (30.1) 
is the full factorial design with each factor at three levels. Full factorial designs with each 
factor at three levels are referred to as $3^k$ designs, where k denotes the number of factors 
in the study. A number of practical limitations are associated with $3^k$ designs. One is 
expense. The number of treatments required by a $3^k$ design grows rapidly with the number of 
factors. For four factors, for instance, a three-level full factorial design requires $3^4$ = 81 
treatments. A second disadvantage is that each factor appears at only three levels; hence 
it will not be possible to test for the presence of cubic or higher-order curvature effects. 

In Sections 30.2 and 30.3, we shall discuss a variety of response surface designs that have 
been developed for estimation of response surfaces based on second-order model (30.1) and 
overcome the limitations of $3^k$ designs. Central composite designs, discussed in the next 
section, are general purpose designs that are widely used in practice. Optimal response 
surface designs, discussed in Section 30.3, are designs that meet an optimality criterion 
specified by the experimenter. 
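
To see how quickly the expense of a three-level full factorial grows relative to the number of parameters that must be estimated, the two counts can be tabulated side by side. This is only an illustrative sketch; the function names are hypothetical.

```python
def n_parameters(k):
    """Number of parameters p = (k + 1)(k + 2)/2 in the second-order model."""
    return (k + 1) * (k + 2) // 2


def n_treatments_three_level(k):
    """Number of treatments in a full factorial with each of k factors at three levels."""
    return 3 ** k


print("k  p  treatments")
for k in range(2, 7):
    print(k, n_parameters(k), n_treatments_three_level(k))
```

The gap between the two columns widens rapidly with k, which is the motivation for the more economical designs discussed next.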


30.2 Central Composite Response Surface Designs 


Structure of Central Composite Designs 


Central composite designs are two-level full or fractional factorial designs that have been 
augmented with a small number of carefully chosen treatments to permit estimation of the 




FIGURE 30.1 Two Central Composite Designs for Two Factors. (a) $2^2$ Factorial Design. (b) Central Composite Design, α = 1. (c) Central Composite Design, α = $\sqrt{2}$. 


second-order response surface model (30.1). Consider first the $2^2$ factorial design pictured 
in terms of its coded factor levels in Figure 30.1a. If we add a single center point and four star 
points (also called axial points), as shown in Figure 30.1b, the resulting design is a central 
composite design. A star point is one in which all factors but one are set at their mid-levels. 
In terms of the coded values, the coordinates of the four star points in Figure 30.1b are 
(-1, 0), (1, 0), (0, -1), and (0, 1). As shown in Figure 30.1b, the four star points are located 
at the centers of each of the four edges of the experimental region. Notice that the central 
composite design in Figure 30.1b is in fact a $3^2$ factorial design, where both factors are at 
three levels and all factor level combinations are included. 

The distance from a star point to the center point in coded units is typically denoted by 
α. In Figure 30.1b, the star points are one coded unit from the center; hence for this design 
α = 1. It is sometimes possible to place the star points beyond the experimental region 
defined by the original upper and lower limits of the factors. Figure 30.1c presents a central 
composite design where the star points are located at a distance α = $\sqrt{2}$ = 1.414 from the 
center. As may be seen from Figure 30.1c, each factor is run at five distinct levels when α 
is larger than 1.0, whereas use of α = 1.0 yields just three distinct levels for each factor, as 
shown in Figure 30.1b. One advantage of setting α greater than 1.0, therefore, is that tests 
for cubic and quadratic curvature effects can then be conducted. 

To summarize, central composite designs consist of three components: 


1. $2^{k-f}$ corner points. At the base of any central composite design is a two-level full 
factorial design or a fractional factorial design of resolution V or higher. This component 
provides for the estimation of linear main effects and all two-factor interaction effects. 
Corner points have coded coordinates of the form (±1, ±1, ..., ±1). 

2. 2k star points. These factor level combinations permit the estimation of all quadratic 
main effects. In addition, when α > 1.0, significance tests for higher-order curvature effects 
can be conducted. Star points have coordinates (±α, 0, ..., 0), (0, ±α, 0, ..., 0), etc. 

3. $n_0$ center points. If $n_0$ > 1, a pure error estimate of $\sigma^2$ is available and a lack of fit 
test is possible. The coded coordinates of the center point replicates are (0, 0, ..., 0). 




TABLE 30.1 Three-Factor Central Composite Designs with $n_0$ = 4 Replications at Center Point. 


Experimental      Factor Level Settings 
Trial             X1     X2     X3 
 1                -1     -1     -1 
 2                 1     -1     -1 
 3                -1      1     -1 
 4                 1      1     -1 
 5                -1     -1      1 
 6                 1     -1      1 
 7                -1      1      1 
 8                 1      1      1 
 9                -α      0      0 
10                 α      0      0 
11                 0     -α      0 
12                 0      α      0 
13                 0      0     -α 
14                 0      0      α 
15                 0      0      0 
16                 0      0      0 
17                 0      0      0 
18                 0      0      0 

Table 30.1 presents the coded factor level settings for central composite designs for three 
factors, with $n_0$ = 4 replications at the center point. 
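
The three components of a central composite design can be assembled mechanically. The sketch below is one possible construction, assuming a full factorial base (f = 0) and single replication at the corner and star points; the function name `central_composite` is hypothetical, and the value 1.6818 is used for α purely as an illustration.

```python
from itertools import product


def central_composite(k, alpha, n_center):
    """Coded design points: 2^k corner points, 2k star points, n_center center points."""
    # Corner points of the full two-level factorial, with X1 changing fastest
    corners = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]
    # Star points: one factor at -alpha or +alpha, all others at the mid-level 0
    stars = []
    for j in range(k):
        for sign in (-1, 1):
            point = [0] * k
            point[j] = sign * alpha
            stars.append(tuple(point))
    # Replicated center points
    centers = [(0,) * k] * n_center
    return corners + stars + centers


design = central_composite(k=3, alpha=1.6818, n_center=4)
for trial, point in enumerate(design, start=1):
    print(trial, point)
```

For k = 3 and four center points the listing has 18 trials, in the same order as the rows of Table 30.1.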


Commonly Used Central Composite Designs 


As we have seen, the term "central composite design" refers to a family of experimental 
designs. Within that family, numerous designs exist, depending on the choice of the base 
corner points, α, and the extent of replications. Not only may there be $n_0$ replications at the 
center point but there may also be replications at the corner and star points. We shall let $n_c$ 
and $n_s$ denote, respectively, the number of replications at each corner point and star point. 
The number of experimental trials at the corner points then is: 

$$2^{k-f}n_c \qquad (30.3a)$$

where k is the number of factors and f is the level of fractionation in the two-level factorial 
design selected. Similarly, the number of experimental trials at the star points is: 

$$2kn_s \qquad (30.3b)$$

Thus, the total number of experimental trials planned, denoted by $n_T$ as usual, is: 

$$n_T = 2^{k-f}n_c + 2kn_s + n_0 \qquad (30.3c)$$

The characteristics of any particular central composite design therefore depend on the 
choices of k, f, α, $n_0$, $n_s$, and $n_c$. 
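
Formula (30.3c) can be evaluated directly for any combination of k, f, and the replication counts. The following sketch uses a hypothetical function name, `total_trials`, and defaults of one replication at the corner and star points and four center points, chosen only for illustration.

```python
def total_trials(k, f, n_c=1, n_s=1, n_0=4):
    """Total number of trials n_T = 2^(k-f) n_c + 2 k n_s + n_0, as in (30.3c)."""
    corner_trials = 2 ** (k - f) * n_c
    star_trials = 2 * k * n_s
    return corner_trials + star_trials + n_0


# Number of factors k paired with level of fractionation f of the base design
for k, f in [(2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1), (8, 2)]:
    print(k, f, total_trials(k, f))
```

Increasing `n_c`, `n_s`, or `n_0` shows how quickly additional replication inflates the size of the study.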
A list of widely used central composite designs is given in Table 30.2 for studies involving 
two to eight factors. (The meaning of the term “rotatability” in Table 30.2 will be explained 
shortly.) The base fractional factorial designs for five to eight factors are the smallest such 


TABLE 30.2 Some Useful Central Composite Designs. 


Number of factors k              2        3        4        5          6          7          8 
Base factorial design            2^2      2^3      2^4      2^(5-1)    2^(6-1)    2^(7-1)    2^(8-2) 
Number of star points 2k         4        6        8        10         12         14         16 
Replications n_c = n_s           1        1        1        1          1          1          1 
α for rotatability               1.4142   1.6818   2.0000   2.0000     2.3784     2.8284     3.3636 
Total trials n_T (n_0 = 4)       12       18       28       30         48         82         84 


designs that will provide resolution R = V. Table 30.2 also shows the total number of 
experimental trials required when a single replication at the corner and star points of the 
design (i.e., $n_c$ = $n_s$ = 1) and $n_0$ = 4 replications at the center point are sufficient. When 
the error variance $\sigma^2$ is large relative to the factor effects, larger numbers of replications at 
each treatment will be needed. 


Rotatable Central Composite Designs 


When choosing a particular central composite design, a criterion that is often considered is 
that of rotatability. The rotatability criterion is concerned with the precision of the estimator 
$\hat{Y}_h$, since a main purpose of response surface designs is to estimate the response surface, 
i.e., to estimate the mean response $E\{Y_h\}$ in (30.1) at different locations $\mathbf{X}_h$, the vector 
of the given levels of the k factors. Rotatable designs have the property that the variance 
of the fitted value at $\mathbf{X}_h$, $\sigma^2\{\hat{Y}_h\}$, is the same for any point $\mathbf{X}_h$ that is a given distance from 
the center point, regardless of the direction. The property of equal precision at any given 
distance from the center point is desirable because it is not usually known in advance which 
direction from the center point will be of later interest. A rotatable design provides assurance 
that the precision of the fitted values is not affected by the direction, only by the distance 
from the center point. 

We can examine whether a central composite design is rotatable by considering the 
variance of $\hat{Y}_h$ as a function of $\mathbf{X}_h$. The variance was given in (6.57): 

$$\sigma^2\{\hat{Y}_h\} = \sigma^2\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h = \sigma^2V_h \qquad (30.4)$$

where: 

$$V_h = \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h \qquad (30.4a)$$

$V_h$ is sometimes called the variance function. Note that $V_h$ is a function solely of the coded 
values of the factor levels for the treatments in the design and of the point $\mathbf{X}_h$ where the 
mean response is to be estimated. Also note that the variance of $\hat{Y}_h$ is a constant multiple of 
$V_h$, the constant being the error variance $\sigma^2$. Hence, the variance function provides complete 
information of how the variance $\sigma^2\{\hat{Y}_h\}$ behaves for different points $\mathbf{X}_h$. Figure 30.2 presents 
contour plots of the variance functions for the two central composite designs in Figure 30.1. 
For both of these designs, $n_c$ = $n_s$ = $n_0$ = 1, and both use a $2^2$ factorial design as the 




FIGURE 30.2 Contours of Variance Functions for Two-Factor Central Composite Designs. (a) α = 1.0. (b) α = 1.414. 


base design. They differ only with respect to α. Notice in Figure 30.2b that the contours of 
the variance function for the central composite design with α = $\sqrt{2}$ are circular, indicating 
equal precision at a given distance from the center point. Hence, this is a rotatable design. 
On the other hand, the contours of the variance function in Figure 30.2a are not circular, 
indicating that the design with α = 1 is not rotatable. 
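
The contrast between the two designs can be verified numerically by evaluating the variance function (30.4a) at two points that are the same distance from the center but in different directions. The sketch below is a plain-Python illustration under stated assumptions: the second-order model terms are ordered as 1, X1, X2, X1², X2², X1X2, each design has one trial at every corner, star, and center point, and the helper names are hypothetical. A small Gauss-Jordan solver stands in for a matrix library.

```python
import math


def model_row(x1, x2):
    """Second-order model terms for two factors: 1, X1, X2, X1^2, X2^2, X1*X2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def ccd_two_factor(alpha):
    """Two-factor central composite design with one trial at each point."""
    corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
    stars = [(-alpha, 0), (alpha, 0), (0, -alpha), (0, alpha)]
    return corners + stars + [(0, 0)]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def variance_function(design, x1, x2):
    """V_h = X_h' (X'X)^{-1} X_h at the point (x1, x2)."""
    rows = [model_row(*point) for point in design]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xh = model_row(x1, x2)
    z = solve(xtx, xh)  # z = (X'X)^{-1} X_h
    return sum(u * v for u, v in zip(xh, z))


c = math.sqrt(0.5)  # (c, c) and (1, 0) are both one coded unit from the center
for alpha in (1.0, math.sqrt(2)):
    design = ccd_two_factor(alpha)
    print(alpha, variance_function(design, 1, 0), variance_function(design, c, c))
```

With α = $\sqrt{2}$ the two values agree, while with α = 1 they differ, in line with the circular and noncircular contours described above.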

It can be shown that a central composite design is rotatable if: 

$$\alpha = \left[\frac{2^{k-f}n_c}{n_s}\right]^{1/4} \qquad (30.5)$$

For the example in Figure 30.2b, we have $n_c$ = $n_s$ = 1, k = 2, and f = 0. Hence, the 
choice of: 

$$\alpha = \left[\frac{2^{2-0}(1)}{1}\right]^{1/4} = \sqrt{2}$$

leads to a rotatable design. Values of α that lead to rotatable designs when $n_c$ = $n_s$ = 1 are 
provided in Table 30.2. 
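
Condition (30.5) is straightforward to evaluate for other designs. A minimal sketch follows; the function name `rotatable_alpha` is hypothetical, and the (k, f) pairs are chosen only as examples.

```python
def rotatable_alpha(k, f=0, n_c=1, n_s=1):
    """Star point distance satisfying (30.5): alpha = [2^(k-f) n_c / n_s]^(1/4)."""
    return (2 ** (k - f) * n_c / n_s) ** 0.25


# Number of factors k paired with level of fractionation f of the base design
for k, f in [(2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1)]:
    print(k, f, round(rotatable_alpha(k, f), 4))
```

Because the condition depends on the ratio of corner to star replications, replicating the star points more heavily than the corner points pulls the rotatable value of α toward the center.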

While rotatability is a desirable property of a central composite design, it should not 
be the sole basis for making the choice of o. For example, in many instances, it may be 
physically difficult or impossible to extend the star points beyond the experimental region 
defined by the upper and lower limits of each factor. In such cases, α must not exceed
1.0. Also, a design with α = 1 is sometimes easy to implement because only three levels
are involved for each factor. In these cases, the resulting lack of rotatability may not be 
considered a serious disadvantage. 


Example

The levels of four ingredients of a prototype solid chocolate bar developed by food scientists
at Fisher Company were to be fine-tuned prior to national distribution. The factors and
associated ranges were as follows:


Factor               Low Level   High Level
Cocoa butter            8.0         10.0
Added milk solids       2.0          3.0
Flavoring               2.5          3.5
Sugar                  12.5         18.5


The response of interest was the overall consumer acceptability as measured on a 10- 
point scale. The objective of the experiment was to determine the levels of cocoa butter, 
added milk solids, flavoring, and sugar that lead to highest acceptability. To carry out the 
experiment, chocolate bars were to be made with different factor level combinations for the 
ingredients, and each type of chocolate bar was then to be subjected to a small consumer 
test. The firm's marketing research department determined that each consumer test would 
cost about $2,500. Because the total cost of the study was not to exceed $75,000, 30 or 
fewer consumer tests could be performed. From Table 30.2, we see that the total number of 
trials for a central composite four-factor design with n_0 = 4 replications at the center point
is 28, and that this design is rotatable when α = 2. The selected design in coded units is
shown in Table 30.3. 
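The trial count and budget check can be reproduced by listing the runs. This is a minimal sketch, assuming a full 2^4 factorial base design with one replicate at each corner point and each star point; the function name is hypothetical.

```python
from itertools import product

def central_composite_design(k, alpha, n_0):
    """Coded runs: 2^k corner points, 2k star points, n_0 center points."""
    # reversed(...) makes X1 change fastest, the usual standard order
    corners = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]
    stars = []
    for j in range(k):
        for sign in (-1, 1):
            point = [0.0] * k
            point[j] = sign * alpha
            stars.append(tuple(point))
    centers = [(0.0,) * k] * n_0
    return corners + stars + centers

design = central_composite_design(k=4, alpha=2.0, n_0=4)
n_trials = len(design)          # 16 corner + 8 star + 4 center runs
cost = 2500 * n_trials          # $2,500 per consumer test
print(n_trials, cost, cost <= 75000)
```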


Comments 


1. A central composite design with α = 1 is often called a face-centered design. For k = 3 factors,
for instance, this design locates the star points at the center of each of the six faces of the base design
cube.

2. When it is not possible to extend the star points beyond the factorial region defined by the
original ranges of the factors, a rotatable inscribed central composite design can often be used. In
such an inscribed design, the coded factor level settings are rescaled by the factor 1/α so that all
coded factor levels fall between -1 and 1. To illustrate the rescaling, we know from Table 30.2 that a
two-factor central composite rotatable design requires the choice of α = 1.414 when n_c = n_s = 1. To
obtain an inscribed two-factor, rotatable central composite design, each coded factor level is multiplied
by 1/1.414. The original rotatable design, with n_0 = n_c = n_s = 1, and the corresponding inscribed
design are shown in Table 30.4. The inscribed design has the appropriate value of α (1.0), and no
factor levels are outside the original ranges for each factor. Note that the actual factor levels now need
to be rescaled as well. Consequently, the corner points of the design will no longer be at the limits of
the ranges for the factor levels. When this is undesirable, an inscribed design will not be appropriate.
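The rescaling in Comment 2 amounts to a single multiplication. The sketch below applies it to the two-factor rotatable design with n_0 = n_c = n_s = 1; the variable names are illustrative only.

```python
import math

alpha = math.sqrt(2.0)
rotatable = [(-1, -1), (1, -1), (-1, 1), (1, 1),
             (-alpha, 0), (alpha, 0), (0, -alpha), (0, alpha), (0, 0)]

# Multiply every coded level by 1/alpha so that no level lies outside [-1, 1].
inscribed = [(x1 / alpha, x2 / alpha) for x1, x2 in rotatable]

for trial, (x1, x2) in enumerate(inscribed, start=1):
    print(trial, round(x1, 3), round(x2, 3))
```

The corner points move in to ±.707 while the star points land on ±1, which is why the actual factor levels must be rescaled as well.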

TABLE 30.3 Four-Factor Central Composite Design, α = 2.0, Fisher Company Example.

Experimental        Coded Factor Level Settings
   Trial          X_1      X_2      X_3      X_4
     1            -1       -1       -1       -1
     2             1       -1       -1       -1
     3            -1        1       -1       -1
     4             1        1       -1       -1
     5            -1       -1        1       -1
     6             1       -1        1       -1
     7            -1        1        1       -1
     8             1        1        1       -1
     9            -1       -1       -1        1
    10             1       -1       -1        1
    11            -1        1       -1        1
    12             1        1       -1        1
    13            -1       -1        1        1
    14             1       -1        1        1
    15            -1        1        1        1
    16             1        1        1        1
    17            -2.0      0        0        0
    18             2.0      0        0        0
    19             0       -2.0      0        0
    20             0        2.0      0        0
    21             0        0       -2.0      0
    22             0        0        2.0      0
    23             0        0        0       -2.0
    24             0        0        0        2.0
    25             0        0        0        0
    26             0        0        0        0
    27             0        0        0        0
    28             0        0        0        0


TABLE 30.4 Inscribed Central Composite Design, n_0 = n_c = n_s = 1, α = 1.414.

                    Central Composite       Inscribed Central
Experimental             Design             Composite Design
   Trial            X_1       X_2            X_1       X_2
     1             -1        -1             -.707     -.707
     2              1        -1              .707     -.707
     3             -1         1             -.707      .707
     4              1         1              .707      .707
     5             -1.414     0             -1         0
     6              1.414     0              1         0
     7              0        -1.414          0        -1
     8              0         1.414          0         1
     9              0         0              0         0


Other Criteria for Choosing a Central Composite Design
Other criteria for the choice of a central composite response surface design, besides
rotatability, have been proposed. Two of these are orthogonality and uniform precision. An
unblocked central composite design is orthogonal if the estimated factor effect coefficients
are all uncorrelated. A proper choice of n_0, the number of center point replicates, will lead
to an orthogonal central composite design. For example, some orthogonal central composite
designs for two to five factors are as follows for n_c = n_s = 1 replicate at each star and
corner point:


                             Number of Factors
Design Characteristic       2      3      4      5
Base factorial design      2^2    2^3    2^4    2^5
n_0                         8      9     12     17


Notice that for each of these designs, the required number of replications at the center 
point is quite large. While orthogonality is desirable because it simplifies the analysis of 
the results, at times it will be difficult to justify large expenditures for replications at the 
center point. Lack of orthogonality is not a serious disadvantage in practice today because 
the analysis of the experimental results is easily handled by using a computer regression
package. 

A uniform precision central composite design is a rotatable design for which the precision
of the estimated mean response is the same at the center point as it is one unit from the center
point (in any direction). Uniform precision designs are obtained by appropriate choices of
α and n_0. The following are the required values of α and n_0 for studies with two to five
factors:


                             Number of Factors
Design Characteristic       2        3        4        5
Base factorial design      2^2      2^3      2^4      2^5
α for rotatability        1.414    1.682    2.000    2.378
n_0                         5        6        7       10


As with the orthogonal designs above, the number of center point replications required for
uniform precision may be too large. Uniform precision is therefore often used only as a
secondary criterion for determining the number of replications at the center point.


Blocking Central Composite Designs 

One useful characteristic of central composite response surface designs is that they can
be blocked easily. The corner points of the central composite design, which constitute a
2^{k-f} factorial design, can be blocked by the methods described in Section 29.5. As noted
there, one or more center point replications can be allocated to each of these blocks. Any
remaining center point replications and all star points will constitute a final, separate block.
Thus, if the base 2^{k-f} factorial design is run in b blocks, the central composite design is
run in b + 1 blocks. The resulting blocking arrangement is then as follows:


Blocks 1 to b:   2^{k-f} base factorial design in b blocks, with n_0' center point
                 replications in each block
                                                                                  (30.6)
Block b + 1:     2k star points, with n_0 - b n_0' center point replications added




Augmenting Two-Level Studies. The blocking arrangement just described can also be 
used to facilitate the implementation of a central composite design in two stages, which is
often desirable. In the first stage, a two-level study with some center point replications is
conducted in one or more blocks. If the test for lack of fit suggests the presence of curvature 
or if a better approximation of the response surface is desired, the initial two-level study 
is augmented with star points and additional center point replications. These additional 
experimental trials constitute an additional block. 


Comment 


Blocking arrangement (30.6) ensures that estimated block effects will be uncorrelated with estimated 
linear main effects and two-factor interactions, but the estimated block effects may be correlated 
with the estimated quadratic main effects. A central composite design that is orthogonally blocked 
will also provide that the estimated block effects are uncorrelated with the estimated quadratic main 
effects. It is not always possible to achieve both rotatability and orthogonal blocking. Often, however, 
orthogonal blocking and approximate rotatability can be achieved by suitable choices of the locations 
of the star points and by the allocation of the center point replications to the blocks. Reference 30.1 
provides further information on orthogonally blocked central composite designs.


Additional General-Purpose Response Surface Designs 


While central composite designs are the most widely used general-purpose response surface 
designs, other general-purpose designs are available. One important class of alternative 
designs is the Box-Behnken family of designs. Box-Behnken designs differ from central 
composite designs in two ways. First, only three levels for each factor are employed. Second, 
Box-Behnken designs have no corner points. Box-Behnken designs are sometimes preferred 
to central composite designs when physical or economic constraints prevent the use of the 
corner points—where all factor levels are at an extreme. A listing of Box-Behnken designs 
and their blocking arrangements is provided in Reference 30.2. 


30.3 Optimal Response Surface Designs 


Purpose of Optimal Designs 


Central composite response surface designs have been developed for fairly standard exper- 
imental situations where the response surface of interest can be reasonably approximated 
by the second-order polynomial response function (30.1) and the experimental region is 
defined by the upper and lower limits of the factor levels. Also, since central composite 
designs are general purpose designs, they are not oriented to provide either optimum preci- 
sion of the regression parameters or optimum precision for estimating mean responses for 
particular circumstances. 

Optimal designs are useful when optimization of the precision is of key importance and/or
when nonstandard experimental situations are encountered. We consider now three main 
types of nonstandard experimental conditions where central composite designs may not 
be feasible—irregular experimental regions, nonstandard models, and nonstandard sample 
sizes. 


Irregular Experimental Regions. Irregular experimental regions are quite common
in industrial studies. One simple example, described in Reference 30.3, involved the
application of two fertilizers at the Rutgers Experimental Station to determine the levels
of the fertilizers that would optimize the yield of a particular crop. It was known in
advance of the experiment that a toxic level of the chemicals would result if both of the
fertilizers were applied simultaneously at their high levels. The investigators determined
that the sum of the two fertilizers (in coded units) should not exceed 1.0:

\[ X_1 + X_2 \le 1.0 \tag{30.7} \]

This constraint leads to the irregular experimental region shown in Figure 30.3a. Also shown
in Figure 30.3a is a face-centered central composite design with three replications at the
center point. Notice that the ranges of the two factors must be considerably reduced to
accommodate the standard central composite design here. Figure 30.3 also contains two
other designs for this experimental study that we shall discuss shortly.

FIGURE 30.3 Operating Region and Three Alternative Designs with n_T = 11—Rutgers Experimental Station
Example. (a) Design 1: Scaled Central Composite Design (b) Design 2: D-Optimal Design
(c) Design 3: Modified D-Optimal Design


Nonstandard Models. Nonstandard models can arise for a variety of reasons. For example,
the investigator may know that the response function for a two-factor study is approximately
linear in X_1 for constant X_2 and approximately quadratic in X_2. An appropriate
regression function then would be:

\[ E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \beta_{22} X_2^2 \]

Nonstandard models also arise in response surface experiments when both qualitative and
quantitative factors are present. In the above example, if the first factor were a qualitative
factor with two levels, a response function of the following form would be appropriate:

\[ E\{Y\} = \beta_0 + \beta_1 I_1 + \beta_2 X_2 + \beta_{12} I_1 X_2 + \beta_{22} X_2^2 \]

where:

\[ I_1 = \begin{cases} -1 & \text{if factor 1 at level 1} \\ \phantom{-}1 & \text{if factor 1 at level 2} \end{cases} \]


Nonstandard Sample Sizes. In the chocolate bar optimization study of Section 30.2, 
budgetary considerations required that the number of runs in the experiment not exceed 30.
From Table 30.2, we found that a four-factor central composite design with four replications
at the center point was feasible since it would require n_T = 28 experimental trials. Suppose
now that the budget for the experiment were only $50,000. At $2,500 per market test, the 
maximum number of trials now would be 20, and the selected central composite design 
would no longer be feasible, even with no replications at the center point. 

It is possible, nonetheless, to construct experimental designs that will provide estimates 
of all of the parameters in the full second-order response function (30.1) in fewer than 
20 runs since there are only 15 parameters in this model when k = 4. Optimal design
techniques can be used here to construct a potentially useful second-order design for any 
feasible experimental size between 15 and 20 trials. 
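The counts in this argument follow from simple arithmetic, sketched below. The function name is hypothetical; the parameter count assumes the full second-order model with an intercept, k linear terms, k quadratic terms, and all two-factor interactions.

```python
def second_order_parameters(k):
    """Intercept + k linear + k quadratic + k(k-1)/2 interaction terms."""
    return 1 + k + k + k * (k - 1) // 2

budget, cost_per_test = 50_000, 2_500
max_trials = budget // cost_per_test
p = second_order_parameters(4)
feasible_sizes = list(range(p, max_trials + 1))
print(p, max_trials, feasible_sizes)
```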


Optimal Design Approach


In order to construct an optimal experimental design, the investigator must first specify the
following:


1. The number of experimental trials, n_T.

2. The response function of interest.

3. A candidate list, C, of feasible treatments.

4. A statistical design criterion for the selection of the treatments from the candidate list
C and for the allocation of the n_T trials to the selected treatments.


Once these specifications have been made, numerical computer search procedures are usually
employed to find the experimental design that meets optimally the design criterion.


Example

To illustrate the optimal design approach, consider again the Rutgers Experimental Station
example. The feasible experimental region is shown in Figure 30.4a. Suppose that no
more than n_T = 11 experimental trials can be made, and that the response function is the
second-order one in (30.1):

\[ E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{12} X_1 X_2 \tag{30.8} \]

FIGURE 30.4 Candidate Set of Treatments—Rutgers Experimental Station Example.
(a) 5 × 5 Grid of Feasible and Infeasible Candidate Points (b) 22-Point Candidate Set


To obtain an optimal design, it is still necessary to specify a candidate list of treatments 
and a criterion for design selection. Often, the candidate list of treatments is obtained from 
a grid of regularly spaced points in the feasible experimental region. Figure 30.4a shows a 
5 × 5 grid of treatment points over the unconstrained region. Of the 25 grid points, three
fall in the infeasible region because the sum X_1 + X_2 for these points exceeds the constraint
in (30.7). These three infeasible grid points therefore need to be deleted, resulting in the 
22-point candidate set shown in Figure 30.4b. 
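The construction of the candidate set can be sketched as follows. The grid levels -1, -.5, 0, .5, 1 for each factor are an assumption about the spacing of the 5 × 5 grid; they do reproduce the three infeasible points described above.

```python
levels = [-1.0, -0.5, 0.0, 0.5, 1.0]          # assumed grid spacing
grid = [(x1, x2) for x2 in levels for x1 in levels]

# Keep only the points that satisfy constraint (30.7): X1 + X2 <= 1.0
candidates = [(x1, x2) for x1, x2 in grid if x1 + x2 <= 1.0]
infeasible = [pt for pt in grid if pt not in candidates]

print(len(grid), len(candidates), infeasible)
```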

Finally, a statistical criterion for the selection of the experimental design with n_T trials
must be provided. We shall now discuss two such criteria that are widely employed. 


Design Criteria for Optimal Design Selection 


D Criterion. When precise estimation of model parameters is of primary interest, the D
(determinant) criterion provides a useful measure of the precision of an experiment. This
criterion is based on the joint confidence region for the parameters in the normal error
regression model. This joint confidence region is given by the set of coefficient vectors β
that satisfy the inequality:

\[ \frac{(\mathbf{b} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\mathbf{b} - \boldsymbol{\beta})}{p\,MSE} \le F(1 - \alpha; p, n - p) \tag{30.9} \]

For simple linear regression, where the unknown parameters are β_0 and β_1, the boundary
of this region is an ellipse. For models with three or more parameters, the boundary of the
confidence region is an ellipsoid. One measure of the precision of the parameter estimates
is the area or volume (for three or more parameters) of the confidence region. A small
confidence region area or volume implies high precision. When the objective of the experiment is
to estimate the vector β precisely, the confidence ellipse or ellipsoid for β should therefore
be small. It can be shown that minimizing the volume of the confidence region (30.9) is
equivalent to minimizing:

\[ D = |(\mathbf{X}'\mathbf{X})^{-1}| \tag{30.10} \]

where |(X'X)^{-1}| denotes the determinant of (X'X)^{-1}. Hence, the smaller the determinant
|(X'X)^{-1}|, the smaller the volume of the confidence region. A design that minimizes
|(X'X)^{-1}| is said to be D-optimal.


Example

We illustrate the use of the D criterion for the Rutgers Experimental Station example by
considering the three experimental designs in Figure 30.3. The design in Figure 30.3a, as
we noted earlier, is a scaled central composite design with three replications at the center
point, requiring n_T = 11 trials. The designs in Figures 30.3b and 30.3c also require n_T = 11
trials but involve a different set of treatments than the scaled central composite design. The
designs in Figures 30.3b and 30.3c utilize the same set of treatments but differ as to which
treatments receive more than one replication. Calculation of the determinant |(X'X)^{-1}| will
be done ordinarily by use of a computer or a programmable calculator. We find that the
values of the determinant criterion for the three designs under consideration are:


Design 1:  D = |(X'X)^{-1}| = .009117
Design 2:  D = |(X'X)^{-1}| = .000161
Design 3:  D = |(X'X)^{-1}| = .000347


Since design 2 yields the smallest value of D among the three proposed designs, design 2
is preferred to designs 1 and 3 on the basis of the determinant criterion.
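The determinant criterion is straightforward to compute once the X matrix is formed. The sketch below assumes NumPy and the two-factor second-order model (30.8). The two designs compared are hypothetical (a 3 × 3 factorial and the same design shrunk toward the center) and are not the designs of Figure 30.3, whose coordinates are not listed in the text.

```python
import numpy as np

def second_order_matrix(points):
    """Columns 1, X1, X2, X1^2, X2^2, X1*X2 of the second-order model."""
    x1, x2 = np.asarray(points, dtype=float).T
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def d_criterion(points):
    """D = |(X'X)^{-1}| = 1 / |X'X|; smaller values mean higher precision."""
    X = second_order_matrix(points)
    return 1.0 / np.linalg.det(X.T @ X)

levels = (-1, 0, 1)
factorial_3x3 = [(a, b) for a in levels for b in levels]       # hypothetical design
shrunk = [(0.5 * a, 0.5 * b) for a, b in factorial_3x3]        # same design, half the range

print(d_criterion(factorial_3x3), d_criterion(shrunk))
```

Shrinking the ranges of the factors inflates D sharply, which illustrates why a central composite design scaled down to fit inside an irregular region can compare poorly on this criterion.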


Relative Efficiency of Two Designs. A measure of the relative efficiency of design 1
relative to design 2 according to the D criterion is the following, where X_1 and X_2 are the
X matrices for the two designs:

\[ E_D = \left(\frac{|(\mathbf{X}_2'\mathbf{X}_2)^{-1}|}{|(\mathbf{X}_1'\mathbf{X}_1)^{-1}|}\right)^{1/p} \tag{30.11} \]

For the Rutgers Experimental Station example, the relative efficiency of design 1 compared
to design 2 is:

\[ E_D = \left(\frac{.000161}{.009117}\right)^{1/6} = .51 \]

The relative efficiency measure states that design 1 is only 51 percent as efficient as design
2. This means that design 1 would need to be replicated 1/.51 = 1.96 times in order to
achieve as small a confidence region for the regression parameters as design 2.
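The arithmetic behind this efficiency figure is sketched below, using the determinant values reported for designs 1 and 2 and p = 6 parameters in the second-order model (30.8). The function name is hypothetical.

```python
def d_efficiency(d_1, d_2, p):
    """Efficiency of design 1 relative to design 2: (D_2 / D_1)^(1/p)."""
    return (d_2 / d_1) ** (1.0 / p)

e_d = d_efficiency(d_1=0.009117, d_2=0.000161, p=6)
replications_needed = 1 / round(e_d, 2)
print(round(e_d, 2), round(replications_needed, 2))
```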


V Criterion. The objective of response surface experiments often is the estimation of the
mean response E{Y_h} at different combinations of factor level settings, denoted by X_h. The
estimation of these mean responses often is used to identify the factor settings X_h for which
the mean response E{Y_h} is either maximized or minimized. The V criterion considers the
variances σ²{Ŷ_h} at factor level combinations X_h of interest and employs the average of these
variances as the criterion. Let P denote the set of n_P factor level combinations (X_h vectors)
at which the experimenter wishes to estimate the mean response. Often, the estimation set
P is the same as the candidate set C. At other times, the two sets do not coincide, as when
P contains points outside of the experimental region because the investigator anticipates
the need for estimating mean responses in a region where experimentation is costly. Using
(30.4) to express the variance σ²{Ŷ_h} in terms of the variance function V_h, we can state the
average of the variances of Ŷ_h for the estimation set P as follows:

\[ \frac{\sum_{P} \sigma^2\{\hat{Y}_h\}}{n_P} = \frac{\sigma^2 \sum_{P} V_h}{n_P} = \sigma^2 \bar{V} \tag{30.12} \]

where:

\[ \bar{V} = \frac{\sum_{P} V_h}{n_P} \tag{30.12a} \]

A design that minimizes V̄ in (30.12a) is called a V-optimal design.
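A minimal sketch of the V criterion follows, assuming NumPy and the two-factor second-order model (30.8). The design and the estimation sets are hypothetical and serve only to show the computation; they are not the designs of Figure 30.3.

```python
import numpy as np

def second_order_matrix(points):
    """Columns 1, X1, X2, X1^2, X2^2, X1*X2 of the second-order model."""
    x1, x2 = np.asarray(points, dtype=float).T
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def average_variance_function(design, estimation_set):
    """V-bar: mean of V_h = X_h'(X'X)^{-1} X_h over the estimation set P."""
    X = second_order_matrix(design)
    P = second_order_matrix(estimation_set)
    xtx_inv = np.linalg.inv(X.T @ X)
    v_h = np.einsum("ij,jk,ik->i", P, xtx_inv, P)   # one V_h per row of P
    return float(v_h.mean())

levels = (-1, 0, 1)
design = [(a, b) for a in levels for b in levels]            # hypothetical 3 x 3 design
fine_grid = [(a / 4, b / 4) for a in range(-4, 5) for b in range(-4, 5)]

v_bar_design_points = average_variance_function(design, design)
v_bar_fine_grid = average_variance_function(design, fine_grid)
print(v_bar_design_points, v_bar_fine_grid)
```

When the estimation set consists of exactly the design points, the V_h values are the diagonal elements of the hat matrix, so their mean is p/n (here 6/9). That gives a convenient check on the computation.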


In the Rutgers Experimental Station example, the estimation set P is to consist of the
22 candidate treatments in Figure 30.4b. The variance function V_h was evaluated first for
design 2 for the 22 treatments in the estimation set. The results are shown in Figure 30.5.
Note that the treatments at (1, 0) and (0, 1), the two vertices of the operating region with
no replications, have large V_h values. Consequently, with design 2 the mean responses for
these two treatments will not be estimated as precisely as for the other treatments. The mean
of the 22 V_h values for design 2 is V̄ = .500. In the same fashion, we find V̄ for the other
two designs. The comparative results for the three designs are:


Design      V̄
  1       1.192
  2        .500
  3        .486


Hence, according to the V criterion design 3 is slightly preferred over design 2, and both
of these designs are substantially better than design 1.

FIGURE 30.5 Design 2 Variance Function V_h Evaluated at Points in Estimation Set—Rutgers
Experimental Station Example.


Relative Efficiency of Two Designs. A measure of the relative efficiency of design 1
relative to design 2 according to the V criterion is the following, where V̄_1 and V̄_2 denote
the averages of the V_h values for the two designs:

\[ E_V = \frac{\bar{V}_2}{\bar{V}_1} \tag{30.13} \]

For the Rutgers Experimental Station example, the relative efficiency of design 1 relative
to design 3 is:

\[ E_V = \frac{.486}{1.192} = .408 \]

Design 1 is only 41 percent as efficient as design 3 according to the V criterion, implying
that it would require 1/.408 = 2.45 replications of design 1 to achieve the same average
precision as with design 3.
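The same calculation can be run for any pair of designs from the reported V̄ values. A short sketch, with a hypothetical function name:

```python
v_bar = {1: 1.192, 2: 0.500, 3: 0.486}     # average variance functions reported above

def v_efficiency(v_bar_a, v_bar_b):
    """Efficiency of design a relative to design b: V_bar_b / V_bar_a."""
    return v_bar_b / v_bar_a

e_13 = v_efficiency(v_bar[1], v_bar[3])
e_12 = v_efficiency(v_bar[1], v_bar[2])
print(round(e_13, 3), round(1 / round(e_13, 3), 2), round(e_12, 3))
```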




Comment 


Other criteria that have been proposed for identifying a design as optimal involve minimizing the
average variance of the estimated regression coefficients (A-optimality) and minimizing the maximum
variance of Ŷ_h over the estimation set (G-optimality when the estimation set P is the same as the
candidate set C).


Construction of Optimal Response Surface Designs 




On occasion, the optimal design for a given criterion is known or can be found analytically. 
Usually, however, a computer search is required to find the optimal design. Many statistical 
software packages provide capabilities for finding optimal designs. To reduce the amount 
of computing required, these packages do not evaluate all possible designs. Instead, fast, 
special-purpose computer search procedures, called exchange algorithms, are used to find 
designs that are either optimal or nearly optimal. These algorithms begin the search with a 
starting design, sometimes randomly chosen. They then alternately add new points to the 
design and subtract points from the design in ways that lead to improvements in the design 
criterion. Since these algorithms do not evaluate every possible design, they cannot guarantee 
that an optimal design has been found. To increase the likelihood that a best or near- 
best design is found, some software packages provide capabilities for repeated attempts, 
beginning the search from different, randomly selected starting designs. A discussion of 
these search procedures is given in Reference 30.4. 
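To make the idea concrete, here is a deliberately simple point-exchange search for the D criterion, assuming NumPy. It is a sketch of the general approach only, not the algorithm of any particular software package, and the function names and the assumed 5 × 5 grid spacing of the candidate set are illustrative. Each pass tries replacing every design point with every candidate point and keeps any swap that increases |X'X|; the search is restarted from several random starting designs.

```python
import numpy as np

def second_order_matrix(points):
    """Columns 1, X1, X2, X1^2, X2^2, X1*X2 of the second-order model."""
    x1, x2 = np.asarray(points, dtype=float).T
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def log_det_information(points):
    """log|X'X|; maximizing it is the same as minimizing D = |(X'X)^{-1}|."""
    X = second_order_matrix(points)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def exchange_search(candidates, n_trials, n_starts=10, seed=0):
    rng = np.random.default_rng(seed)
    best_design, best_value = None, -np.inf
    for _ in range(n_starts):
        # Random starting design; candidate points may be replicated.
        start = rng.integers(len(candidates), size=n_trials)
        design = [candidates[i] for i in start]
        value = log_det_information(design)
        improved = True
        while improved:                      # repeat passes until no swap helps
            improved = False
            for i in range(n_trials):
                for point in candidates:
                    trial = design[:i] + [point] + design[i + 1:]
                    trial_value = log_det_information(trial)
                    if trial_value > value + 1e-10:
                        design, value, improved = trial, trial_value, True
        if value > best_value:
            best_design, best_value = design, value
    return best_design, best_value

levels = [-1.0, -0.5, 0.0, 0.5, 1.0]         # assumed grid spacing
candidates = [(a, b) for a in levels for b in levels if a + b <= 1.0]
design, log_det = exchange_search(candidates, n_trials=11)
print(sorted(design))
print("D =", np.exp(-log_det))
```

As the text cautions, a search of this kind stops at a design that no single exchange can improve, which need not be the global optimum; the random restarts only raise the chance of reaching it.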


Example

IC Technologies is a manufacturer of dashboard displays used in the automotive industry.
An important component of the manufacturing process involves the bonding of a computer
chip to a glass surface with adhesive. Management wished to determine which of two
types of adhesive, provided by two different suppliers, was superior. Identification of the
optimum processing temperature was also of interest. The response of interest was bonding
strength—the amount of force required to break the chip free of the surface. The factors
and associated levels were as follows:


Factor                  Levels
Adhesive                Type 1, Type 2
Process temperature     210, 240, 270


Notice that adhesive is a qualitative factor that can assume only two levels. Process tem- 
perature is a quantitative factor that has a range from 210 to 270. The process engineers 
wished to limit the number of temperature levels to the limits of the range (210, 270) and to 
the middle (240). The candidate set of treatments is therefore given by the six factor-level 
combinations shown in Figure 30.6a. 

Since adhesive type is a qualitative factor with two levels and a quadratic (second-order) 
temperature effect was expected, the response function chosen was the following: 


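\[ E\{Y\} = \beta_0 + \beta_1 I_1 + \beta_2 X_2 + \beta_{22} X_2^2 + \beta_{12} I_1 X_2 \]

where:

\[ I_1 = \begin{cases} -1 & \text{if adhesive is type 1} \\ \phantom{-}1 & \text{if adhesive is type 2} \end{cases} \]

X_2 = coded temperature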

FIGURE 30.6 SAS Optimal Design Construction—IC Technologies Example.
(a) Candidate Set (b) Optimal Design
[Both panels plot adhesive type (Type 1, Type 2) against process temperature (210, 240, 270).]

Management determined that at most eight experimental trials could be handled and specified
that the V criterion be employed. The estimation set of interest consisted of 21 equally
spaced points spanning the process temperature range for each of the two types of adhesive.
Note that the 42-point estimation set P here is not the same as the candidate set C. The
JMP Custom Design option was used to obtain the V-optimal design for n_T = 8 shown in
Figure 30.6b. Notice that the medium level of temperature (240) is replicated twice for each
adhesive type. Thus, two degrees of freedom will be available for a pure error estimate of 
the error variance and a lack of fit test will be possible. 
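The degrees-of-freedom bookkeeping for this design can be sketched as follows. The list of runs is read off the description above (the six candidate treatments plus a second run at 240 for each adhesive) and is therefore an assumption about Figure 30.6b.

```python
from collections import Counter

# (adhesive type, process temperature) for the eight trials, as described in the text
design = [(1, 210), (1, 240), (1, 240), (1, 270),
          (2, 210), (2, 240), (2, 240), (2, 270)]

replicates = Counter(design)
n_trials = len(design)
n_distinct = len(replicates)
pure_error_df = n_trials - n_distinct      # sum over treatments of (n_j - 1)
p = 5                                      # parameters in the response function
lack_of_fit_df = n_distinct - p
print(pure_error_df, lack_of_fit_df)
```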


Some Final Cautions 


Caution in using optimal designs is important because these designs are best for particular 
choices of sample size, design space, response function, and design criterion. For example, 
designs that are optimal according to one criterion may be far from optimal according to an- 
other criterion. Also, optimal designs are highly sensitive to the choice of response function. 
A design that is optimal for a second-order response function is generally not optimal if a 
first-order response function is the true function. Consequently, the experimenter needs to 
consider whether the optimal design will be far from being optimal if the assumed response 
function is incorrect, and whether the optimal design will provide sufficient information 
about the true response function if the assumed one is incorrect. 

Another reason for caution in choosing optimal designs is that they are constructed on the 
basis of a single design criterion. Frequently, an experimenter has a number of potentially 
conflicting objectives. It is therefore important that any candidate design be evaluated for its 
ability to satisfy each of these goals. Small modifications to computer-generated designs— 
such as the addition of replications at the center point—can be useful for increasing the 
overall utility of a design even if it is then no longer an optimal design according to a given 
criterion. It is often useful to construct optimal designs for a range of sample sizes and a 
variety of response functions and criteria. A final design can then be chosen on the basis of 
its ability to reasonably meet the different objectives over the range of response functions 
and criteria. 

A thorough discussion of optimal designs is presented in Reference 30.5. 




30.4 Analysis of Response Surface Experiments 
The analysis of second-order response surface designs frequently involves three phases: 


1. Estimation of response function 
2. Model interpretation and visualization 
3. Identification of optimum operating conditions 


In phase 1, standard regression tools are used to estimate the response function and obtain 
a good regression fit. The fitted surface is then explored graphically in phase 2. Finally, in 
phase 3, factor level combinations that lead to an optimum response are identified. Fitting of 
polynomial regression models was already discussed in Chapter 8. Here, we shall focus on
the visualization of the fitted model and the identification of optimum operating conditions.


Model Interpretation and Visualization 
Three-dimensional plots of the response surface, contour plots, and conditional effects plots 
are the primary visual tools for interpreting and communicating the results of response 
surface experiments. Generally, three kinds of fitted surfaces arise in practice. 


1. A mound-shaped surface, which is characterized by contours that are ellipses or
circles. Figure 30.7 presents a three-dimensional response surface plot and a contour plot
of the fitted response function:

\[ \hat{Y} = 65 + 3X_1 + 4X_2 - 10X_1^2 - 15X_2^2 + 15X_1X_2 \]

The contour plot in Figure 30.7b shows that the estimated mean response increases from a
minimum of Ŷ = 24 in the lower right corner (X_1 = 1, X_2 = -1) to a maximum in the
center of the region bounded by the Ŷ = 66 contour.


FIGURE 30.7 Two-Factor Response Surface and Contour Plot—Mound-Shaped Surface.
(a) Fitted Response Surface (b) Contour Plot


FIGURE 30.8 Two-Factor Response Surface and Contour Plot—Bowl-Shaped Surface.
(a) Fitted Response Surface (b) Contour Plot


2. A bowl-shaped surface, which also has elliptical or circular contours; however, the
response function decreases in the direction of the smallest ellipse. Figure 30.8 presents the
response surface and a contour plot of the fitted response function:

\[ \hat{Y} = 6.5 + 6X_1 + 2X_2 + 9X_1^2 + 4X_2^2 + X_1X_2 \]

From the contour plot in Figure 30.8b, we see that the surface decreases from a maximum in
the upper right corner (X_1 = 1, X_2 = 1) to a minimum in the center of the region bounded
by the Ŷ = 6 contour.

3. A response surface with a saddle or a minimax. Figure 30.9 presents the response
surface and a contour plot of the fitted response function:

\[ \hat{Y} = 65 + 3X_1 + 4X_2 - 10X_1^2 - 15X_2^2 + 35X_1X_2 \]

From the contour plot in Figure 30.9b, notice that the mean response increases from the
upper left corner to a maximum in the center of the region and then decreases as we approach
the lower right corner. The opposite occurs when moving from the upper right corner to the
lower left corner.


Conditional effects plots, or interaction plots, can also provide useful insights.
Figure 30.10 presents a conditional effects plot for the saddle-shaped surface in Figure 30.9 at
X_2 = -1, 0, 1:

\[ \begin{aligned} X_2 = -1:&\quad \hat{Y} = 46 - 32X_1 - 10X_1^2 \\ X_2 = 0:&\quad \hat{Y} = 65 + 3X_1 - 10X_1^2 \\ X_2 = 1:&\quad \hat{Y} = 54 + 38X_1 - 10X_1^2 \end{aligned} \]
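The three conditional curves follow from substituting each value of X_2 into the fitted saddle-shaped function and collecting terms in X_1. A small sketch, with a hypothetical function name:

```python
def conditional_effect(b0, b1, b2, b11, b22, b12, x2):
    """Coefficients (constant, X1, X1^2) of the fitted curve in X1 at fixed X2."""
    constant = b0 + b2 * x2 + b22 * x2**2
    slope = b1 + b12 * x2
    return constant, slope, b11

# Coefficients of the fitted saddle-shaped response function
saddle = dict(b0=65, b1=3, b2=4, b11=-10, b22=-15, b12=35)
for x2 in (-1, 0, 1):
    print(x2, conditional_effect(**saddle, x2=x2))
```

The coefficient of X_1 changes sign between X_2 = -1 and X_2 = 1, which is the interaction visible in the plot.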


FIGURE 30.9 Two-Factor Response Surface and Contour Plot—Saddle-Shaped Surface.
(a) Fitted Response Surface (b) Contour Plot


FIGURE 30.10 Conditional Effects Plots for Saddle-Shaped Surface in Figure 30.9.

Notice that at low X_2 the mean response is decreasing in X_1, whereas at high X_2 the mean
response is increasing in X_1. Thus, the presence of interaction effects is clearly indicated
by the plot. Absence of interaction effects would be indicated, as usual, by parallel curves.


Response Surface Optimum Conditions
Response surfaces are frequently fitted for the purpose of finding the combination of factor 
levels that leads to an optimum response. Usually, either a maximum response (e.g., maxi- 
mum yield) or a minimum response (e.g., minimum waste) is sought. Mound-shaped re- 
sponse surfaces, such as in Figure 30.7, have a unique maximum, while bowl-shaped 
response surfaces, such as in Figure 30.8, have a unique minimum. Occasionally, more 


Example 


Chapter 30 Response Surface Methodology 1287 


complex response surfaces are encountered that have saddle points, such as in Figure 30.9, 
or a number of local maximum or minimum points. 

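For a second-order fitted response surface, the point where a maximum, a minimum, or
a saddle point occurs, denoted by the vector $\mathbf{X}_s$, is:


$$\mathbf{X}_s = -\frac{1}{2}\mathbf{B}^{-1}\mathbf{b}^* \qquad (30.14)$$

where:

$$\mathbf{B} = \begin{bmatrix} b_{11} & b_{12}/2 & \cdots & b_{1k}/2 \\ \vdots & \vdots & & \vdots \\ b_{1k}/2 & b_{2k}/2 & \cdots & b_{kk} \end{bmatrix} \qquad \mathbf{b}^* = \begin{bmatrix} b_1 \\ \vdots \\ b_k \end{bmatrix} \qquad (30.14a)$$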


To determine whether the point $\mathbf{X}_s$ corresponds to a maximum, a minimum, or a saddle point,
the nature of the response surface must be known. If a contour plotting capability is available
and there are just two or three factors, the nature of the surface can usually be determined by
examining the contours in the vicinity of $\mathbf{X}_s$. Otherwise, characteristics of the matrix
$\mathbf{B}$ called eigenvalues can be used to determine whether the point at $\mathbf{X}_s$ is a
maximum, a minimum, or a saddle point. Many computer packages for response surface analysis
provide these eigenvalues. If the eigenvalues are all positive, the point is a minimum. If the
eigenvalues are all negative, the point is a maximum. Finally, if some eigenvalues are positive
and some negative, the point is a saddle point.
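
The eigenvalue rule can be sketched for the two-factor case, where the eigenvalues of the symmetric 2 x 2 matrix $\mathbf{B}$ have a closed form. This is only an illustration under that two-factor assumption; the function name `classify_stationary_point` and the "indeterminate" label for a zero eigenvalue are assumptions, since the text does not discuss that case.

```python
import math


def classify_stationary_point(b11, b22, b12):
    """Classify the stationary point of a two-factor second-order surface.

    b11 and b22 are the pure quadratic coefficients and b12 is the
    interaction coefficient, so B = [[b11, b12/2], [b12/2, b22]] as in (30.14a).
    """
    off = b12 / 2
    mid = (b11 + b22) / 2
    half_gap = math.hypot((b11 - b22) / 2, off)
    eigenvalues = (mid - half_gap, mid + half_gap)  # closed form for symmetric 2x2

    if all(ev > 0 for ev in eigenvalues):
        kind = "minimum"
    elif all(ev < 0 for ev in eigenvalues):
        kind = "maximum"
    elif eigenvalues[0] < 0 < eigenvalues[1]:
        kind = "saddle point"
    else:
        kind = "indeterminate"  # a zero eigenvalue; not covered by the rule in the text
    return kind, eigenvalues


# Mound-shaped surface (Figure 30.7) and saddle-shaped surface (Figure 30.9).
print(classify_stationary_point(-10, -15, 15))
print(classify_stationary_point(-10, -15, 35))
```

For more than two factors, a general symmetric eigenvalue routine from a numerical library would replace the closed form.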


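Example

Consider again the mound-shaped response surface in Figure 30.7:

$$\hat{Y} = 65 + 3X_1 + 4X_2 - 10X_1^2 - 15X_2^2 + 15X_1X_2$$


We know that this surface has a maximum and wish to locate it. We require the matrix $\mathbf{B}$
and the vector $\mathbf{b}^*$. Using (30.14a), we obtain:


$$\mathbf{B} = \begin{bmatrix} -10 & 15/2 \\ 15/2 & -15 \end{bmatrix} \qquad \mathbf{b}^* = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$$


Using (30.14), we find the point where the response surface is at the maximum:

$$\mathbf{X}_s = -\frac{1}{2}\begin{bmatrix} -10 & 15/2 \\ 15/2 & -15 \end{bmatrix}^{-1}\begin{bmatrix} 3 \\ 4 \end{bmatrix} = -\frac{1}{2}\begin{bmatrix} -.1600 & -.0800 \\ -.0800 & -.1067 \end{bmatrix}\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} .40 \\ .33 \end{bmatrix}$$

The maximum response on the fitted surface, at $X_1 = .40$ and $X_2 = .33$, is:

$$\hat{Y} = 65 + 3(.40) + 4(.33) - 10(.40)^2 - 15(.33)^2 + 15(.40)(.33) = 66.27$$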
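
As a check on the matrix arithmetic, the computation in (30.14) can be sketched for the two-factor case using the explicit inverse of a 2 x 2 matrix. The helper names below are hypothetical; the coefficients are those of the mound-shaped surface in Figure 30.7. Solving this way gives $X_1 = .40$ and $X_2 = 1/3$, that is, .33 to two decimals.

```python
def stationary_point_2d(b1, b2, b11, b22, b12):
    """Return X_s = -(1/2) B^{-1} b* for a two-factor second-order surface."""
    off = b12 / 2                      # off-diagonal element of B
    det = b11 * b22 - off * off        # determinant of B
    if det == 0:
        raise ValueError("B is singular; no unique stationary point")
    # B^{-1} b* using the explicit 2x2 inverse (1/det) * [[b22, -off], [-off, b11]]
    u1 = (b22 * b1 - off * b2) / det
    u2 = (b11 * b2 - off * b1) / det
    return -0.5 * u1, -0.5 * u2


def mound_surface(x1, x2):
    """Fitted mound-shaped response surface of Figure 30.7."""
    return 65 + 3 * x1 + 4 * x2 - 10 * x1 ** 2 - 15 * x2 ** 2 + 15 * x1 * x2


x1_s, x2_s = stationary_point_2d(b1=3, b2=4, b11=-10, b22=-15, b12=15)
print(round(x1_s, 2), round(x2_s, 2), round(mound_surface(x1_s, x2_s), 2))
```

At the stationary point both partial derivatives, $3 - 20X_1 + 15X_2$ and $4 + 15X_1 - 30X_2$, are zero, which gives an independent check on the result.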


Comments 


1. When the maximum or minimum point for the response surface falls well outside the operating
region, it may not be feasible to operate at this point and the investigator must then search for the
factor level combination that optimizes the mean response within the operating region. For problems
involving just two or three predictors, this point can usually be pinpointed using contour plots and
conditional effects plots. For problems involving four or more factors, constrained nonlinear
programming methods can be used to identify the optimum factor level combination. Many statistical
software packages that provide capabilities for the design of experiments include this feature.
Alternatively, a grid of points (such as those used to identify candidate points for optimal design
construction) can be constructed and the estimated mean response for each gridpoint is then obtained.
If the grid is sufficiently dense, the gridpoint that leads to the maximum (minimum) estimated mean
response will closely approximate the optimum point.

When it is feasible to operate outside the experimental region and the optimum point falls well
outside this region, it is often necessary to extend the experiment because of uncertainty about the
shape of the response surface outside the region of experimentation.

2. In most experiments, more than a single response variable is of interest. For example, in food 
processing experiments, response variables such as taste, texture, aftertaste, mouthfeel, shelf life, 
and cost are all frequently of interest. As discussed in Section 29.6, another variable of interest in 
many studies is the variance of the response variable. In the IC Technologies example, for instance, 
the manufacturer is concerned not only that the mean bonding strength be adequately high but also 
that the process variability be small so that almost all components will be bonded with sufficient 
strength. To analyze experiments with multiple responses, a response surface must be fitted to each 
response variable. Unfortunately, it is rare that a single factor level combination can be found that 
simultaneously optimizes all fitted response surfaces. In fact, often the conditions that lead to an 
optimum value of one response variable (such as texture) lead to a poor response for another variable 
(such as taste). The investigator must then search for conditions that lead to acceptable responses for 
all response variables.
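
The grid approach described in Comment 1 can be sketched as follows. For illustration only, the saddle-shaped surface of Figure 30.9 is maximized over the coded region $-1 \le X_1, X_2 \le 1$; treating that square as the operating region, the grid spacing of .05, and the function names are all assumptions made for this sketch.

```python
def saddle_surface(x1, x2):
    """Fitted saddle-shaped response surface of Figure 30.9."""
    return 65 + 3 * x1 + 4 * x2 - 10 * x1 ** 2 - 15 * x2 ** 2 + 35 * x1 * x2


def grid_maximum(surface, low=-1.0, high=1.0, points_per_axis=41):
    """Evaluate the surface on a square grid and return (best_y, x1, x2)."""
    step = (high - low) / (points_per_axis - 1)
    best = None
    for i in range(points_per_axis):
        x1 = low + i * step
        for j in range(points_per_axis):
            x2 = low + j * step
            y = surface(x1, x2)
            if best is None or y > best[0]:
                best = (y, x1, x2)
    return best


best_y, best_x1, best_x2 = grid_maximum(saddle_surface)
print(best_y, best_x1, best_x2)
```

Because the stationary point of this surface is a saddle point, the constrained maximum lies on the boundary of the region, which is the situation in which a grid or a constrained optimizer is needed.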


Example


Dorle Exterior Trim manufactures polyurethane bumpers for automobiles and light trucks.
During the initial production stages of a new model, blemishes appeared on the surface of 
the bumpers. These blemishes, resulting from a high degree of surface porosity, were so 
extensive that none of the bumpers could be shipped. A response surface experiment was 
quickly conducted to investigate the effects of three key process variables on porosity and 
to identify the optimum operating levels for the active process variables. The three factors 
were chemical temperature, mold temperature, and curing time. The operating ranges for 
these factors were: 


Factor                  Low Level   High Level
Chemical temperature       405         425
Mold temperature           100         240
Curing time                 20          40


A three-factor central composite response surface design with $\alpha = 1$ and $n_0 = 3$ replications
at the center point was chosen. Porosity counts were obtained from visual inspections of
the surface of the bumpers.

The analyst first obtained an initial fit of the three-factor second-order response function
(30.1). Residual analysis did not reveal any departures from the standard regression
assumptions. The fit suggested that the third factor, curing time, was unrelated to porosity.
All P-values for terms involving curing time were greater than or equal to .600. A test of
$H_0$: $\beta_3 = \beta_{33} = \beta_{13} = \beta_{23} = 0$ by the general linear test statistic (2.70)
resulted in the test statistic $F^* = .179$ and the P-value .943. The analyst therefore concluded
$H_0$, that curing time is unrelated to porosity.


FIGURE 30.11 SAS PROC RSREG Regression Output: Dorle Exterior Trim Example.


                Degrees of   Type I Sum
Regression      Freedom      of Squares     R-Square   F-Ratio   Prob > F
Linear              2        5075.300000     0.6894     87.308    0.0000
Quadratic           2        1854.363485     0.2519     31.900    0.0000
Crossproduct        1         112.500000     0.0153      3.871    0.0749
Total Regress       5        7042.163485     0.9566     48.457    0.0000

                Degrees of   Sum of
Residual        Freedom      Squares        Mean Square
Total Error        11        319.718868     29.065352

                Degrees of   Parameter      Standard     T for H0:
Parameter       Freedom      Estimate       Error        Parameter=0   Prob > |T|
INTERCEPT           1         16.301887     2.221627        7.338        0.0000
X1                  1        -22.300000     1.704856      -13.080        0.0000
X2                  1          3.200000     1.704856        1.877        0.0873
X1*X1               1         12.443396     3.097911        4.017        0.0020
X2*X1               1          3.750000     1.906087        1.967        0.0749
X2*X2               1         11.943396     3.097911        3.855        0.0027

                Critical Value
Factor          Coded          Uncoded
X1               0.938443       0.938443
X2              -0.281292      -0.281292

Predicted value at stationary point    5.388176

                Eigenvectors
Eigenvalues     X1             X2
14.084989        0.752384       0.658725
10.301803       -0.658725       0.752384


The SAS PROC RSREG output for the fit of the second-order response surface model
with only chemical temperature and mold temperature as the explanatory variables is shown
in Figure 30.11. The fitted response surface is:


$$\hat{Y} = 16.30 - 22.30X_1 + 3.20X_2 + 12.44X_1^2 + 11.94X_2^2 + 3.75X_1X_2$$


Notice that the P-values for all estimated coefficients are less than .1, and that $R^2$ is .957.
A lack of fit test was conducted with $\alpha = .01$. The results ($F^* = 3.10$; P-value = .089)
supported the appropriateness of the model fitted.

A response surface plot and a contour plot for the fitted response function are shown 
in Figure 30.12. The X; scale has been reversed in these plots to provide a better view of 
the response surface. Notice that the surface is bowl-shaped. Since a main objective of the 
experiment was to find the levels of the process variables that minimize the porosity on the 
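bumper surface, the analyst next determined the optimum levels of $X_1$ and $X_2$ by means of
(30.14). Substituting into this formula, the analyst obtained:


$$\mathbf{X}_s = -\frac{1}{2}\begin{bmatrix} 12.44 & 1.875 \\ 1.875 & 11.94 \end{bmatrix}^{-1}\begin{bmatrix} -22.30 \\ 3.20 \end{bmatrix} = \begin{bmatrix} .94 \\ -.28 \end{bmatrix}$$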
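
The stationary point, its predicted value, and the eigenvalues reported in the PROC RSREG output can be reproduced from the printed parameter estimates. The following is a sketch under the assumption that the six-decimal estimates in Figure 30.11 are used as given; variable names are illustrative only.

```python
import math

# Parameter estimates as printed in Figure 30.11.
b0, b1, b2 = 16.301887, -22.300000, 3.200000
b11, b22, b12 = 12.443396, 11.943396, 3.750000

off = b12 / 2                       # off-diagonal element of B
det = b11 * b22 - off ** 2          # determinant of B

# X_s = -(1/2) B^{-1} b*, using the explicit 2x2 inverse.
x1_s = -0.5 * (b22 * b1 - off * b2) / det
x2_s = -0.5 * (b11 * b2 - off * b1) / det

# Fitted response at the stationary point.
y_s = (b0 + b1 * x1_s + b2 * x2_s
       + b11 * x1_s ** 2 + b22 * x2_s ** 2 + b12 * x1_s * x2_s)

# Eigenvalues of the symmetric 2x2 matrix B (closed form).
mid = (b11 + b22) / 2
half_gap = math.hypot((b11 - b22) / 2, off)
eigenvalues = (mid + half_gap, mid - half_gap)

print(round(x1_s, 4), round(x2_s, 4), round(y_s, 3), eigenvalues)
```

Both eigenvalues are positive, so by the rule stated earlier the stationary point is a minimum, in agreement with the bowl shape noted for this surface.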


where $f_i(\theta)$ is a known function of the parameter $\theta$ and the $\varepsilon_i$ are random
variables, usually assumed to have expectation $E\{\varepsilon_i\} = 0$.

With the method of least squares, for the given sample observations, the sum of squares:

$$Q = \sum_{i=1}^{n} [Y_i - f_i(\theta)]^2 \qquad (A.57)$$

is considered as a function of $\theta$. The least squares estimator of $\theta$ is obtained by minimizing
$Q$ with respect to $\theta$. In many instances, least squares estimators are unbiased and consistent.


A.6 Inferences about Population Mean—Normal Population 
We have a random sample of $n$ observations $Y_1, \ldots, Y_n$ from a normal population with
mean $\mu$ and standard deviation $\sigma$. The sample mean and sample standard deviation are:

$$\bar{Y} = \frac{\sum_i Y_i}{n} \qquad (A.58a)$$

$$s = \left[\frac{\sum_i (Y_i - \bar{Y})^2}{n - 1}\right]^{1/2} \qquad (A.58b)$$

and the estimated standard deviation of the sampling distribution of $\bar{Y}$, denoted by
$s\{\bar{Y}\}$, is:

$$s\{\bar{Y}\} = \frac{s}{\sqrt{n}} \qquad (A.58c)$$

We then have:

$$\frac{\bar{Y} - \mu}{s\{\bar{Y}\}} \text{ is distributed as } t \text{ with } n - 1 \text{ degrees of freedom} \qquad (A.59)$$

when the random sample is from a normal population.


Interval Estimation
The confidence limits for $\mu$ with confidence coefficient $1 - \alpha$ are obtained by means of
(A.59):

$$\bar{Y} \pm t(1 - \alpha/2; n - 1)s\{\bar{Y}\} \qquad (A.60)$$


Example 1

Obtain a 95 percent confidence interval for $\mu$ when:

$$n = 10 \qquad \bar{Y} = 20 \qquad s = 4$$

We require:

$$s\{\bar{Y}\} = \frac{4}{\sqrt{10}} = 1.265 \qquad t(.975; 9) = 2.262$$

The 95 percent confidence limits therefore are $20 \pm 2.262(1.265)$ and the 95 percent
confidence interval for $\mu$ is:

$$17.1 \le \mu \le 22.9$$
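
The arithmetic of (A.60) can be sketched in a few lines of Python. The sketch assumes the Example 1 data are $n = 10$, $\bar{Y} = 20$, and $s = 4$ (consistent with the limits shown and with Example 8), and it takes the percentile $t(.975; 9) = 2.262$ from Table B.2 rather than computing it; the function name is hypothetical.

```python
import math


def mean_confidence_limits(ybar, s, n, t_percentile):
    """Confidence limits Y-bar +/- t * s{Y-bar}, as in (A.60)."""
    se = s / math.sqrt(n)             # estimated standard deviation of Y-bar, (A.58c)
    half_width = t_percentile * se
    return ybar - half_width, ybar + half_width


lower, upper = mean_confidence_limits(ybar=20, s=4, n=10, t_percentile=2.262)
print(round(lower, 1), round(upper, 1))
```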


TABLE A.1 Decision Rules for Tests Concerning Mean $\mu$ of Normal Population.


Tests
One-sided and two-sided tests concerning the population mean $\mu$ are constructed by means
of (A.59), based on the test statistic:

$$t^* = \frac{\bar{Y} - \mu_0}{s\{\bar{Y}\}} \qquad (A.61)$$

Table A.1 contains the decision rules for three possible cases, with the risk of making a
Type I error controlled at $\alpha$.


Example 2

Choose between the alternatives:

$$H_0: \mu \le 20$$
$$H_a: \mu > 20$$

when $\alpha$ is to be controlled at .05 and:

$$n = 15 \qquad \bar{Y} = 24 \qquad s = 6$$

We require:

$$s\{\bar{Y}\} = \frac{6}{\sqrt{15}} = 1.549 \qquad t(.95; 14) = 1.761$$

The decision rule is:

If $t^* \le 1.761$, conclude $H_0$
If $t^* > 1.761$, conclude $H_a$

Since $t^* = (24 - 20)/1.549 = 2.58 > 1.761$, we conclude $H_a$.
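
The test in Example 2 can be sketched as below. The critical value $t(.95; 14) = 1.761$ is taken from Table B.2 rather than computed, and the function names `t_star` and `upper_tailed_decision` are illustrative assumptions.

```python
import math


def t_star(ybar, mu0, s, n):
    """Test statistic (A.61): (Y-bar - mu0) / s{Y-bar}."""
    return (ybar - mu0) / (s / math.sqrt(n))


def upper_tailed_decision(t_value, critical):
    """Decision rule for H0: mu <= mu0 versus Ha: mu > mu0."""
    return "Ha" if t_value > critical else "H0"


t_example_2 = t_star(ybar=24, mu0=20, s=6, n=15)
decision_2 = upper_tailed_decision(t_example_2, critical=1.761)
print(round(t_example_2, 2), decision_2)
```

A two-sided rule would compare the absolute value of the statistic with $t(1 - \alpha/2; n - 1)$ instead.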


Example 3

Choose between the alternatives:

$$H_0: \mu = 10$$
$$H_a: \mu \ne 10$$

when $\alpha$ is to be controlled at .02 and:

$$n = 25 \qquad \bar{Y} = 5.7 \qquad s = 8$$

We require:

$$s\{\bar{Y}\} = \frac{8}{\sqrt{25}} = 1.6 \qquad t(.99; 24) = 2.492$$

The decision rule is:

If $|t^*| \le 2.492$, conclude $H_0$
If $|t^*| > 2.492$, conclude $H_a$

where the symbol $|\ |$ stands for the absolute value. Since $|t^*| = |(5.7 - 10)/1.6| =
|-2.69| = 2.69 > 2.492$, we conclude $H_a$.


P-Value for Sample Outcome. The P-value for a sample outcome is the probability that
the sample outcome could have been more extreme than the observed one when $\mu = \mu_0$.
Large P-values support $H_0$ while small P-values support $H_a$. A test can be carried out by
comparing the P-value with the specified $\alpha$ risk. If the P-value equals or is greater than the
specified $\alpha$, $H_0$ is concluded. If the P-value is less than $\alpha$, $H_a$ is concluded.


Example 4

In Example 2, $t^* = 2.58$. The P-value for this sample outcome is the probability $P\{t(14) >
2.58\}$. From Table B.2, we find $t(.985; 14) = 2.415$ and $t(.990; 14) = 2.624$. Hence, the
P-value is between .010 and .015. The exact P-value can be found from many statistical
calculators or statistical computer packages; it is .0109. Thus, for $\alpha = .05$, $H_a$ is concluded.


Example 5

In Example 3, $t^* = -2.69$. We find from Table B.2 that the one-sided P-value, $P\{t(24) <
-2.69\}$, is between .005 and .0075. The exact one-sided P-value is .0064. Because the
test is two-sided and the $t$ distribution is symmetrical, the two-sided P-value is twice the
one-sided value, or $2(.0064) = .013$. Hence, for $\alpha = .02$, we conclude $H_a$.
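
The exact P-values quoted above come from statistical software. As a rough illustration of what such software does, the tail area of the $t$ distribution can be approximated by numerically integrating its density. The sketch below uses Simpson's rule with only the Python standard library; the function names and the number of integration steps are assumptions, and a statistical package would normally be used instead.

```python
import math


def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t_value, df, steps=2000):
    """Approximate P{t(df) > t_value} for t_value >= 0 by Simpson's rule."""
    if t_value < 0:
        raise ValueError("t_value must be nonnegative; use symmetry for negative values")
    if steps % 2:
        steps += 1                    # Simpson's rule needs an even number of intervals
    h = t_value / steps
    total = t_density(0.0, df) + t_density(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t          # the density is symmetric about 0


p_one_sided = t_upper_tail(2.58, 14)            # upper-tail area for t* = 2.58, 14 df
p_two_sided = 2 * t_upper_tail(2.69, 24)        # two-sided area for |t*| = 2.69, 24 df
print(round(p_one_sided, 4), round(p_two_sided, 3))
```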


Relation between Tests and Confidence Intervals. There is a direct relation between
tests and confidence intervals. For example, the two-sided confidence limits (A.60) can be
used for testing:

$$H_0: \mu = \mu_0$$
$$H_a: \mu \ne \mu_0$$

If $\mu_0$ is contained within the $1 - \alpha$ confidence interval, then the two-sided decision rule in
Table A.1a, with level of significance $\alpha$, will lead to conclusion $H_0$, and vice versa. If $\mu_0$ is
not contained within the confidence interval, the decision rule will lead to $H_a$, and vice versa.

There are similar correspondences between one-sided confidence intervals and one-sided
decision rules.


A.7 Comparisons of Two Population Means—Normal Populations 


Independent Samples 
There are two normal populations, with means $\mu_1$ and $\mu_2$, respectively, and with the same
standard deviation $\sigma$. The means $\mu_1$ and $\mu_2$ are to be compared on the basis of independent
samples for each of the two populations:

Sample 1: $Y_1, \ldots, Y_{n_1}$
Sample 2: $Z_1, \ldots, Z_{n_2}$

Estimators of the two population means are the sample means:

$$\bar{Y} = \frac{\sum_i Y_i}{n_1} \qquad (A.62a)$$

$$\bar{Z} = \frac{\sum_i Z_i}{n_2} \qquad (A.62b)$$

and an estimator of $\mu_1 - \mu_2$ is $\bar{Y} - \bar{Z}$.
An estimator of the common variance $\sigma^2$ is:

$$s^2 = \frac{\sum_i (Y_i - \bar{Y})^2 + \sum_i (Z_i - \bar{Z})^2}{n_1 + n_2 - 2} \qquad (A.63)$$

and an estimator of $\sigma^2\{\bar{Y} - \bar{Z}\}$, the variance of the sampling distribution of
$\bar{Y} - \bar{Z}$, is:

$$s^2\{\bar{Y} - \bar{Z}\} = s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (A.64)$$

We have:

$$\frac{(\bar{Y} - \bar{Z}) - (\mu_1 - \mu_2)}{s\{\bar{Y} - \bar{Z}\}} \text{ is distributed as } t \text{ with } n_1 + n_2 - 2 \text{ degrees of freedom} \qquad (A.65)$$

when the two independent samples come from normal populations
with the same standard deviation.


Interval Estimation. The confidence limits for $\mu_1 - \mu_2$ with confidence coefficient $1 - \alpha$
are obtained by means of (A.65):

$$(\bar{Y} - \bar{Z}) \pm t(1 - \alpha/2; n_1 + n_2 - 2)s\{\bar{Y} - \bar{Z}\} \qquad (A.66)$$
Example 6

Obtain a 95 percent confidence interval for $\mu_1 - \mu_2$ when:

$$n_1 = 10 \qquad \bar{Y} = 14 \qquad \sum (Y_i - \bar{Y})^2 = 105$$
$$n_2 = 20 \qquad \bar{Z} = 8 \qquad \sum (Z_i - \bar{Z})^2 = 224$$

We require:

$$s^2 = \frac{105 + 224}{10 + 20 - 2} = 11.75$$

$$s^2\{\bar{Y} - \bar{Z}\} = 11.75\left(\frac{1}{10} + \frac{1}{20}\right) = 1.7625 \qquad s\{\bar{Y} - \bar{Z}\} = 1.328 \qquad t(.975; 28) = 2.048$$

Hence, the 95 percent confidence interval for $\mu_1 - \mu_2$ is:

$$3.3 = (14 - 8) - 2.048(1.328) \le \mu_1 - \mu_2 \le (14 - 8) + 2.048(1.328) = 8.7$$


TABLE A.2 Decision Rules for Tests Concerning Means $\mu_1$ and $\mu_2$ of Two Normal
Populations ($\sigma_1 = \sigma_2 = \sigma$), Independent Samples.

Alternatives                 Decision Rule
(a)
$H_0$: $\mu_1 = \mu_2$       If $|t^*| \le t(1 - \alpha/2; n_1 + n_2 - 2)$, conclude $H_0$
$H_a$: $\mu_1 \ne \mu_2$     If $|t^*| > t(1 - \alpha/2; n_1 + n_2 - 2)$, conclude $H_a$
                             where: $t^* = (\bar{Y} - \bar{Z})/s\{\bar{Y} - \bar{Z}\}$
(b)
$H_0$: $\mu_1 \ge \mu_2$     If $t^* \ge t(\alpha; n_1 + n_2 - 2)$, conclude $H_0$
$H_a$: $\mu_1 < \mu_2$       If $t^* < t(\alpha; n_1 + n_2 - 2)$, conclude $H_a$
(c)
$H_0$: $\mu_1 \le \mu_2$     If $t^* \le t(1 - \alpha; n_1 + n_2 - 2)$, conclude $H_0$
$H_a$: $\mu_1 > \mu_2$       If $t^* > t(1 - \alpha; n_1 + n_2 - 2)$, conclude $H_a$


Tests. One-sided and two-sided tests concerning $\mu_1 - \mu_2$ are constructed by means of
(A.65). Table A.2 contains the decision rules for three possible cases, based on the test
statistic:

$$t^* = \frac{\bar{Y} - \bar{Z}}{s\{\bar{Y} - \bar{Z}\}} \qquad (A.67)$$

with the risk of making a Type I error controlled at $\alpha$.


Example 7

Choose between the alternatives:

$$H_0: \mu_1 = \mu_2$$
$$H_a: \mu_1 \ne \mu_2$$

when $\alpha$ is to be controlled at .10 and the data are those of Example 6. We require
$t(.95; 28) = 1.701$, so that the decision rule is:

If $|t^*| \le 1.701$, conclude $H_0$
If $|t^*| > 1.701$, conclude $H_a$

Since $|t^*| = |(14 - 8)/1.328| = |4.52| = 4.52 > 1.701$, we conclude $H_a$.

The one-sided P-value here is the probability $P\{t(28) > 4.52\}$. We see from Table B.2
that this P-value is less than .0005; the exact one-sided P-value is .00005. Hence, the
two-sided P-value is .0001. For $\alpha = .10$, the appropriate conclusion therefore is $H_a$.
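
The pooled-variance calculations of Examples 6 and 7 can be sketched as follows. The percentiles $t(.975; 28) = 2.048$ and $t(.95; 28) = 1.701$ are taken from Table B.2, and the function name `pooled_two_sample` is a hypothetical helper, not something defined in the text.

```python
import math


def pooled_two_sample(ybar, zbar, ss_y, ss_z, n1, n2):
    """Return (difference, s{Y-bar - Z-bar}, df) using (A.63) and (A.64)."""
    df = n1 + n2 - 2
    s_sq = (ss_y + ss_z) / df                      # pooled variance estimate, (A.63)
    se = math.sqrt(s_sq * (1 / n1 + 1 / n2))       # square root of (A.64)
    return ybar - zbar, se, df


diff, se, df = pooled_two_sample(ybar=14, zbar=8, ss_y=105, ss_z=224, n1=10, n2=20)

# Example 6: 95 percent confidence limits, (A.66).
lower, upper = diff - 2.048 * se, diff + 2.048 * se

# Example 7: two-sided test at alpha = .10, (A.67).
t_value = diff / se
decision = "Ha" if abs(t_value) > 1.701 else "H0"

print(df, round(se, 3), round(lower, 1), round(upper, 1), round(t_value, 2), decision)
```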


Paired Observations 


When the observations in the two samples are paired (e.g., attitude scores $Y_i$ and $Z_i$ for the
$i$th sample employee before and after a year's experience on the job), we use the differences:

$$W_i = Y_i - Z_i \qquad i = 1, \ldots, n \qquad (A.68)$$

in the fashion of a sample from a single population. Thus, when the $W_i$ can be treated as
observations from a normal population, we have:

$$\frac{\bar{W} - (\mu_1 - \mu_2)}{s\{\bar{W}\}} \text{ is distributed as } t \text{ with } n - 1 \text{ degrees of freedom} \qquad (A.69)$$

when the differences $W_i$ can be considered to be observations from a normal
population and:

$$\bar{W} = \frac{\sum_i W_i}{n} \qquad s^2\{\bar{W}\} = \frac{\sum_i (W_i - \bar{W})^2}{n - 1} \div n$$


А.8 Inferences about Population Variance— Normal Population 


When sampling from a normal population, the following holds for the sample variance $s^2$,
where $s$ is defined in (A.58b):

$$\frac{(n - 1)s^2}{\sigma^2} \text{ is distributed as } \chi^2 \text{ with } n - 1 \text{ degrees of freedom} \qquad (A.70)$$

when the random sample is from a normal population.


Interval Estimation

The lower confidence limit $L$ and the upper confidence limit $U$ in a confidence interval for the
population variance $\sigma^2$ with confidence coefficient $1 - \alpha$ are obtained by means of (A.70):

$$L = \frac{(n - 1)s^2}{\chi^2(1 - \alpha/2; n - 1)} \qquad U = \frac{(n - 1)s^2}{\chi^2(\alpha/2; n - 1)} \qquad (A.71)$$


Example 8

Obtain a 98 percent confidence interval for $\sigma^2$, using the data of Example 1 ($n = 10$, $s = 4$).
We require:

$$s^2 = 16 \qquad \chi^2(.01; 9) = 2.09 \qquad \chi^2(.99; 9) = 21.67$$

The 98 percent confidence interval for $\sigma^2$ therefore is:

$$6.6 = \frac{9(16)}{21.67} \le \sigma^2 \le \frac{9(16)}{2.09} = 68.9$$

Tests 
One-sided and two-sided tests concerning the population variance $\sigma^2$ are constructed by
means of (A.70). Table A.3 contains the decision rules for three possible cases, with the
risk of making a Type I error controlled at $\alpha$.


TABLE A.3 Decision Rules for Tests Concerning Variance $\sigma^2$ of Normal Population.

Alternatives                          Decision Rule
(a)
$H_0$: $\sigma^2 = \sigma_0^2$        If $\chi^2(\alpha/2; n - 1) \le (n - 1)s^2/\sigma_0^2 \le \chi^2(1 - \alpha/2; n - 1)$, conclude $H_0$
$H_a$: $\sigma^2 \ne \sigma_0^2$      Otherwise conclude $H_a$
(b)
$H_0$: $\sigma^2 \ge \sigma_0^2$      If $(n - 1)s^2/\sigma_0^2 \ge \chi^2(\alpha; n - 1)$, conclude $H_0$
$H_a$: $\sigma^2 < \sigma_0^2$        If $(n - 1)s^2/\sigma_0^2 < \chi^2(\alpha; n - 1)$, conclude $H_a$
(c)
$H_0$: $\sigma^2 \le \sigma_0^2$      If $(n - 1)s^2/\sigma_0^2 \le \chi^2(1 - \alpha; n - 1)$, conclude $H_0$
$H_a$: $\sigma^2 > \sigma_0^2$        If $(n - 1)s^2/\sigma_0^2 > \chi^2(1 - \alpha; n - 1)$, conclude $H_a$


Comment

The inference procedures about the population variance described here are very sensitive to
the assumption of a normal population, and the procedures are not robust to departures from
normality.


A.9
Comparisons of Two Population Variances—Normal 
Populations 


Independent samples are selected from two normal populations, with means and variances
$\mu_1$ and $\sigma_1^2$ and $\mu_2$ and $\sigma_2^2$, respectively. Using the notation of Section A.7, the two
sample variances are:

$$s_1^2 = \frac{\sum_i (Y_i - \bar{Y})^2}{n_1 - 1} \qquad (A.72a)$$

$$s_2^2 = \frac{\sum_i (Z_i - \bar{Z})^2}{n_2 - 1} \qquad (A.72b)$$

We have:

$$\frac{s_1^2}{\sigma_1^2} \div \frac{s_2^2}{\sigma_2^2} \text{ is distributed as } F(n_1 - 1, n_2 - 1) \qquad (A.73)$$

when the two independent samples come from normal populations.


Interval Estimation

The lower and upper confidence limits $L$ and $U$ for $\sigma_1^2/\sigma_2^2$ with confidence coefficient
$1 - \alpha$ are obtained by means of (A.73):

$$L = \frac{s_1^2}{s_2^2}\left[\frac{1}{F(1 - \alpha/2; n_1 - 1, n_2 - 1)}\right] \qquad U = \frac{s_1^2}{s_2^2}\left[\frac{1}{F(\alpha/2; n_1 - 1, n_2 - 1)}\right] \qquad (A.74)$$


Example 9

Obtain a 90 percent confidence interval for $\sigma_1^2/\sigma_2^2$ when the data are:

$$n_1 = 16 \qquad n_2 = 21 \qquad s_1^2 = 54.2 \qquad s_2^2 = 17.8$$

We require:

$$F(.05; 15, 20) = 1/F(.95; 20, 15) = 1/2.33 = .429$$
$$F(.95; 15, 20) = 2.20$$

The 90 percent confidence interval for $\sigma_1^2/\sigma_2^2$ therefore is:

$$1.4 = \frac{54.2}{17.8}\left(\frac{1}{2.20}\right) \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{54.2}{17.8}\left(\frac{1}{.429}\right) = 7.1$$


Tests

One-sided and two-sided tests concerning $\sigma_1^2/\sigma_2^2$ are constructed by means of (A.73).
Table A.4 contains the decision rules for three possible cases, with the risk of making
a Type I error controlled at $\alpha$.


TABLE A.4 Decision Rules for Tests Concerning Variances $\sigma_1^2$ and $\sigma_2^2$ of Two Normal
Populations, Independent Samples.

Alternatives                            Decision Rule
(a)
$H_0$: $\sigma_1^2 = \sigma_2^2$        If $F(\alpha/2; n_1 - 1, n_2 - 1) \le s_1^2/s_2^2 \le F(1 - \alpha/2; n_1 - 1, n_2 - 1)$, conclude $H_0$
$H_a$: $\sigma_1^2 \ne \sigma_2^2$      Otherwise conclude $H_a$
(b)
$H_0$: $\sigma_1^2 \ge \sigma_2^2$      If $s_1^2/s_2^2 \ge F(\alpha; n_1 - 1, n_2 - 1)$, conclude $H_0$
$H_a$: $\sigma_1^2 < \sigma_2^2$        If $s_1^2/s_2^2 < F(\alpha; n_1 - 1, n_2 - 1)$, conclude $H_a$
(c)
$H_0$: $\sigma_1^2 \le \sigma_2^2$      If $s_1^2/s_2^2 \le F(1 - \alpha; n_1 - 1, n_2 - 1)$, conclude $H_0$
$H_a$: $\sigma_1^2 > \sigma_2^2$        If $s_1^2/s_2^2 > F(1 - \alpha; n_1 - 1, n_2 - 1)$, conclude $H_a$


Example 10

Choose between the alternatives:

$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_a: \sigma_1^2 \ne \sigma_2^2$$

when $\alpha$ is to be controlled at .02 and the data are those of Example 9.
We require:

$$F(.01; 15, 20) = 1/F(.99; 20, 15) = 1/3.37 = .297$$
$$F(.99; 15, 20) = 3.09$$

The decision rule is:

If $.297 \le s_1^2/s_2^2 \le 3.09$, conclude $H_0$
Otherwise conclude $H_a$

Since $s_1^2/s_2^2 = 54.2/17.8 = 3.04$, we conclude $H_0$.


Comment

The inference procedures about the ratio of two population variances described here are very
sensitive to the assumption of normal populations, and the procedures are not robust to departures
from normality.
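
The variance-ratio interval (A.74) and the two-sided decision rule of Table A.4 can be sketched with the Example 9 data. The $F$ percentiles are the table values quoted in Examples 9 and 10, and the two function names are hypothetical.

```python
def variance_ratio_limits(s1_sq, s2_sq, f_lower, f_upper):
    """Confidence limits (L, U) for sigma1^2 / sigma2^2 from (A.74).

    f_lower is F(alpha/2; n1-1, n2-1) and f_upper is F(1 - alpha/2; n1-1, n2-1).
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower


def two_sided_variance_test(s1_sq, s2_sq, f_lower, f_upper):
    """Two-sided decision rule (a) of Table A.4."""
    ratio = s1_sq / s2_sq
    return "H0" if f_lower <= ratio <= f_upper else "Ha"


# Example 9: 90 percent interval, F(.05; 15, 20) = .429 and F(.95; 15, 20) = 2.20.
lower, upper = variance_ratio_limits(54.2, 17.8, f_lower=0.429, f_upper=2.20)

# Example 10: alpha = .02, F(.01; 15, 20) = .297 and F(.99; 15, 20) = 3.09.
decision = two_sided_variance_test(54.2, 17.8, f_lower=0.297, f_upper=3.09)

print(round(lower, 1), round(upper, 1), decision)
```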


Appendix B


Tables


TABLE B.1 Cumulative Probabilities of the Standard Normal Distribution. 


Entry is area $A$ under the standard normal curve from $-\infty$ to $z(A)$

[Table body: cumulative probabilities by $z$, with second-decimal columns .00 through .09.]


Selected Percentiles

Cumulative probability A:    .90     .95     .975    .98     .99     .995    .999
z(A):                       1.282   1.645   1.960   2.054   2.326   2.576   3.090


TABLE B.2 Percentiles of the t Distribution.


Entry is $t(A; \nu)$ where $P\{t(\nu) \le t(A; \nu)\} = A$


ν .60 .70 .80 .85 .90 .95 .975
1 0.325 0.727 1.376 1.963 3.078 6.314 12.706 
2 0.289 0.617 1.061 1.386 1.886 2.920 4.303 
3 0.277 0.584 0.978 1.250 1.638 2.353 3.182 
4 0.271 0.569 0.941 1.190 1.533 2.132 2.776 
5 0.267 0.559 0.920 1.156 1.476 2.015 2.571 
6 0.265 0.553 0.906 1.134 1.440 1.943 2.447 
7 0.263 0.549 0.896 1.119 1.415 1.895 2.365 
8 0.262 0.546 0.889 1.108 1.397 1.860 2.306 
9 0.261 0.543 0.883 1.100 1.383 1.833 2.262 
10 0.260 0.542 0.879 1.093 1.372 1.812 2.228 
11 0.260 0.540 0.876 1.088 1.363 1.796 2.201 
12 0.259 0.539 0.873 1.083 1.356 1.782 2.179 
13 0.259 0.537 0.870 1.079 1.350 1.771 2.160
14 0.258 0.537 0.868 1.076 1.345 1.761 2.145 
15 0.258 0.536 0.866 1.074 1.341 1.753 2.131 
16 0.258 0.535 0.865 1.071 1.337 1.746 2.120 
17 0.257 0.534 0.863 1.069 1.333 1.740 2.110
18 0.257 0.534 0.862 1.067 1.330 1.734 2.101 
19 0.257 0.533 0.861 1.066 1.328 1.729 2.093 
20 0.257 0.533 0.860 1.064 1.325 1.725 2.086 
21 0.257 0.532 0.859 1.063 1.323 1.721 2.080 
22 0.256 0.532 0.858 1.061 1.321 1.717 2.074 
23 0.256 0.532 0.858 1.060 1.319 1.714 2.069 
24 0.256 0.531 0.857 1.059 1.318 1.711 2.064 
25 0.256 0.531 0.856 1.058 1.316 1.708 2.060 
26 0.256 0.531 0.856 1.058 1.315 1.706 2.056 
27 0.256 0.531 0.855 1.057 1.314 1.703 2.052 
28 0.256 0.530 0.855 1.056 1.313 1.701 2.048 
29 0.256 0.530 0.854 1.055 1.311 1.699 2.045 
30 0.256 0.530 0.854 1.055 1.310 1.697 2.042 
40 0.255 0.529 0.851 1.050 1.303 1.684 2.021
60 0.254 0.527 0.848 1.045 1.296 1.671 2.000
120 0.254 0.526 0.845 1.041 1.289 1.658 1.980
∞ 0.253 0.524 0.842 1.036 1.282 1.645 1.960


TABLE B.2 (concluded) Percentiles of the t Distribution.


ν .98 .985 .99 .9925 .995 .9975 .9995
1 15.895 21.205 31.821 42.434 63.657 127.322 636.590
2 4.849 5.643 6.965 8.073 9.925 14.089 31.598
3 3.482 3.896 4.541 5.047 5.841 7.453 12.924
4 2.999 3.298 3.747 4.088 4.604 5.598 8.610
5 2.757 3.003 3.365 3.634 4.032 4.773 6.869 
6 2.612 2.829 3.143 3.372 3.707 4.317 5.959 
7 2.517 2.715 2.998 3.203 3.499 4.029 5.408 
8 2.449 2.634 2.896 3.085 3.355 3.833 5.041 
9 2.398 2.574 2.821 2.998 3.250 3.690 4.781 
10 2.359 2.527 2.764 2.932 3.169 3.581 4.587 
11 2.328 2.491 2.718 2.879 3.106 3.497 4.437 
12 2.303 2.461 2.681 2.836 3.055 3.428 4.318
13 2.282 2.436 2.650 2.801 3.012 3.372 4.221 
14 2.264 2.415 2.624 2.771 2.977 3.326 4.140 
15 2.249 2.397 2.602 2.746 2.947 3.286 4.073 
16 2.235 2.382 2.583 2.724 2.921 3.252 4.015 
17 2.224 2.368 2.567 2.706 2.898 3.222 3.965 
18 2.214 2.356 2.552 2.689 2.878 3.197 3.922 
19 2.205 2.346 2.539 2.674 2.861 3.174 3.883 
20 2.197 2.336 2.528 2.661 2.845 3.153 3.849 
21 2.189 2.328 2.518 2.649 2.831 3.135 3.819 
22 2.183 2.320 2.508 2.639 2.819 3.119 3.792 
23 2.177 2.313 2.500 2.629 2.807 3.104 3.768 
24 2.172 2.307 2.492 2.620 2.797 3.091 3.745
25 2.167 2.301 2.485 2.612 2.787 3.078 3.725 
26 2.162 2.296 2.479 2.605 2.779 3.067 3.707 
27 2.158 2.291 2.473 2.598 2.771 3.057 3.690 
28 2.154 2.286 2.467 2.592 2.763 3.047 3.674 
29 2.150 2.282 2.462 2.586 2.756 3.038 3.659 
30 2.147 2.278 2.457 2.581 2.750 3.030 3.646
40 2.123 2.250 2.423 2.542 2.704 2.971 3.551 
60 2.099 2.223 2.390 2.504 2.660 2.915 3.460 
120 2.076 2.196 2.358 2.468 2.617 2.860 3.373 


∞ 2.054 2.170 2.326 2.432 2.576 2.807 3.291


TABLE B.3 Percentiles of the $\chi^2$ Distribution.
Entry is $\chi^2(A; \nu)$ where $P\{\chi^2(\nu) \le \chi^2(A; \nu)\} = A$


  ν       .005       .010      .025     .050    .100    .900    .950    .975    .990    .995

  1   0.0000393   0.000157  0.000982  0.00393  0.0158    2.71    3.84    5.02    6.63    7.88
  2   0.0100      0.0201    0.0506    0.103    0.211     4.61    5.99    7.38    9.21   10.60
  3   0.072       0.115     0.216     0.352    0.584     6.25    7.81    9.35   11.34   12.84
  4   0.207       0.297     0.484     0.711    1.064     7.78    9.49   11.14   13.28   14.86

  5   0.412       0.554     0.831     1.145    1.61      9.24   11.07   12.83   15.09   16.75
  6   0.676       0.872     1.24      1.64     2.20     10.64   12.59   14.45   16.81   18.55
  7   0.989       1.24      1.69      2.17     2.83     12.02   14.07   16.01   18.48   20.28
  8   1.34        1.65      2.18      2.73     3.49     13.36   15.51   17.53   20.09   21.96
  9   1.73        2.09      2.70      3.33     4.17     14.68   16.92   19.02   21.67   23.59

 10   2.16        2.56      3.25      3.94     4.87     15.99   18.31   20.48   23.21   25.19
 11   2.60        3.05      3.82      4.57     5.58     17.28   19.68   21.92   24.73   26.76
 12   3.07        3.57      4.40      5.23     6.30     18.55   21.03   23.34   26.22   28.30
 13   3.57        4.11      5.01      5.89     7.04     19.81   22.36   24.74   27.69   29.82
 14   4.07        4.66      5.63      6.57     7.79     21.06   23.68   26.12   29.14   31.32

 15   4.60        5.23      6.26      7.26     8.55     22.31   25.00   27.49   30.58   32.80
 16   5.14        5.81      6.91      7.96     9.31     23.54   26.30   28.85   32.00   34.27
 17   5.70        6.41      7.56      8.67    10.09     24.77   27.59   30.19   33.41   35.72
 18   6.26        7.01      8.23      9.39    10.86     25.99   28.87   31.53   34.81   37.16
 19   6.84        7.63      8.91     10.12    11.65     27.20   30.14   32.85   36.19   38.58

 20   7.43        8.26      9.59     10.85    12.44     28.41   31.41   34.17   37.57   40.00
 21   8.03        8.90     10.28     11.59    13.24     29.62   32.67   35.48   38.93   41.40
 22   8.64        9.54     10.98     12.34    14.04     30.81   33.92   36.78   40.29   42.80
 23   9.26       10.20     11.69     13.09    14.85     32.01   35.17   38.08   41.64   44.18
 24   9.89       10.86     12.40     13.85    15.66     33.20   36.42   39.36   42.98   45.56

 25  10.52       11.52     13.12     14.61    16.47     34.38   37.65   40.65   44.31   46.93
 26  11.16       12.20     13.84     15.38    17.29     35.56   38.89   41.92   45.64   48.29
 27  11.81       12.88     14.57     16.15    18.11     36.74   40.11   43.19   46.96   49.64
 28  12.46       13.56     15.31     16.93    18.94     37.92   41.34   44.46   48.28   50.99
 29  13.12       14.26     16.05     17.71    19.77     39.09   42.56   45.72   49.59   52.34

 30  13.79       14.95     16.79     18.49    20.60     40.26   43.77   46.98   50.89   53.67
 40  20.71       22.16     24.43     26.51    29.05     51.81   55.76   59.34   63.69   66.77
 50  27.99       29.71     32.36     34.76    37.69     63.17   67.50   71.42   76.15   79.49
 60  35.53       37.48     40.48     43.19    46.46     74.40   79.08   83.30   88.38   91.95
 70  43.28       45.44     48.76     51.74    55.33     85.53   90.53   95.02  100.4   104.2
 80  51.17       53.54     57.15     60.39    64.28     96.58  101.9   106.6   112.3   116.3
 90  59.20       61.75     65.65     69.13    73.29    107.6   113.1   118.1   124.1   128.3

100  67.33       70.06     74.22     77.93    82.36    118.5   124.3   129.6   135.8   140.2


Source: Reprinted, with permission, from C. M. Thompson, "Table of Percentage Points of the Chi-Square Distribution," Biometrika 32 (1941), pp. 188-89.


TABLE B.4 Percentiles of the F Distribution.


Entry is $F(A; \nu_1, \nu_2)$ where $P\{F(\nu_1, \nu_2) \le F(A; \nu_1, \nu_2)\} = A$

$$F(A; \nu_1, \nu_2) = \frac{1}{F(1 - A; \nu_2, \nu_1)}$$


TABLE B.4 (continued) Percentiles of the F Distribution.


Den.                                              Numerator df
df    A          1         2         3         4         5         6         7         8         9
1    .50      1.00      1.50      1.71      1.82      1.89      1.94      1.98      2.00      2.03
     .90      39.9      49.5      53.6      55.8      57.2      58.2      58.9      59.4      59.9
     .95       161       200       216       225       230       234       237       239       241
     .975      648       800       864       900       922       937       948       957       963
     .99     4,052     5,000     5,403     5,625     5,764     5,859     5,928     5,981     6,022
     .995   16,211    20,000    21,615    22,500    23,056    23,437    23,715    23,925    24,091
     .999  405,280   500,000   540,380   562,500   576,400   585,940   592,870   598,140   602,280
2    .50     0.667      1.00      1.13      1.21      1.25      1.28      1.30      1.32      1.33
     .90      8.53      9.00      9.16      9.24      9.29      9.33      9.35      9.37      9.38
     .95      18.5      19.0      19.2      19.2      19.3      19.3      19.4      19.4      19.4
     .975     38.5      39.0      39.2      39.2      39.3      39.3      39.4      39.4      39.4
     .99      98.5      99.0      99.2      99.2      99.3      99.3      99.4      99.4      99.4
     .995      199       199       199       199       199       199       199       199       199
     .999    998.5     999.0     999.2     999.2     999.3     999.3     999.4     999.4     999.4
3    .50     0.585     0.881      1.00      1.06      1.10      1.13      1.15      1.16      1.17
     .90      5.54      5.46      5.39      5.34      5.31      5.28      5.27      5.25      5.24
     .95      10.1      9.55      9.28      9.12      9.01      8.94      8.89      8.85      8.81
     .975     17.4      16.0      15.4      15.1      14.9      14.7      14.6      14.5      14.5
     .99      34.1      30.8      29.5      28.7      28.2      27.9      27.7      27.5      27.3
     .995     55.6      49.8      47.5      46.2      45.4      44.8      44.4      44.1      43.9
     .999    167.0     148.5     141.1     137.1     134.6     132.8     131.6     130.6     129.9
4    .50     0.549     0.828     0.941      1.00      1.04      1.06      1.08      1.09      1.10
     .90      4.54      4.32      4.19      4.11      4.05      4.01      3.98      3.95      3.94
     .95      7.71      6.94      6.59      6.39      6.26      6.16      6.09      6.04      6.00
     .975     12.2      10.6      9.98      9.60      9.36      9.20      9.07      8.98      8.90
     .99      21.2      18.0      16.7      16.0      15.5      15.2      15.0      14.8      14.7
     .995     31.3      26.3      24.3      23.2      22.5      22.0      21.6      21.4      21.1
     .999     74.1      61.2      56.2      53.4      51.7      50.5      49.7      49.0      48.5
5    .50     0.528     0.799     0.907     0.965      1.00      1.02      1.04      1.05      1.06
     .90      4.06      3.78      3.62      3.52      3.45      3.40      3.37      3.34      3.32
     .95      6.61      5.79      5.41      5.19      5.05      4.95      4.88      4.82      4.77
     .975     10.0      8.43      7.76      7.39      7.15      6.98      6.85      6.76      6.68
     .99      16.3      13.3      12.1      11.4      11.0      10.7      10.5      10.3      10.2
     .995     22.8      18.3      16.5      15.6      14.9      14.5      14.2      14.0      13.8
     .999     47.2      37.1      33.2      31.1      29.8      28.8      28.2      27.6      27.2
6    .50     0.515     0.780     0.886     0.942     0.977      1.00      1.02      1.03      1.04
     .90      3.78      3.46      3.29      3.18      3.11      3.05      3.01      2.98      2.96
     .95      5.99      5.14      4.76      4.53      4.39      4.28      4.21      4.15      4.10
     .975     8.81      7.26      6.60      6.23      5.99      5.82      5.70      5.60      5.52
     .99      13.7      10.9      9.78      9.15      8.75      8.47      8.26      8.10      7.98
     .995     18.6      14.5      12.9      12.0      11.5      11.1      10.8      10.6      10.4
     .999     35.5      27.0      23.7      21.9      20.8      20.0      19.5      19.0      18.7
7    .50     0.506     0.767     0.871     0.926     0.960     0.983      1.00      1.01      1.02
     .90      3.59      3.26      3.07      2.96      2.88      2.83      2.78      2.75      2.72
     .95      5.59      4.74      4.35      4.12      3.97      3.87      3.79      3.73      3.68
     .975     8.07      6.54      5.89      5.52      5.29      5.12      4.99      4.90      4.82
     .99      12.2      9.55      8.45      7.85      7.46      7.19      6.99      6.84      6.72
     .995     16.2      12.4      10.9      10.1      9.52      9.16      8.89      8.68      8.51
     .999     29.2      21.7      18.8      17.2      16.2      15.5      15.0      14.6      14.3


TABLE B.4 (continued) Percentiles of the F Distribution.


Den.                                              Numerator df
df    A         10        12        15        20        24        30        60       120         ∞
1    .50      2.04      2.07      2.09      2.12      2.13      2.15      2.17      2.18      2.20
     .90      60.2      60.7      61.2      61.7      62.0      62.3      62.8      63.1      63.3
     .95       242       244       246       248       249       250       252       253       254
     .975      969       977       985       993       997     1,001     1,010     1,014     1,018
     .99     6,056     6,106     6,157     6,209     6,235     6,261     6,313     6,339     6,366
     .995   24,224    24,426    24,630    24,836    24,940    25,044    25,253    25,359    25,464
     .999  605,620   610,670   615,760   620,910   623,500   626,100   631,340   633,970   636,620
2    .50      1.34      1.36      1.38      1.39      1.40      1.41      1.43      1.43      1.44
     .90      9.39      9.41      9.42      9.44      9.45      9.46      9.47      9.48      9.49
     .95      19.4      19.4      19.4      19.4      19.5      19.5      19.5      19.5      19.5
     .975     39.4      39.4      39.4      39.4      39.5      39.5      39.5      39.5      39.5
     .99      99.4      99.4      99.4      99.4      99.5      99.5      99.5      99.5      99.5
     .995      199       199       199       199       199       199       199       199       200
     .999    999.4     999.4     999.4     999.4     999.5     999.5     999.5     999.5     999.5
3    .50      1.18      1.20      1.21      1.23      1.23      1.24      1.25      1.26      1.27
     .90      5.23      5.22      5.20      5.18      5.18      5.17      5.15      5.14      5.13
     .95      8.79      8.74      8.70      8.66      8.64      8.62      8.57      8.55      8.53
     .975     14.4      14.3      14.3      14.2      14.1      14.1      14.0      13.9      13.9
     .99      27.2      27.1      26.9      26.7      26.6      26.5      26.3      26.2      26.1
     .995     43.7      43.4      43.1      42.8      42.6      42.5      42.1      42.0      41.8
     .999    129.2     128.3     127.4     126.4     125.9     125.4     124.5     124.0     123.5
4    .50      1.11      1.13      1.14      1.15      1.16      1.16      1.18      1.18      1.19
     .90      3.92      3.90      3.87      3.84      3.83      3.82      3.79      3.78      3.76
     .95      5.96      5.91      5.86      5.80      5.77      5.75      5.69      5.66      5.63
     .975     8.84      8.75      8.66      8.56      8.51      8.46      8.36      8.31      8.26
     .99      14.5      14.4      14.2      14.0      13.9      13.8      13.7      13.6      13.5
     .995     21.0      20.7      20.4      20.2      20.0      19.9      19.6      19.5      19.3
     .999     48.1      47.4      46.8      46.1      45.8      45.4      44.7      44.4      44.1
5    .50      1.07      1.09      1.10      1.11      1.12      1.12      1.14      1.14      1.15
     .90      3.30      3.27      3.24      3.21      3.19      3.17      3.14      3.12      3.11
     .95      4.74      4.68      4.62      4.56      4.53      4.50      4.43      4.40      4.37
     .975     6.62      6.52      6.43      6.33      6.28      6.23      6.12      6.07      6.02
     .99      10.1      9.89      9.72      9.55      9.47      9.38      9.20      9.11      9.02
     .995     13.6      13.4      13.1      12.9      12.8      12.7      12.4      12.3      12.1
     .999     26.9      26.4      25.9      25.4      25.1      24.9      24.3      24.1      23.8
6    .50      1.05      1.06      1.07      1.08      1.09      1.10      1.11      1.12      1.12
     .90      2.94      2.90      2.87      2.84      2.82      2.80      2.76      2.74      2.72
     .95      4.06      4.00      3.94      3.87      3.84      3.81      3.74      3.70      3.67
     .975     5.46      5.37      5.27      5.17      5.12      5.07      4.96      4.90      4.85
     .99      7.87      7.72      7.56      7.40      7.31      7.23      7.06      6.97      6.88
     .995     10.2      10.0      9.81      9.59      9.47      9.36      9.12      9.00      8.88
     .999     18.4      18.0      17.6      17.1      16.9      16.7      16.2      16.0      15.7
7    .50      1.03      1.04      1.05      1.07      1.07      1.08      1.09      1.10      1.10
     .90      2.70      2.67      2.63      2.59      2.58      2.56      2.51      2.49      2.47
     .95      3.64      3.57      3.51      3.44      3.41      3.38      3.30      3.27      3.23
     .975     4.76      4.67      4.57      4.47      4.42      4.36      4.25      4.20      4.14
     .99      6.62      6.47      6.31      6.16      6.07      5.99      5.82      5.74      5.65
     .995     8.38      8.18      7.97      7.75      7.65      7.53      7.31      7.19      7.08
     .999     14.1      13.7      13.3      12.9      12.7      12.5      12.1      11.9      11.7


TABLE B.4 (continued) Percentiles of the F Distribution.

Numerator df

Den.
df A 1 2 3 4 5 6 7 8 9
8 .50 0.499 0.757 0.860 0.915 0.948 0.971 0.988 1.00 --
.90 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 --
.95 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 --
.975 -- 6.06 5.42 5.05 4.82 4.65 -- -- --
.99 11.3 8.65 7.59 7.01 6.63 6.37 6.18 6.03 --
.995 14.7 11.0 9.60 8.81 8.30 7.95 7.69 7.50 --
.999 25.4 18.5 15.8 14.4 13.5 12.9 12.4 12.0 --
9 .50 0.494 0.749 0.852 0.906 0.939 0.962 0.978 0.990 --
.90 -- 3.01 2.81 2.69 2.61 2.55 -- -- --
.95 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 --
.975 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 --
.99 10.6 8.02 6.99 6.42 6.06 5.80 5.61 -- --
.995 13.6 10.1 8.72 7.96 7.47 7.13 6.88 6.69 --
.999 22.9 16.4 13.9 12.6 11.7 11.1 10.7 10.4 --
10 .50 0.490 0.743 0.845 0.899 0.932 0.954 0.971 0.983 --
.90 3.29 2.92 2.73 2.61 2.52 2.46 2.41 -- --
.95 4.96 4.10 3.71 3.48 3.33 3.22 3.14 -- --
.975 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 --
.99 10.0 7.56 6.55 5.99 5.64 5.39 5.20 5.06 --
.995 12.8 9.43 8.08 7.34 6.87 6.54 6.30 -- --
.999 21.0 14.9 12.6 11.3 10.5 9.93 9.52 9.20 --
12 .50 0.484 0.735 0.835 0.888 0.921 0.943 0.959 0.972 --
.90 3.18 2.81 2.61 2.48 2.39 2.33 2.28 -- --
.95 4.75 3.89 3.49 3.26 3.11 3.00 2.91 -- --
.975 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 --
.99 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 --
.995 11.8 8.51 7.23 6.52 6.07 5.76 5.52 -- --
.999 18.6 13.0 10.8 9.63 8.89 8.38 8.00 7.71 --
15 .50 0.478 0.726 0.826 0.878 0.911 0.933 0.949 0.960 --
.90 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 --
.95 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 --
.975 6.20 4.77 4.15 3.80 3.58 3.41 3.29 3.20 --
.99 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 --
.995 10.8 7.70 6.48 5.80 5.37 5.07 4.85 4.67 --
.999 16.6 11.3 9.34 8.25 7.57 7.09 6.74 6.47 --
20 .50 0.472 0.718 0.816 0.868 0.900 0.922 0.938 0.950 --
.90 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2.00 --
.95 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 --
.975 5.87 4.46 3.86 3.51 3.29 3.13 3.01 2.91 --
.99 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 --
.995 9.94 6.99 5.82 5.17 4.76 4.47 4.26 4.09 --
.999 14.8 9.95 8.10 7.10 6.46 6.02 5.69 5.44 --
24 .50 0.469 0.714 0.812 0.863 0.895 0.917 0.932 0.944 --
.90 2.93 2.54 2.33 2.19 2.10 2.04 1.98 1.94 --
.95 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 --
.975 5.72 4.32 3.72 3.38 3.15 2.99 2.87 2.78 --
.99 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 --
.995 9.55 6.66 5.52 4.89 4.49 4.20 3.99 3.83 --
.999 14.0 9.34 7.55 6.59 5.98 5.55 5.23 4.99 --




TABLE B.4 (continued ) Percentiles of the F Distribution. 


Numerator df

Den.
df A 10 12 15 20 24 30 60 120 ∞
8 .50 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.08 1.09
.90 2.54 2.50 2.46 2.42 2.40 2.38 2.34 2.32 2.29
.95 3.35 3.28 3.22 3.15 3.12 3.08 3.01 2.97 2.93
.975 4.30 4.20 4.10 4.00 3.95 3.89 3.78 3.73 3.67
.99 5.81 5.67 5.52 5.36 5.28 5.20 5.03 4.95 4.86
.995 7.21 7.01 6.81 6.61 6.50 6.40 6.18 6.06 5.95
.999 11.5 11.2 10.8 10.5 10.3 10.1 9.73 9.53 9.33
9 .50 1.01 1.02 1.03 1.04 1.05 1.05 1.07 1.07 1.08
.90 2.42 2.38 2.34 2.30 2.28 2.25 2.21 2.18 2.16
.95 3.14 3.07 3.01 2.94 2.90 2.86 2.79 2.75 2.71
.975 3.96 3.87 3.77 3.67 3.61 3.56 3.45 3.39 3.33
.99 5.26 5.11 4.96 4.81 4.73 4.65 4.48 4.40 4.31
.995 6.42 6.23 6.03 5.83 5.73 5.62 5.41 5.30 5.19
.999 9.89 9.57 9.24 8.90 8.72 8.55 8.19 8.00 7.81
10 .50 1.00 1.01 1.02 1.03 1.04 1.05 1.06 1.06 1.07
.90 2.32 2.28 2.24 2.20 2.18 2.16 2.11 2.08 2.06
.95 2.98 2.91 2.84 2.77 2.74 2.70 2.62 2.58 2.54
.975 3.72 3.62 3.52 3.42 3.37 3.31 3.20 3.14 3.08
.99 4.85 4.71 4.56 4.41 4.33 4.25 4.08 4.00 3.91
.995 5.85 5.66 5.47 5.27 5.17 5.07 4.86 4.75 4.64
.999 8.75 8.45 8.13 7.80 7.64 7.47 7.12 6.94 6.76
12 .50 0.989 1.00 1.01 1.02 1.03 1.03 1.05 1.05 1.06
.90 2.19 2.15 2.10 2.06 2.04 2.01 1.96 1.93 1.90
.95 2.75 2.69 2.62 2.54 2.51 2.47 2.38 2.34 2.30
.975 3.37 3.28 3.18 3.07 3.02 2.96 2.85 2.79 2.72
.99 4.30 4.16 4.01 3.86 3.78 3.70 3.54 3.45 3.36
.995 5.09 4.91 4.72 4.53 4.43 4.33 4.12 4.01 3.90
.999 7.29 7.00 6.71 6.40 6.25 6.09 5.76 5.59 5.42
15 .50 0.977 0.989 1.00 1.01 1.02 1.02 1.03 1.04 1.05
.90 2.06 2.02 1.97 1.92 1.90 1.87 1.82 1.79 1.76
.95 2.54 2.48 2.40 2.33 2.29 2.25 2.16 2.11 2.07
.975 3.06 2.96 2.86 2.76 2.70 2.64 2.52 2.46 2.40
.99 3.80 3.67 3.52 3.37 3.29 3.21 3.05 2.96 2.87
.995 4.42 4.25 4.07 3.88 3.79 3.69 3.48 3.37 3.26
.999 6.08 5.81 5.54 5.25 5.10 4.95 4.64 4.48 4.31
20 .50 0.966 0.977 0.989 1.00 1.01 1.01 1.02 1.03 1.03
.90 1.94 1.89 1.84 1.79 1.77 1.74 1.68 1.64 1.61
.95 2.35 2.28 2.20 2.12 2.08 2.04 1.95 1.90 1.84
.975 2.77 2.68 2.57 2.46 2.41 2.35 2.22 2.16 2.09
.99 3.37 3.23 3.09 2.94 2.86 2.78 2.61 2.52 2.42
.995 3.85 3.68 3.50 3.32 3.22 3.12 2.92 2.81 2.69
.999 5.08 4.82 4.56 4.29 4.15 4.00 3.70 3.54 3.38
24 .50 0.961 0.972 0.983 0.994 1.00 1.01 1.02 1.02 1.03
.90 1.88 1.83 1.78 1.73 1.70 1.67 1.61 1.57 1.53
.95 2.25 2.18 2.11 2.03 1.98 1.94 1.84 1.79 1.73
.975 2.64 2.54 2.44 2.33 2.27 2.21 2.08 2.01 1.94
.99 3.17 3.03 2.89 2.74 2.66 2.58 2.40 2.31 2.21
.995 3.59 3.42 3.25 3.06 2.97 2.87 2.66 2.55 2.43
.999 4.64 4.39 4.14 3.87 3.74 3.59 3.29 3.14 2.97




TABLE B.4 (continued) Percentiles of the F Distribution. 


Numerator df 


Den. 

df A 3 4 5 6 7 8 9 

30 .50 0.807 0.858 0.890 0.912 0.927 0.939 0.948 
.90 2.28 2.14 2.05 1.98 1.93 1.88 1.85 
.95 2.92 2.69 2.53 2.42 2.33 2.27 2.21 
.975 3.59 3.25 3.03 2.87 2.75 2.65 2.57 
.99 4.51 4.02 3.70 3.47 3.30 3.17 3.07 
.995 5.24 4.62 4.23 3.95 3.74 3.58 3.45 
.999 7.05 6.12 5.53 5.12 4.82 4.58 4.39 

60 .50 0.798 0.849 0.880 0.901 0.917 0.928 0.937 
.90 2.18 2.04 1.95 1.87 1.82 1.77 1.74 
.95 2.76 2.53 2.37 2.25 2.17 2.10 2.04
.975 3.34 3.01 2.79 2.63 2.51 2.41 2.33
.99 4.13 3.65 3.34 3.12 2.95 2.82 2.72
.995 4.73 4.14 3.76 3.49 3.29 3.13 3.01
.999 6.17 5.31 4.76 4.37 4.09 3.86 3.69 

120 .50 0.793 0.844 0.875 0.896 0.912 0.923 0.932 
.90 2.13 1.99 1.90 1.82 1.77 1.72 1.68 
.95 2.68 2.45 2.29 2.18 2.09 2.02 1.96 
.975 3.23 2.89 2.67 2.52 2.39 2.30 2.22 
.99 3.95 3.48 3.17 2.96 2.79 2.66 2.56 
.995 4.50 3.92 3.55 3.28 3.09 2.93 2.81 
.999 5.78 4.95 4.42 4.04 3.77 3.55 3.38 

∞ .50 0.789 0.839 0.870 0.891 0.907 0.918 0.927
.90 2.08 1.94 1.85 1.77 1.72 1.67 1.63 
.95 2.60 2.37 2.21 2.10 2.01 1.94 1.88 
.975 3.12 2.79 2.57 2.41 2.29 2.19 2.11 
.99 3.78 3.32 3.02 2.80 2.64 2.51 2.41 
.995 4.28 3.72 3.35 3.09 2.90 2.74 2.62 
.999 5.42 4.62 4.10 3.74 3.47 3.27 3.10 




TABLE B.4 (concluded ) Percentiles of the F Distribution. 


Numerator df 


Den. 
df A 


30 .50 


60 .50 


120 .50 


Source: Reprinted from Table 5 of Pearson and Hartley, Biometrika Tables for Statisticians, Volume 2, 1972, published by the Cambridge University Press, on behalf 
of The Biometrika Society, by permission of the authors and publishers. 




TABLE B.5 
Power Values 
for Two-Sided 
t Test. 






TABLE B.5 
(concluded ) 
Power Values 
for Two-Sided 
t Test. 




1.00 
.05 .24 .59 .87 .98 1.00 1.00 1.00 1.00
.05 .24 .59 .88 .98 1.00 1.00 1.00 1.00
.05 .24 .59 .88 .98 1.00 1.00 1.00 1.00
.05 .24 .59 .88 .98 1.00 1.00 1.00 1.00
.05 .24 .60 .88 .98 1.00 1.00 1.00 1.00
.05 .25 .60 .88 .98 1.00 1.00 1.00 1.00
.05 .25 .60 .88 .98 1.00 1.00 1.00 1.00
.05 .26 .62 .90 .99 1.00 1.00 1.00 1.00
.05 .26 .63 .90 .99 1.00 1.00 1.00 1.00
.05 .26 .63 .91 .99 1.00 1.00 1.00 1.00
.06 .27 .65 .91 .99 1.00 1.00 1.00 1.00
.06 .27 .65 .91 .99 1.00 1.00 1.00 1.00
.06 .28 .66 .92 .99 1.00 1.00 1.00 1.00




TABLE B.6 Critical Values for Coefficient of Correlation between Ordered Residuals and Expected Values under Normality when Distribution of Error Terms Is Normal.

Level of Significance α
n .10 .05 .025 .01 .005
5 .903 .880 .865 .826 .807
6 .910 .888 .866 .838 .820
7 .918 .898 .877 .850 .828
8 .924 .906 .887 .861 .840
9 .930 .912 .894 .871 .854
10 .934 .918 .901 .879 .862
12 .942 .928 .912 .892 .876
14 .948 .935 .923 .905 .890
16 -- -- .929 .913 .899
18 .957 .946 .935 .920 .908
20 .960 .951 .940 .926 .916
22 .963 .954 .945 .933 .923
24 .965 .957 .949 .937 .927
26 .967 .960 .952 .941 .932
28 .969 .962 .955 .944 .936
30 .971 .964 .957 .947 .939
40 .977 .972 .966 .959 .953
50 .981 .977 .972 .966 .961
60 .984 .980 .976 .971 .967
70 .986 .983 .979 .975 .971
80 .987 .985 .982 .978 .975
90 .988 .986 .984 .980 .977
100 .989 .987 .985 .982 .979


Source: Reprinted, with permission, from S. W. Looney and T. R. Gulledge, Jr., "Use of the Correlation Coefficient with Normal
Probability Plots," The American Statistician 39 (1985), pp. 75-79.
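
The critical values in Table B.6 are compared with the coefficient of correlation between the ordered residuals and their expected values under normality. The sketch below is illustrative only. It assumes the common approximation in which the k-th smallest of n residuals has expected value proportional to z[(k - .375)/(n + .25)]; the function names are hypothetical. Because a correlation is unaffected by scale, the factor √MSE is omitted.

```python
import math
from statistics import NormalDist


def expected_normal_scores(n):
    """Approximate expected standard normal order statistics for sample size n.

    Assumption: the (k - .375)/(n + .25) plotting position is used.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


def normality_correlation(residuals):
    """Correlation between the ordered residuals and their expected normal scores."""
    ordered = sorted(residuals)
    scores = expected_normal_scores(len(ordered))
    mean_e = sum(ordered) / len(ordered)
    mean_s = sum(scores) / len(scores)
    sxy = sum((e - mean_e) * (s - mean_s) for e, s in zip(ordered, scores))
    sxx = sum((e - mean_e) ** 2 for e in ordered)
    syy = sum((s - mean_s) ** 2 for s in scores)
    if sxx == 0:
        raise ValueError("residuals are all equal; the correlation is undefined")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, strongly skewed residuals with n = 5.
r = normality_correlation([-1.0, -1.0, -1.0, -1.0, 4.0])
critical = 0.880  # Table B.6 entry for n = 5, alpha = .05
print("normality reasonable" if r >= critical else "normality doubtful")
```

A coefficient at or above the tabled critical value supports the reasonableness of the normality assumption; a smaller coefficient suggests the error terms are not normally distributed.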




TABLE B.7 
Durbin-Watson 
Test Bounds. 


Level of Significance α = .05

p - 1 = 1 p - 1 = 2 p - 1 = 3 p - 1 = 4 p - 1 = 5
n dL dU dL dU dL dU dL dU dL dU
15 1.08 1.36 0.95 1.54 0.82 1.75 0.69 1.97 0.56 2.21
16 1.10 1.37 0.98 1.54 0.86 1.73 0.74 1.93 0.62 2.15
17 1.13 1.38 1.02 1.54 0.90 1.71 0.78 1.90 0.67 2.10
18 1.16 1.39 1.05 1.53 0.93 1.69 0.82 1.87 0.71 2.06
19 1.18 1.40 1.08 1.53 0.97 1.68 0.86 1.85 0.75 2.02
20 1.20 1.41 1.10 1.54 1.00 1.68 0.90 1.83 0.79 1.99
21 1.22 1.42 1.13 1.54 1.03 1.67 0.93 1.81 0.83 1.96
22 1.24 1.43 1.15 1.54 1.05 1.66 0.96 1.80 0.86 1.94
23 1.26 1.44 1.17 1.54 1.08 1.66 0.99 1.79 0.90 1.92
24 1.27 1.45 1.19 1.55 1.10 1.66 1.01 1.78 0.93 1.90
25 1.29 1.45 1.21 1.55 1.12 1.66 1.04 1.77 0.95 1.89
26 1.30 1.46 1.22 1.55 1.14 1.65 1.06 1.76 0.98 1.88
27 1.32 1.47 1.24 1.56 1.16 1.65 1.08 1.76 1.01 1.86
28 1.33 1.48 1.26 1.56 1.18 1.65 1.10 1.75 1.03 1.85
29 1.34 1.48 1.27 1.56 1.20 1.65 1.12 1.74 1.05 1.84
30 1.35 1.49 1.28 1.57 1.21 1.65 1.14 1.74 1.07 1.83
31 1.36 1.50 1.30 1.57 1.23 1.65 1.16 1.74 1.09 1.83
32 1.37 1.50 1.31 1.57 1.24 1.65 1.18 1.73 1.11 1.82
33 1.38 1.51 1.32 1.58 1.26 1.65 1.19 1.73 1.13 1.81
34 1.39 1.51 1.33 1.58 1.27 1.65 1.21 1.73 1.15 1.81
35 1.40 1.52 1.34 1.58 1.28 1.65 1.22 1.73 1.16 1.80
36 1.41 1.52 1.35 1.59 1.29 1.65 1.24 1.73 1.18 1.80
37 1.42 1.53 1.36 1.59 1.31 1.66 1.25 1.72 1.19 1.80
38 1.43 1.54 1.37 1.59 1.32 1.66 1.26 1.72 1.21 1.79
39 1.43 1.54 1.38 1.60 1.33 1.66 1.27 1.72 1.22 1.79
40 1.44 1.54 1.39 1.60 1.34 1.66 1.29 1.72 1.23 1.79
45 1.48 1.57 1.43 1.62 1.38 1.67 1.34 1.72 1.29 1.78
50 1.50 1.59 1.46 1.63 1.42 1.67 1.38 1.72 1.34 1.77
55 1.53 1.60 1.49 1.64 1.45 1.68 1.41 1.72 1.38 1.77
60 1.55 1.62 1.51 1.65 1.48 1.69 1.44 1.73 1.41 1.77
65 1.57 1.63 1.54 1.66 1.50 1.70 1.47 1.73 1.44 1.77
70 1.58 1.64 1.55 1.67 1.52 1.70 1.49 1.74 1.46 1.77
75 1.60 1.65 1.57 1.68 1.54 1.71 1.51 1.74 1.49 1.77
80 1.61 1.66 1.59 1.69 1.56 1.72 1.53 1.74 1.51 1.77
85 1.62 1.67 1.60 1.70 1.57 1.72 1.55 1.75 1.52 1.77
90 1.63 1.68 1.61 1.70 1.59 1.73 1.57 1.75 1.54 1.78
95 1.64 1.69 1.62 1.71 1.60 1.73 1.58 1.75 1.56 1.78
100 1.65 1.69 1.63 1.72 1.61 1.74 1.59 1.76 1.57 1.78




TABLE B.7 (concluded) Durbin-Watson Test Bounds.

Level of Significance α = .01

p - 1 = 1 p - 1 = 2 p - 1 = 3 p - 1 = 4 p - 1 = 5
n dL dU dL dU dL dU dL dU dL dU
15 0.81 1.07 0.70 1.25 0.59 1.46 0.49 1.70 0.39 1.96
16 0.84 1.09 0.74 1.25 0.63 1.44 0.53 1.66 0.44 1.90
17 0.87 1.10 0.77 1.25 0.67 1.43 0.57 1.63 0.48 1.85
18 0.90 1.12 0.80 1.26 0.71 1.42 0.61 1.60 0.52 1.80
19 0.93 1.13 0.83 1.26 0.74 1.41 0.65 1.58 0.56 1.77
20 0.95 1.15 0.86 1.27 0.77 1.41 0.68 1.57 0.60 1.74
21 0.97 1.16 0.89 1.27 0.80 1.41 0.72 1.55 0.63 1.71
22 1.00 1.17 0.91 1.28 0.83 1.40 0.75 1.54 0.66 1.69
23 1.02 1.19 0.94 1.29 0.86 1.40 0.77 1.53 0.70 1.67
24 1.04 1.20 0.96 1.30 0.88 1.41 0.80 1.53 0.72 1.66
25 1.05 1.21 0.98 1.30 0.90 1.41 0.83 1.52 0.75 1.65
26 1.07 1.22 1.00 1.31 0.93 1.41 0.85 1.52 0.78 1.64
27 1.09 1.23 1.02 1.32 0.95 1.41 0.88 1.51 0.81 1.63
28 1.10 1.24 1.04 1.32 0.97 1.41 0.90 1.51 0.83 1.62
29 1.12 1.25 1.05 1.33 0.99 1.42 0.92 1.51 0.85 1.61
30 1.13 1.26 1.07 1.34 1.01 1.42 0.94 1.51 0.88 1.61
31 1.15 1.27 1.08 1.34 1.02 1.42 0.96 1.51 0.90 1.60
32 1.16 1.28 1.10 1.35 1.04 1.43 0.98 1.51 0.92 1.60
33 1.17 1.29 1.11 1.36 1.05 1.43 1.00 1.51 0.94 1.59
34 1.18 1.30 1.13 1.36 1.07 1.43 1.01 1.51 0.95 1.59
35 1.19 1.31 1.14 1.37 1.08 1.44 1.03 1.51 0.97 1.59
36 1.21 1.32 1.15 1.38 1.10 1.44 1.04 1.51 0.99 1.59
37 1.22 1.32 1.16 1.38 1.11 1.45 1.06 1.51 1.00 1.59
38 1.23 1.33 1.18 1.39 1.12 1.45 1.07 1.52 1.02 1.58
39 1.24 1.34 1.19 1.39 1.14 1.45 1.09 1.52 1.03 1.58
40 1.25 1.34 1.20 1.40 1.15 1.46 1.10 1.52 1.05 1.58
45 1.29 1.38 1.24 1.42 1.20 1.48 1.16 1.53 1.11 1.58
50 1.32 1.40 1.28 1.45 1.24 1.49 1.20 1.54 1.16 1.59
55 1.36 1.43 1.32 1.47 1.28 1.51 1.25 1.55 1.21 1.59
60 1.38 1.45 1.35 1.48 1.32 1.52 1.28 1.56 1.25 1.60
65 1.41 1.47 1.38 1.50 1.35 1.53 1.31 1.57 1.28 1.61
70 1.43 1.49 1.40 1.52 1.37 1.55 1.34 1.58 1.31 1.61
75 1.45 1.50 1.42 1.53 1.39 1.56 1.37 1.59 1.34 1.62
80 1.47 1.52 1.44 1.54 1.42 1.57 1.39 1.60 1.36 1.62
85 1.48 1.53 1.46 1.55 1.43 1.58 1.41 1.60 1.39 1.63
90 1.50 1.54 1.47 1.56 1.45 1.59 1.43 1.61 1.41 1.64
95 1.51 1.55 1.49 1.57 1.47 1.60 1.45 1.62 1.42 1.64
100 1.52 1.56 1.50 1.58 1.48 1.60 1.46 1.63 1.44 1.65


Source: Reprinted, with permission, from J. Durbin and G. S. Watson, "Testing for Serial Correlation in Least Squares Regression. II,"
Biometrika 38 (1951), pp. 159-78.
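
The bounds in Table B.7 are used with the Durbin-Watson statistic computed from the residuals in time order. The following sketch is illustrative only; the function names are hypothetical, and the decision rule shown is the usual one-sided rule for testing against positive autocorrelation.

```python
def durbin_watson(residuals):
    """D = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


def dw_decision(d, d_lower, d_upper):
    """One-sided test for positive autocorrelation using the tabled bounds."""
    if d > d_upper:
        return "H0"            # no evidence of positive autocorrelation
    if d < d_lower:
        return "Ha"            # positive autocorrelation indicated
    return "inconclusive"


# Hypothetical residuals; bounds are the Table B.7 entries for
# n = 15, p - 1 = 1, alpha = .05 (dL = 1.08, dU = 1.36).
e = [0.5, 0.7, 0.6, 0.2, -0.1, -0.4, -0.6, -0.5, -0.2, 0.1, 0.3, 0.4, 0.2, -0.3, -0.9]
print(dw_decision(durbin_watson(e), 1.08, 1.36))
```

Small values of D correspond to adjacent residuals of similar size and sign, which is why values below the lower bound lead to the conclusion of positive autocorrelation.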




TABLE B.8 Table of z' Transformation of Correlation Coefficient.

r z' r z' r z' r z'
ρ ζ ρ ζ ρ ζ ρ ζ
.00 .0000 .25 .2554 .50 .5493 .75 .973
.01 .0100 .26 .2661 .51 .5627 .76 .996
.02 .0200 .27 .2769 .52 .5763 .77 1.020
.03 .0300 .28 .2877 .53 .5901 .78 1.045
.04 .0400 .29 .2986 .54 .6042 .79 1.071
.05 .0500 .30 .3095 .55 .6184 .80 1.099
.06 .0601 .31 .3205 .56 .6328 .81 1.127
.07 .0701 .32 .3316 .57 .6475 .82 1.157
.08 .0802 .33 .3428 .58 .6625 .83 1.188
.09 .0902 .34 .3541 .59 .6777 .84 1.221
.10 .1003 .35 .3654 .60 .6931 .85 1.256
.11 .1104 .36 .3769 .61 .7089 .86 1.293
.12 .1206 .37 .3884 .62 .7250 .87 1.333
.13 .1307 .38 .4001 .63 .7414 .88 1.376
.14 .1409 .39 .4118 .64 .7582 .89 1.422
.15 .1511 .40 .4236 .65 .7753 .90 1.472
.16 .1614 .41 .4356 .66 .7928 .91 1.528
.17 .1717 .42 .4477 .67 .8107 .92 1.589
.18 .1820 .43 .4599 .68 .8291 .93 1.658
.19 .1923 .44 .4722 .69 .8480 .94 1.738
.20 .2027 .45 .4847 .70 .8673 .95 1.832
.21 .2132 .46 .4973 .71 .8872 .96 1.946
.22 .2237 .47 .5101 .72 .9076 .97 2.092
.23 .2342 .48 .5230 .73 .9287 .98 2.298
.24 .2448 .49 .5361 .74 .9505 .99 2.647


Source: Abridged from Table 14 of Pearson and Hartley, Biometrika Tables for Statisticians, Volume 1, 1966, published by the Cambridge
University Press, on behalf of The Biometrika Society, by permission of the authors and publishers.
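
The entries of Table B.8 can be reproduced directly from the Fisher z' transformation, z' = (1/2) log_e[(1 + r)/(1 - r)]. The sketch below is offered as a convenience for values of r not tabled; the function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z' transformation of a correlation coefficient, -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def inverse_fisher_z(z):
    """Back-transform z' to the correlation scale."""
    return math.tanh(z)


for r in (0.25, 0.50, 0.99):
    print(r, round(fisher_z(r), 4))
```

The back-transformation is useful after a confidence interval has been constructed on the z' scale and its limits need to be expressed as correlations.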




TABLE B.9 Percentiles of the Studentized Range Distribution. 


S 
к 


^ OO ^O л ооо — 
юм 
оо 
л 


3 
13.4 
5:73 
447 
3.98 
3.72 
3.56 


Entry is q(1 - α; r, ν) where P{q(r, ν) ≤ q(1 - α; r, ν)} = 1 - α

1 - α = .90
r
4 5 6. 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
164 18.5 20.2 21.5 226 23,6 245 252 259 26.5 27.1 27.6 281 285 29.0 293 297 
6.77 7.54 8.14 8.63 9.05 9.41 9.72 10.0 10,3 10.5 10:7 10.9 11.7 11.2 11.4 11.5 11.7 
5.20 5.74 616 6.51 6.81 7.06 729 7.49 7.67 7.83 7.98 8.12 825 8:37 8.48 8.58 8.68 
4.59 5.03 5.39 5.68 5.93 6.14 .6.33 6.49 6.65 6.78 6:91 7.02 7.13 7.23 7.33 7.41 7.50 
426 4:66 4:98 5.24 5.46 5.65 5.82 5.97 610. 6:22 6.34 6.44 6.54. 6.63 6.71 6,79 6.86 
4.07 444 473 4.97 517 5.34 5.50 5.64 5.76 5.87 5.98 607 6.16 6.32 6.40 :6.47 
3.93 4.28 4:55 478 497 5.14 5.28 541 5,53 564 5.74 5.83 5,91 6.06. 6.13 6.19 
383 417 443 4.65 4.83 4.99 5.13 5.25 536 5.46 5.56 5.64 5,72 5.80 5.87 5.93 6.00 
3.76 4:08 4.34 454 472 487 501 5.13 5.23 5:33 5.42 5.51 5:58. 5.66 5:72, 5:79 5,85 
3.70. 402 4.26 4.447 4.64 4.78 4:91 5.03 5.13 5.23 5.32 540 547 5.61 5:67 5.73 
3.66 3.96 4.20 4.40 4:57 471 4:84 495 5.05 5.15 523 5.31 5.38 5.51 5.57 .5:63 
3.62 3.92 416 435 4.51 4:65 4.78 4:89 .4.99 5:08 516 5:24 .5.31 5.44 5:49 5.55 
3.59 3.88 4.12 430 4.46 4:60 .472 4:83 .4.93 5.02 510° 5.18 .5.25. 5,31 5.37. 5:43 5.48 
| 3.56 3:85 4:08. 4.27 4.42 4:56 468 479 4:88 497 5:05 5.12 5.19 5:26 5,32 5:37 5.43 
3.54 3.83 4.05 423 4:39 4:52 4,64 475 484 4:93 501 5.08 515 5.27 5:32 5:38 
3:52 3.80 4.03 421 436 449 4.61 4:71 481 489 497 5.04 511 517 523 528 5.33 
3:50 3.78 4.00 418 4.33 446 4.58: 4.68 .4.77 4.86 493 5:01 507 5.13 5.19 5.24 5.30 
349 3.77 3.93 4.16 431 444 4.55 465 4.75 483 4.90 4:98 5:04 5.16 5.21 5.26 
347 3:75 3:97 414 4.29 4:42 4.53 4:63 ‚472 4.80 488 4.95 5.01 5.13 5,18 523 
3.46 3.74 3.95 442 4:27 440 А51 4.61 470 478 4:85 4.92 499 540 5.16 .5.20 
3.42 3.60 3.90 4.07 4.21. 434 4.44 4:54 4.63 471 4.78 4:85 4.91 502 5.07 5.12 
3.39 3:65 3.85 4.02 4.16 4,28 4.38 4.47 456 4.64 4:71 4.77 4.83 4.94 4.99 5.03 
3.35 360 3.80 3:96 4Л0 421 4.32 441 4.49 456 4.63 4.69 475 4.86 4:90 4.95 
3.31 3.56 3.75 3.91 4.04 416 425 4.34 4.42 4.49 4.56 4.62 4.67 4.78 4.82 4.86 
3.28 3.52 3.71 3.86 3.99 410 4.19 4.28 4.35 4.42 4.48 4.54 4.60 4.69 4.74 4.78 
3.24 3.48 3.66 3.81 3.93 4.04 4.13 4.21 4.28 4.35 4.41 4.47 4.52 4.61 4.65 4.69 




TABLE B.9 (continued ) Percentiles of the Studentized Range Distribution. 
1 - α = .95


2 


3 


4 


5 


180 27.0 328 371 


6.08 
4.50 
3.93 


3.64 
3.46 
3.34 


8.33 
5.91 


6 


7 


40.4 43.1 


11.7 
8.04 
6.71 


6.03 
5.63 
5.36 
5.17 
5.02 


12.4 
8.48 
7.05 


6.33 
5.90 
5.61 
5.40 
5.24 


5.12 
5.03 
4.95 
4.88 
4.83 


4.78 
4.74 
4.70 
4.67 
4.65 


4.62 
4.54 
4.46 
4.39 


4.31 
4.24 
4.17 


8 


9 


10 


45.4 47.4 49.1 


13.0 
8.85 
7.35 


6.58 
6.12 
5.82 
5.60 
5.43 


5.30 
5.20 
5,12 
5.05 
4.99 


4.94 
4.90 
4.86 
4.82 
4.79 


4.77 
4,68 
4.60 
4,52 


4,44 
4,36 
4.29 


13.5 
9.18 
7.60 


6.80 
6.32 
6.00 
5.77 
5.59 


5.46 
5.35 
5.27 
5.19 
5.13 


5.08 
5.03 
4.99 
4.96 
4.92 
4.90 
4.81 
4.72 
‚ 4.63 


4.55 
4.47 
4.39 


14.0 
9.46 
7.83 


6.99 
6.49 


r 
11 


12 


13 


50.6 52.0 53.2 


14.4 
9.72 


14.7 
9.95 
8.21 


7.32 
6.79 
6.43 
6.18 
5.98 


5.83 
5.71 


15.1 
10.2 
8.37 


7.47 
6.92 


14 


15 


54.3 55.4 


15.4 
10.3 


157 
10,5 


16 


17 


56.3 57.2 


15.9 
10.7 


8.52 8.66'«8.79 


7.60 


7.72 


7.032714 


6.66 
6.39 
6.19 


6.03 
5.90 


56.76 
6.48 
6.28 


6.11 
5.98 


7.83 
7.24 


16.1 
10.8 
8.91 


7.93 
7.34 
6.94 
6.65 
6.44 


6.27 
6.13 
6.02 
5.93 
5.85 


5.78 
5.73 
5.67 
5.63 
5,59 


5.55 
5.44 
5.33 
5.22 


5.11 
5.00 
4.89 


18 


19 


20 


58.0 58.8 59.6 


16.4 
11.0 
9.03 


8.03 
7.43 
7.02 
6.73 
6.51 


6.34 
6.20 
6.09 
5.99 
5.91 


5.85 
5.79 
5.73 
5.69 
5.65 


5.61 
5.49 
5.38 
5.27 


5.15 
5.04 
4.93 


16.6 
11.1 
9.13 


8.12 
7.51 
7,10 
6.80 
6.58 


6.40 
6.27 
6.15 
6.05 
5.97 


5.90 
5.84 
5.79 
5.74 
5.70 


5.66 
5.55 
5.43 
5.31 


5.20 
5.09 
4.97 


16.8 
11.2 
9.23 


8.21 
7.59 




TABLE B.9 (concluded) Percentiles of the Studentized Range Distribution. 


i 


10 171 12 313 1 ^ | Ua 18 19 20 
186 237 246 253 | 282 286 290 294. 298 
223 247 39:5 307 31.7 326 334 Ж 5.4 360 37.0 37:5 379 
122 133 142 15:66 162 167 171 175 179 1 3:5 188 19.1 19:3: 195 19.8 
937 | Tl] 1155 11.9 3123 12.6. 12:8 Л 42:3 13:5 13.7 139  14T 142 144 
9.32. 9,67 997 '^— 10€ i 111 1154 116 117 118 19 
32. 861 8:87. ӘЛ 30° (9:49 9:65: 981 9:95 10 0.2. 30:3: 10 10.5 

DEUS Pob: Pis TT . 945 
19.03 
8:57 
8:22 
7.95 
773 
7:55 
7.39 
7.26 
745 
7.05 
16.96 
6.89 
682 
`6.61 
"6:41 
:6:21 
6:02 
21° 5,20 ^ AM ә, : 6 : if 5:83 
5:08 516 5:23 529 535 -$ ‚ 45.49 5.54. 5:61 5.65 


Source: Reprinted, with permission, from Henry Scheffé, The Analysis of Variance (New York: John Wiley & Sons, 1959), рр, 434-36. 




TABLE B.10 Percentiles of H Statistic Distribution. 


Entry is H(1 - α; r, df) where P{H ≤ H(1 - α; r, df)} = 1 - α

1 - α = .95
r
df 2 3 4 5 6 7 8 9 10 11 12
2 39.0 87.5 142 202 266 333 403 475 550 626 704
3 15.4 27.8 39.2 50.7 62.0 72.9 83.5 93.9 104 114 124
4 9.60 15.5 20.6 25.2 29.5 33.6 37.5 41.1 44.6 48.0 51.4
5 7.15 10.8 13.7 16.3 18.7 20.8 22.9 24.7 26.5 28.2 29.9
6 5.82 8.38 10.4 12.1 13.7 15.0 16.3 17.5 18.6 19.7 20.7
7 4.99 6.94 8.44 9.70 10.8 11.8 12.7 13.5 14.3 15.1 15.8
8 4.43 6.00 7.18 8.12 9.03 9.78 10.5 11.1 11.7 12.2 12.7
9 4.03 5.34 6.31 7.11 7.80 8.41 8.95 9.45 9.91 10.3 10.7
10 3.72 4.85 5.67 6.34 6.92 7.42 7.87 8.28 8.66 9.01 9.34
12 3.28 4.16 4.79 5.30 5.72 6.09 6.42 6.72 7.00 7.25 7.48
15 2.86 3.54 4.01 4.37 4.68 4.95 5.19 5.40 5.59 5.77 5.93
20 2.46 2.95 3.29 3.54 3.76 3.94 4.10 4.24 4.37 4.49 4.59
30 2.07 2.40 2.61 2.78 2.91 3.02 3.12 3.21 3.29 3.36 3.39
60 1.67 1.85 1.96 2.04 2.11 2.17 2.22 2.26 2.30 2.33 2.36
∞ 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

1 - α = .99
r
df 2 3 4 5 6 7 8 9 10 11 12
2 199 448 729 1,036 1,362 1,705 2,063 2,432 2,813 3,204 3,605
3 47.5 85 120 151 184 216 249 281 310 337 361
4 23.2 37 49 59 69 79 89 97 106 113 120
5 14.9 22 28 33 38 42 46 50 54 57 60
6 11.1 15.5 19.1 22 25 27 30 32 34 36 37
7 8.89 12.1 14.5 16.5 18.4 20 22 23 24 26 27
8 7.50 9.9 11.7 13.2 14.5 15.8 16.9 17.9 18.9 19.8 21
9 6.54 8.5 9.9 11.1 12.1 13.1 13.9 14.7 15.3 16.0 16.6
10 5.85 7.4 8.6 9.6 10.4 11.1 11.8 12.4 12.9 13.4 13.9
12 4.91 6.1 6.9 7.6 8.2 8.7 9.1 9.5 9.9 10.2 10.6
15 4.07 4.9 5.5 6.0 6.4 6.7 7.1 7.3 7.5 7.8 8.0
20 3.32 3.8 4.3 4.6 4.9 5.1 5.3 5.5 5.6 5.8 5.9
30 2.63 3.0 3.3 3.4 3.6 3.7 3.8 3.9 4.0 4.1 4.2
60 1.96 2.2 2.3 2.4 2.4 2.5 2.5 2.6 2.6 2.7 2.7
∞ 1.00 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0


Source: Reprinted, with permission, from H. A. David, "Upper 5 and 1% Points of the Maximum F-Ratio," Biometrika 39 (1952), pp. 422-24.
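
The percentiles in Table B.10 are compared with the ratio of the largest to the smallest of r sample variances. The sketch below is illustrative only. It assumes r samples of equal size n, so that each sample variance has df = n - 1 degrees of freedom; the function name and the data are hypothetical.

```python
from statistics import variance


def hartley_h(samples):
    """Test statistic H* = max(s_i^2) / min(s_i^2) for equal-sized samples."""
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal sample sizes")
    variances = [variance(s) for s in samples]
    if min(variances) == 0:
        raise ValueError("a sample variance of zero makes H* undefined")
    return max(variances) / min(variances)


# Hypothetical data: r = 3 samples, n = 5 each, so df = 4.
samples = [
    [10.1, 11.4, 9.8, 10.9, 10.3],
    [12.0, 14.5, 9.9, 13.2, 11.1],
    [8.7, 9.1, 9.4, 8.9, 9.6],
]
h_star = hartley_h(samples)
critical = 15.5  # Table B.10 entry for 1 - alpha = .95, r = 3, df = 4
print("equal variances" if h_star <= critical else "unequal variances")
```

Values of H* at or below the tabled percentile are consistent with equal variances; larger values lead to the conclusion that the variances are not all equal.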


TABLE B.11
Power Values 
for Analysis of 
Variance (fixed 
effects). 


OONO td шш NO 


te 


AS 
49 
52 
55 
{57 
.58 
.60 
61 
62 
63 
63 
68 
:70 
72 


99 


1.00 
1.00: 
1.00 
1:00: 


1.00 
1.00 
1:00 


1.00 


1.00 
1.00 
1.00 
1.00 
1.00 


1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 


1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1:00 
1.00 
1.00 


TABLE B.11
(continued ) 
Power Values 
for Analysis of 
Variance (fixed 
effects). 


© соч С л SUN M 


1 
2 
3 
4 
5 
6 
7 
8 
9 


ν1 = 3 and α = .05


04 
.16 
.37 
.59 
.76 
.86 
.91 
95 
.97 
98 
99 
99 
1.00 
1.00 
1.00 
1-00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 


-05 


-205 


47 
.72 
.87 


-94 
-97 
-99 
-99 
1.00 


1.00 
1.00 
1.00 
1.00 
1.00 
1,00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 


.06 
24 
58 
83 
94 
98 
99 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 


‚06 
29 
-67 
-90 
-98 


-99 
1.00 
1.00 
1.00 
1.00 


1.00 
1.00 
1.00 
1.00 
1.00 


1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 
1.00 


TABLE B.11 
(continued ) 
Power Values 
for Analysis of 
Variance (fixed 
effects). 


0.00 NEA kW m 


Pus 
2 
3 
4. 
% 
6. 
7 
8. 
9 


TABLE B.11 v, = Sand a = .05 
(continued ) 
Power Values 
for Analysis of 
Variance (fixed 


1 08 10  .12 15 18 20 23 26 29 
effects). 2 ET 47 26 35 45 55 64 72 79 
3 44 25 д0 56 Zi 83 91 95 98 

4 147 42322. 53 72 86 94 98 99 1.00 

5 19 39 62 82 93 98 1.00 1.00 1.00 

6 21 л4 70 88 97 99 100 1.00 100 

7 23 48 75 92 98 100 1.00 1.00 100 

8 24 52 «79 94 99 100 1.00 1.00 1.00 

9 26 55 82 96 99 10 100 1.00 1.00 
10 27 57 84 97 100 1.00 100 100 1.00 
12 29 61 88 98 100 1.00 1.00 1.00 тоо 
14 30 64 .90 99 100 100 100 100 100 
16 32  .66 9 99 100 1.00 100 100 1.00 
18 33 68 92 99 100 1.00 1.00 1.00 1.00 
20 34 70 93 99 100 1.00 1.00 1.00 1.00 
22 34 т 94 99 100 1.00 1.00 1.00 1.00 
24 35 72 94 100 100 100 100 100 1.00 
26 36 - .3 95 100 100 100 1.00 100 1.00 
28 36 73  .95 100 100 1.00 1.00 1.00 1.00 
30 36 74 95 100 100 1.00 1.00 1.00 1.00 
60 40 78  .97 100 100 100 1.00 1.0 1.00 
120 A1 80 .97 1.00 100 1.00 100 1.00 1.00 
оо 43 82  .98 1.00 100 100 100 1.00 1.00 


ν1 = 5 and α = .01


1 .02 .02 .02 .03 .04 .04 .05 .05 .06
2 .02 .04 .06 .08 .11 .15 .18 .22 .27
3 .03 .06 .11 .18 .26 .35 .45 .55 .64
4 .04 .09 .18 .30 .44 .59 .72 .82 .90
5 .05 .12 .25 .42 .61 .77 .88 .95 .98
6 .06 .15 .32 .53 .73 .88 .95 .99 1.00
7 .06 .18 .39 .63 .82 .93 .98 1.00 1.00
8 .07 .21 .44 .70 .88 .97 .99 1.00 1.00
9 .08 .24 .49 .75 .92 .98 1.00 1.00 1.00
10 .09 .26 .54 .80 .94 .99 1.00 1.00 1.00
12 .10 .30 .61 .86 .97 1.00 1.00 1.00 1.00
14 .11 .34 .66 .90 .98 1.00 1.00 1.00 1.00
16 .12 .36 .70 .92 .99 1.00 1.00 1.00 1.00
18 .12 .39 .73 .94 .99 1.00 1.00 1.00 1.00
20 .13 .41 .76 .95 .99 1.00 1.00 1.00 1.00
22 .14 .43 .78 .96 1.00 1.00 1.00 1.00 1.00
24 .14 .44 .79 .96 1.00 1.00 1.00 1.00 1.00
26 .14 .45 .80 .97 1.00 1.00 1.00 1.00 1.00
28 .15 .46 .82 .97 1.00 1.00 1.00 1.00 1.00
30 .15 .47 .82 .97 1.00 1.00 1.00 1.00 1.00
60 .18 .55 .88 .99 1.00 1.00 1.00 1.00 1.00
120 .20 .59 .91 .99 1.00 1.00 1.00 1.00 1.00
∞ .21 .62 .93 1.00 1.00 1.00 1.00 1.00 1.00


TABLE B.11
(concluded)
Power Values
for Analysis of
Variance (fixed
effects).

ν1 = 6 and α = .05


07 40 492 | 45^; A7 .20 23 25 28 
40 Л 25 34 44 54. 63 71 78 
44 25 40 56 JA ‚82 90 95 98 
46 32 53 72 86 94 98 99 1.00 
| 63 B2 9. E 1.00 1.00 1.00 
31 44 270 .89. 497 .99 1.00 1.00 1.00 

23 49  J6 93 98 1.00 -1.00 1.00 1.00 

25 .53. 80 9 99 1,00 Тоо 1.00 1.00 

.26 56 83 96 1.00 1.00 1.00 1.00. 1.00 

10 28 59 86 97 1,00 1:00 1.00 1.00 1.00 
12 30 63. 89 98 1.00 1.00 1:00 1.00 1.00 
14 .32 .66 91. 99 1.00 1.00 1.00 1.00 1.00 
16 33 69 :93 99 14.00 1.00 1:00 1.00 1.00 
18 34 . 71 :94 99 1.00 1:00 1.00 1.00 1.00 
: ) 4.00. 1:00 1:00 ‚1.00 1.00 

1.00. 1:00 1.00 1.00 1.00 
1:00 лоо 1.00 1.00 1.00 
К) 96 1.00 1.00 1.00 1.00 1.00 1.00 
в | зз 77 96 ло 1:00 100 1:00 100 1.00 
30 > 39 77 97 3.00 1:00 OC 1:00 1.00 1.00 
-60 ^ 42 82 :98 ^ 100 1.00 :00 1.00 100 1.00 
120 ^| 45 84 .99 100 100 100 1.00 1.00 1.00 
со 47 :86 99° 100 то0о 1.00 1.00 1.00 1.00 


v 0 MO лз NO 
— 
№ 
^о 
9 
© 
S 
© 
A 
o 
© 


ν1 = 6 and α = .01


05: .05 .06 
18 22 .26 
E 55 64 
72 82 90 
88: 95 98 
96. 99 1.00 
98. 1.00 1.00 
99 1.00° 1.00 
1,00. 1.00. 1.00 
100 1.00 1.00 
1.00 1.00 1.00 
1.00 1.00 1.00 
1.00 1.00 1.00 
1:00 1.00 1.00 
1.00 1.00 1.00 
1:00 1.00 . 1.00 
1.00 1.00 1:00 
1:00 1.00 1.00 
100 1.00. 1.00 
1:00 1.00 1.00 
1:00 1.00 1.00 
2, ; :00: ;00. *1:00 1.00 1.00 
оо .24 .69 .96 1.00 1.00 1:00 1.00 1.00 1.00 




TABLE B.12 Table for Determining Sample Size for Analysis of Variance (fixed factor levels model). 


-— 


-— 


о юю мо л Бом x 


о ооо мо ол һом + 


A/o = 1.0 


А /с = 1.25 


.01 


15 
17 
19 
20 
21 
22 
23 
24 
25 


А /с = 1.25 


Power 1 — 8 = .70 


А /о = 2.5 


А /с = 3.0 


A/o = 1.50 A/o = 1.75 
a a 
2 .1 05 01 .2 .1 .05 .01 
4 6 7 11 3 4 6 9 
5 7 8 12 4 5 7 10 
5 7 9 13 4 6 7 10 
6 8 10 14 5 6 8 т 
6 9 11 15 5 7 8 12 
7 9 11 16 5 7 9 12 
7 10 12 17 6 7 9 13 
7 10 12 17 6 8 9 13 
8 10 13 18 6 8 10 14 


Power 1 — В 2.80 


щл (л (л о RR UU) jy 


чыч ысы ON ON CA 


UJ оз о) U2 ою UJ IO NJ ND. N 


Geo oA d дм оошо © 


ON OV Cà (л (л (дл (л (д (л 


A/o = 1.50 А/о = 1.75 
a a 
2 л 05 01 2 л .05 .01 
5 7 9 13 4 5 7 10 
6 8 10 145 6 8 M 
7 911 16 5 7 9 12 
8 10 12 17 6 8 9 13 
8 11 i3 18 6 8 10 13 
9 11 14 18 7 9 10 14 
9 12 14 19 7 9 11 15 
9 12 15 20 7 9 11 15 
10 13 15 21 8 10 12 16 


OON ONU щл щл. RU К, 


A/c = 2.0 
Q 
.4 .05 .01 
4 5 7 
4 5 8, 
5 6 8 
5 6 9 
5 7 9 
6 7 10 
6 7 10 
6 8 10 
6 8 11 
А /с = 2.0 
a 
Л .05 .01 
4 6 8 
5 6 9 
6 7 10 
6 7 10 
7 8 11 
7 8 11 
7 9 12 
7 9 3142 
8 9 12 


AOBROoROA Ad 0)JU)U) jy 


(л (л (л (л (л dS dU ш 


Ф су Ф оу Ф хл ллһ € 


Со Со Со со CO м м м OS 


UJ UJ оз UJ UJ о) оо UO № N 


оь RR A WWW ы 


UU UC S S S S A C 


оо о бу о о л (л (л 


TABLE B.12 (concluded) Table for Determining Sample Size for Analysis of Variance (fixed factor levels model).


ў “Power 1 wie = о. 


А/ўт=20  А/ф=25 W[oc30 
Q a 


ol 
(л |. 
о 
Га 


= 01 : 


СА Олл Са 0л Ж 5 de) Ку 
ceo ласы. Оуу AH 9 


MOO ооу UU л А 7 
Жы ый S аә Ку]. 


— 


Goose “о оо оо ч мох 


DNNN бу Ot A Юу] 


ХА СА wih RR GRE o ш] 
OR WW UU nous f 
чысы SDN SE о о 


© © X9 0^9 0 0 м 


F bI 
2 1 
3 1 
4 

52 
6. ` 
7 
9 
0; ne 


20. 
8:21: 

22. 
22. 


ED EO Abe Row. i 
© Ui US їз Ur RR. 


бо борч NWO VA. LS. 
ч ау охо сул алал a | 
со. со оо O0 со Ч м ч OS © 


© \о оо У OU A оо ч 
OSS Блоб IR d" 
Хо оо co co «iS s. |] 


-— 


- 
Source: Reprinted, with permission, from T. L. Bratcher, M. A. Moran, and W. J. Zimmer, "Tables of Sample Sizes in the Analysis of Variance," Journal of Quality Technology 2 (1970), pp. 156-64. Copyright
American Society for Quality Control, Inc.




TABLE B.13 Table of λ√n/σ for Determining Sample Size to Find "Best" of r Population Means.

Number of Probability of Correct Identification (1 - α)
Populations (r) .90 .95 .99
2 1.8124 2.3262 3.2900
3 2.2302 2.7101 3.6173
4 2.4516 2.9162 3.7970
5 2.5997 3.0552 3.9196
6 2.7100 3.1591 4.0121
7 2.7972 3.2417 4.0861
8 2.8691 3.3099 4.1475
9 2.9301 3.3679 4.1999
10 2.9829 3.4182 4.2456
Source: Reprinted, with permission, from R. E. Bechhofer, "A Single-Sample Multiple Decision 
Procedure for Ranking Means of Normal Populations with Known Variances,” The Annals of 
Mathematical Statistics 25 (1954), pp. 16-39. 
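
A sample size can be obtained from Table B.13 by solving the tabled entry for n. The sketch below is illustrative only. It assumes that the entry is λ√n/σ, where λ is the smallest difference between the best and second-best population means that is important to detect and σ is the common standard deviation, taken as known; the function name and the planning values are hypothetical.

```python
import math

# Entries of Table B.13, keyed by number of populations r and
# probability of correct identification 1 - alpha.
TABLE_B13 = {
    2: {0.90: 1.8124, 0.95: 2.3262, 0.99: 3.2900},
    3: {0.90: 2.2302, 0.95: 2.7101, 0.99: 3.6173},
    4: {0.90: 2.4516, 0.95: 2.9162, 0.99: 3.7970},
    5: {0.90: 2.5997, 0.95: 3.0552, 0.99: 3.9196},
    6: {0.90: 2.7100, 0.95: 3.1591, 0.99: 4.0121},
    7: {0.90: 2.7972, 0.95: 3.2417, 0.99: 4.0861},
    8: {0.90: 2.8691, 0.95: 3.3099, 0.99: 4.1475},
    9: {0.90: 2.9301, 0.95: 3.3679, 0.99: 4.1999},
    10: {0.90: 2.9829, 0.95: 3.4182, 0.99: 4.2456},
}


def sample_size_for_best(r, prob, lam, sigma):
    """Smallest integer n per population with lam * sqrt(n) / sigma >= entry."""
    entry = TABLE_B13[r][prob]
    return math.ceil((entry * sigma / lam) ** 2)


# Hypothetical planning values: r = 3, 1 - alpha = .90, lambda = 2, sigma = 3.
print(sample_size_for_best(3, 0.90, lam=2.0, sigma=3.0))
```

The result is rounded up here so that the stated probability of correct identification is at least met; rounding to the nearest integer instead is a design choice that gives a slightly smaller probability when the exact solution falls just above an integer.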
TABLE B.14 
Selected 3x3 4x4 
Standard Latin 1 2 3 
Squares. ABC A В С р A В С р ABC 
BC A B ADC B CDA B DA 
C AB C DBA C DAB C AD 
D C AB D ABC D C B 
5x5 6x6 
ABCDE A BC DE F A B 
В AE CD BF DC AE B C 
C D AE B C DE F B A C D 
DE B A C D AF EC B DE 
EC DBA E С A BF D E Е 
F E B А р С F С 
ы С А 
8х8 9x9 
ABC DE F GH A BCDEF 
В С DEFGHRHA В С DE Е С 
C DE F GH A В C DE Е GH 
DE ЕЁ С Н A B C DE F С Н 1 
EF GH A В С р E FGHI A 
FG HA BC DE F GHI A B 
СН А BC DE F СНІ A B C 
Н А В С Рр Е FG HI ABCD 
I ABCDE 






TABLE B.15 Selected Balanced Incomplete Block Designs. 


Design 1: r = 4, r_b = 2, Design 2: r = 4, r_b = 3, Design 3: r = 5, r_b = 2, Design 4: r = 5, r_b = 3,
n_b = 6, n = 3, n_p = 1 n_b = 4, n = 3, n_p = 2 n_b = 10, n = 4, n_p = 1 n_b = 10, n = 6, n_p = 3
Block Treatments Block Treatments Block Treatments Block Treatments
1 1 2 1 1 2 3 1 1 2 1 12 3 
3 4 2 124 2: 3. 4 2 Ti 2:25 4S: 

3 1 3 3 13 4 3 2.5 3 14 5 
4 2 4 4 2 3 4 4 13 4 234 
5 1 4 5 4 5 5 3.4 5 
6 2 3 6 T 4 6 124 
7 2 3 7 13 4 

8 3 5 8 13 5 

9 15 9 235 

10 2 4 10 2 4 5 


Design 5: r = 5, r_b = 4, Design 6: r = 6, r_b = 2, Design 7: r = 6, r_b = 3, Design 8: r = 6, r_b = 3,
n_b = 5, n = 4, n_p = 3 n_b = 15, n = 5, n_p = 1 n_b = 10, n = 5, n_p = 2 n_b = 20, n = 10, n_p = 4
Block Treatments Block Treatments Block Treatments Block Treatments


T 1234 1 12 1 12 5 1 123 
2 123 5 2 3 4 2 126 4 5 6 
3 124 5 3 5 6 3 134 3 124 
4 13.45 4 1 4 4 1 3.6 4 3 5 6 
5 2345 5‹ 2 5 5 145 5 125 
6 4. 6 6 2.3 4. 6 3 4 6 

7 1 4 7 2 3. 5 7 126 

8 2 6 8 2 4 6: 8 3 4.5 

9. 3 5. 9 3 5 6 9 13 4 

10 15 10 4 5. 6 10 2 5 6 

11 2 4 11 135 

12 3 6 12 246 

13 1 6 13 13 6 

14 2 3 14 2 4 5 

15 4 5 15 145 

16 2 3 6 

17 146 

18 235 

19 15 6 

20 23 4 
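
The defining properties of a balanced incomplete block design can be checked mechanically: every block has the same size, every treatment appears the same number of times, and every pair of treatments appears together in the same number of blocks. The sketch below is illustrative only. It reads the parameters in the design headings as r treatments, block size r_b, n_b blocks, n replications per treatment, and n_p blocks in which each pair occurs together (an interpretation of the notation, stated here as an assumption); the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def bibd_parameters(blocks):
    """Return (r, r_b, n_b, n, n_p) if the blocks form a balanced design.

    Raises ValueError when block sizes, replications, or pair counts differ.
    """
    treatments = sorted({t for block in blocks for t in block})
    block_sizes = {len(set(block)) for block in blocks}
    replications = Counter(t for block in blocks for t in set(block))
    pair_counts = Counter(
        frozenset(pair)
        for block in blocks
        for pair in combinations(sorted(set(block)), 2)
    )
    # Count every possible pair, including pairs that never share a block.
    all_pairs = [frozenset(p) for p in combinations(treatments, 2)]
    pair_values = {pair_counts[p] for p in all_pairs}
    if (len(block_sizes) != 1
            or len(set(replications.values())) != 1
            or len(pair_values) != 1):
        raise ValueError("blocks do not form a balanced incomplete block design")
    return (len(treatments), block_sizes.pop(), len(blocks),
            next(iter(replications.values())), pair_values.pop())


# Design 1 of Table B.15: six blocks of two treatments each.
design_1 = [(1, 2), (3, 4), (1, 3), (2, 4), (1, 4), (2, 3)]
print(bibd_parameters(design_1))
```

The same check can be applied to any of the tabled designs after the blocks are keyed in, which is a useful safeguard against transcription errors.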




TABLE B.15 (continued ) Selected Balanced Incomplete Block Designs. 


Design 9: r = 6, r_b = 4, Design 10: r = 6, r_b = 5, Design 11: r = 7, r_b = 2,
n_b = 15, n = 10, n_p = 6 n_b = 6, n = 5, n_p = 4 n_b = 21, n = 6, n_p = 1
Block Treatments Block Treatments Block Treatments
1 123 4 1 123 4 5 1 1 2 
2 14 56 2 123 4 6 2 6 
3 23 5 6 3 123 5 6 3 з 4 
4 1235 4 124 56 4 4 7 
5 1246 5 13 4 5 6 5 1.5 
6 3 4 5 6 6 234 56 6 5 6 
7 1236 7 3 7 
8 134 5 8 1 3 
9 2 4 56 9 2 4 
10 1 2 4 5 10 3 5 
11 13 5 6 11 4 6 
12 23 4 6 12 5 7 
13 12 5 6 13 1 6 
14 13 4 6 14 2 7 
15 23 4 5 15 1 4 
16 2 3 
17 3 6 
18 4 5 
19 2 5 
20 6 7 
21 1 7 
Design 12: r = 7, r_b = 3, Design 13: r = 7, r_b = 4, Design 14: r = 7, r_b = 6,
n_b = 7, n = 3, n_p = 1 n_b = 7, n = 4, n_p = 2 n_b = 7, n = 6, n_p = 5
Block Treatments Block Treatments Block Treatments
1 124 1 3 5 6 7 1 123 4 5 
3 5 2 14 6 7 2 1234 5 
3 з 4 6 3 12 57 3 12346 
4 4 5 7 4 1236 4 12356 
5 5 6 1 5 2 3 4 7 5 12 4 5 6 
6 6 7 2 6 134 5 6 13 4 5 6 
7 7 1 3 7 2 4 56 7 23 4 5 6 


ччччч чо 




Design 15: r = 8, r_b = 2,    Design 16: r = 8, r_b = 4,    Design 17: r = 8, r_b = 7,
n_b = 28, n = 7, n_p = 1      n_b = 14, n = 7, n_p = 3      n_b = 8, n = 7, n_p = 6

Design 18: r = 9, r_b = 3,
n_b = 12, n = 4, n_p = 1

[Block and treatment listings for Designs 15-18 are illegible in the source.]
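The defining property of the designs in Table B.15 can be checked mechanically. The sketch below is not part of the original table; it assumes the header symbols mean r = number of treatments, r_b = block size, n_b = number of blocks, n = replications of each treatment, and n_p = number of blocks in which each pair of treatments occurs together, which is consistent with the listed values. The function name `bibd_parameters` is hypothetical. It is applied to Design 13 as printed above.

```python
from collections import Counter
from itertools import combinations


def bibd_parameters(blocks):
    """Return (r, r_b, n_b, n, n_p) if `blocks` is balanced, else raise ValueError.

    Symbol meanings are an assumed reading of the table headers (see lead-in).
    """
    treatments = sorted({t for blk in blocks for t in blk})
    sizes = {len(blk) for blk in blocks}
    if len(sizes) != 1:
        raise ValueError("blocks differ in size")

    # Replications of each treatment and co-occurrences of each pair.
    reps = Counter(t for blk in blocks for t in blk)
    pairs = Counter(p for blk in blocks for p in combinations(sorted(blk), 2))

    if len(set(reps.values())) != 1:
        raise ValueError("treatments are not equally replicated")
    pair_counts = {pairs[p] for p in combinations(treatments, 2)}
    if len(pair_counts) != 1:
        raise ValueError("pairs of treatments do not occur equally often")

    return (len(treatments), sizes.pop(), len(blocks),
            reps[treatments[0]], pair_counts.pop())


# Design 13 from Table B.15.
design_13 = [(3, 5, 6, 7), (1, 4, 6, 7), (1, 2, 5, 7), (1, 2, 3, 6),
             (2, 3, 4, 7), (1, 3, 4, 5), (2, 4, 5, 6)]
r, r_b, n_b, n, n_p = bibd_parameters(design_13)
print(r, r_b, n_b, n, n_p)
```

For any balanced incomplete block design the counts also satisfy n·r = n_b·r_b and n_p·(r − 1) = n·(r_b − 1), which gives a quick sanity check on a header line even when the block listing is unavailable.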


Appendix C


Data Sets


Data Set C.1 SENIC 


The primary objective of the Study on the Efficacy of Nosocomial Infection Control (SENIC 
Project) was to determine whether infection surveillance and control programs have reduced 
the rates of nosocomial (hospital-acquired) infection in United States hospitals. This data 
set consists of a random sample of 113 hospitals selected from the original 338 hospitals 
surveyed. 

Each line of the data set has an identification number and provides information on 11 other
variables for a single hospital. The data presented here are for the 1975-76 study period.
The 12 variables are: 


Variable
Number   Variable Name                  Description
  1      Identification number          1-113
  2      Length of stay                 Average length of stay of all patients in hospital (in days)
  3      Age                            Average age of patients (in years)
  4      Infection risk                 Average estimated probability of acquiring infection in hospital
                                        (in percent)
  5      Routine culturing ratio        Ratio of number of cultures performed to number of patients without
                                        signs or symptoms of hospital-acquired infection, times 100
  6      Routine chest X-ray ratio      Ratio of number of X-rays performed to number of patients
                                        without signs or symptoms of pneumonia, times 100
  7      Number of beds                 Average number of beds in hospital during study period
  8      Medical school affiliation     1 = Yes, 2 = No
  9      Region                         Geographic region, where: 1 = NE, 2 = NC, 3 = S, 4 = W
 10      Average daily census           Average number of patients in hospital per day during study period
 11      Number of nurses               Average number of full-time equivalent registered and licensed
                                        practical nurses during study period (number full time plus
                                        one half the number part time)
 12      Available facilities           Percent of 35 potential facilities and services that are provided
         and services                   by the hospital


Reference: Special Issue, “The SENIC Project,” American Journal of Epidemiology 111 (1980), pp. 465—653. Data obtained from Robert W. Haley, M.D., Hospital 
Infections Program, Center for Infectious Diseases, Centers for Disease Control, Atlanta, Georgia 30333. 


  1      2      3     4      5      6     7    8   9    10    11
  1    7.13   55.7   4.1    9.0   39.6   279   2   4   207   241
  2    8.82   58.2   1.6    3.8   51.7    80   2   2    51    52
  3    8.34   56.9   2.7    8.1   74.0   107   2   3    82    54
111    7.70   56.9   4.4   12.2   67.9   129   2   4    85   136
112   17.94   56.2   5.9   26.4   91.8   835   1   1   791   407
113    9.41   59.5   3.1   20.6   91.7    29   2   3    20    22
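A line of the SENIC data set can be turned into a labeled record using the variable list above. The following is a minimal sketch, assuming the file is whitespace-delimited in the order of the variable numbers; the names `SenicRecord` and `parse_senic_line` are hypothetical. Because the excerpt printed here shows only the first 11 fields of each line, the twelfth field is treated as optional.

```python
from dataclasses import dataclass
from typing import Optional

# Region codes as given in the variable list (variable 9).
REGIONS = {1: "NE", 2: "NC", 3: "S", 4: "W"}


@dataclass
class SenicRecord:
    hospital_id: int
    length_of_stay: float      # days
    age: float                 # years
    infection_risk: float      # percent
    culturing_ratio: float
    xray_ratio: float
    beds: int
    med_school: bool           # coded 1 = Yes, 2 = No in the data
    region: str
    census: int
    nurses: int
    facilities: Optional[float] = None  # percent; absent from the printed excerpt


def parse_senic_line(line: str) -> SenicRecord:
    """Parse one whitespace-delimited line (assumed layout, see lead-in)."""
    f = line.split()
    if len(f) not in (11, 12):
        raise ValueError(f"expected 11 or 12 fields, got {len(f)}")
    return SenicRecord(
        hospital_id=int(f[0]),
        length_of_stay=float(f[1]),
        age=float(f[2]),
        infection_risk=float(f[3]),
        culturing_ratio=float(f[4]),
        xray_ratio=float(f[5]),
        beds=int(f[6]),
        med_school=(int(f[7]) == 1),
        region=REGIONS[int(f[8])],
        census=int(f[9]),
        nurses=int(f[10]),
        facilities=float(f[11]) if len(f) == 12 else None,
    )


# Hospital 2 from the excerpt above.
record = parse_senic_line("2 8.82 58.2 1.6 3.8 51.7 80 2 2 51 52")
print(record)
```

Decoding the two categorical variables at load time keeps the 1/2 coding of medical school affiliation from being mistaken for a 0/1 indicator in later regression work.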


Data Set C.2 CDI 


This data set provides selected county demographic information (CDI) for 440 of the most 
populous counties in the United States. Each line of the data set has an identification number 
with a county name and state abbreviation and provides information on 14 variables for a 
single county. Counties with missing data were deleted from the data set. The information 
generally pertains to the years 1990 and 1992. The 17 variables are: 


Variable
Number   Variable Name                     Description
  1      Identification number             1-440
  2      County                            County name
  3      State                             Two-letter state abbreviation
  4      Land area                         Land area (square miles)
  5      Total population                  Estimated 1990 population
  6      Percent of population             Percent of 1990 CDI population
         aged 18-34                        aged 18-34
  7      Percent of population             Percent of 1990 CDI population
         65 or older                       aged 65 years old or older
  8      Number of active                  Number of professionally active nonfederal
         physicians                        physicians during 1990
  9      Number of hospital beds           Total number of beds, cribs, and bassinets during 1990
 10      Total serious crimes              Total number of serious crimes in 1990, including murder, rape,
                                           robbery, aggravated assault, burglary, larceny-theft, and
                                           motor vehicle theft, as reported by law enforcement agencies
 11      Percent high school               Percent of adult population (persons 25 years old or older)
         graduates                         who completed 12 or more years of school
 12      Percent bachelor's                Percent of adult population (persons 25 years old or older)
         degrees                           with bachelor's degree
 13      Percent below                     Percent of 1990 CDI population with income below
         poverty level                     poverty level
 14      Percent unemployment              Percent of 1990 CDI labor force that is unemployed
 15      Per capita income                 Per capita income of 1990 CDI population (dollars)
 16      Total personal income             Total personal income of 1990 CDI population (in millions of dollars)
 17      Geographic region                 Geographic region classification is that used by the U.S. Bureau
                                           of the Census, where: 1 = NE, 2 = NC, 3 = S, 4 = W


Source: Geospatial and Statistical Data Center, University of Virginia. 




  1   2             3       4         5      6      7       8       9       10
  1   Los-Angeles   CA   4060   8863164   32.1    9.7   23677   27700   688936
  2   Cook          IL    946   5105067   29.2   12.4   15153   21550   436936
  3   Harris        TX   1729   2818199   31.3    7.1    7553   12449   253526
438   Montgomery    TN    539    100498   35.7    7.9      87     188     6537
439   Maui          HI   1159    100374   26.2   11.3     192     182     7130
440   Morgan        AL    582    100043   26.3   11.7     122     464     4693

  11     12     13    14      15       16   17
70.0   22.3   11.6   8.0   20786   184230    4
73.4   22.8   11.1   7.2   21729   110928    2
74.9   25.4   12.5   5.7   19517    55003    3
77.9   16.5   10.8   8.0   13169     1323    3
77.0   17.8    5.7   3.2   18504     1857    4
69.4   15.5    9.4   7.1   16458     1647    3
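The variable descriptions suggest a simple consistency check on the sample lines: per capita income (variable 15) should be close to total personal income (variable 16, in millions of dollars) divided by total population (variable 5). The sketch below applies that check to the six counties shown above. The relationship is inferred from the descriptions rather than stated in the text, and the helper name `implied_per_capita` is hypothetical; small discrepancies are expected because total income is rounded to the nearest million.

```python
# (county, population [var 5], per capita income [var 15], total income in $ millions [var 16])
sample = [
    ("Los-Angeles", 8863164, 20786, 184230),
    ("Cook",        5105067, 21729, 110928),
    ("Harris",      2818199, 19517,  55003),
    ("Montgomery",   100498, 13169,   1323),
    ("Maui",         100374, 18504,   1857),
    ("Morgan",       100043, 16458,   1647),
]


def implied_per_capita(total_millions, population):
    """Per capita income implied by total income (in millions) and population."""
    return total_millions * 1_000_000 / population


relative_errors = {}
for county, population, per_capita, total_millions in sample:
    implied = implied_per_capita(total_millions, population)
    relative_errors[county] = abs(implied - per_capita) / per_capita
    print(f"{county:12s} listed={per_capita:6d} implied={implied:9.1f}")
```

A check of this kind is a cheap way to catch misaligned columns after reading the file, since a shifted field would break the relationship by orders of magnitude.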


Data Set С.З Market Share 


Company executives from a large packaged foods manufacturer wished to determine which 
factors influence the market share of one of its products. Data were collected from a national 
database (Nielsen) for 36 consecutive months. Each line of the data set has an identification
number and provides information on 7 other variables for each month. The data presented
here are for September 1999 through August 2002. The 8 variables are:


Variable
Number   Variable Name                  Description
  1      Identification number          1-36
  2      Market share                   Average monthly market share for product (percent)
  3      Price                          Average monthly price of product (dollars)
  4      Gross Nielsen rating points    An index of the amount of advertising exposure that
                                        the product received
  5      Discount price                 Presence or absence of discount price during
                                        period: 1 if discount, 0 otherwise
  6      Package promotion              Presence or absence of package promotion during
                                        period: 1 if promotion present, 0 otherwise
  7      Month                          Month (Jan-Dec)
  8      Year                           Year (1999-2002)

 1      2       3     4   5   6     7      8
 1   3.15   2.198   498   1   1   Sep   1999
 2   2.52   2.186   510   0   0   Oct   1999
 3   2.64   2.293   422   1   1   Nov   1999
34   2.80   2.518   270   1   0   Jun   2002
35   2.48   2.497   322   0   1   Jul   2002
36   2.85   2.781   317   1   1   Aug   2002




Data Set C.4 University Admissions 


The director of admissions at a state university wanted to determine how accurately students’ 
grade-point averages at the end of their freshman year could be predicted by entrance test 
scores and high school class rank. The academic years cover 1996 through 2000. Each line
of the data set has an identification number and information on 4 other variables for each 
student. The 5 variables are: 


Variable 
Number Variable Name Description 
1 Identification number 1-705 
2 GPA Grade-point average following freshman year 
3 High school class rank High school class rank as percentile: lower 
percentiles imply higher class ranks 
4 ACT score ACT entrance examination score 
5 Academic year Calendar year that freshman entered university 
1 2 3 4 5 


1 0.980 61 20 1996 
2 1.130 84 20 1996 
3 1.250 74 19 1996 
703 4.000 97 29 2000 
704 4.000 97 29 2000 
705 4.000 99 32 2000 


Data Set C.5 Prostate Cancer 


A university medical center urology group was interested in the association between
prostate-specific antigen (PSA) and a number of prognostic clinical measurements in men
with advanced prostate cancer. Data were collected on 97 men who were about to undergo
radical prostatectomies. Each line of the data set has an identification number and provides
information on 8 other variables for each person. The 9 variables are:


Variable
Number   Variable Name                  Description
  1      Identification number          1-97
  2      PSA level                      Serum prostate-specific antigen level (mg/ml)
  3      Cancer volume                  Estimate of prostate cancer volume (cc)
  4      Weight                         Prostate weight (gm)
  5      Age                            Age of patient (years)
  6      Benign prostatic               Amount of benign prostatic hyperplasia (cm^2)
         hyperplasia
  7      Seminal vesicle invasion       Presence or absence of seminal vesicle invasion: 1 if yes; 0 otherwise
  8      Capsular penetration           Degree of capsular penetration (cm)
  9      Gleason score                  Pathologically determined grade of disease using total score of two
                                        patterns (summed scores were either 6, 7, or 8 with higher
                                        scores indicating worse prognosis)




1 2 3 4 5 6 7 8 

1 0.651 0.5599 15.959 50 0 0 0 

2 0.852 0.3716 27.660 58 0 0 0 

3 0.852 0.6005 14.732 74 0 0 0 
95 170.716 18.3568 29.964 52 0 1 11.7048 
96 239.847 17.8143 43.380 68 4.7588 1 4.7588 
97 265.072 32.1367 52.985 68 1.5527 1 18.1741 




Adapted in part from: Hastie, T. J.; R. J. Tibshirani; and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and 
Prediction. New York: Springer-Verlag, 2001. 


Data Set C.6 Website Developer 


Management of a company that develops websites was interested in determining which 
variables have the greatest impact on the number of websites developed and delivered to 
customers per quarter. Data were collected on website production output for 13 three-person 
website development teams, from January 2001 through August 2002. Each line of the data
set has an identification number and provides information on 7 other variables for the thirteen
teams over time. The 8 variables are:


Variable 
Number Variable Name Description 
1 Identification number 1-73 
2 Websites delivered Number of websites completed and delivered to customers 
during the quarter 
3 Backlog of orders Number of website orders in backlog at the close of the quarter 
4 Team number 1-13 
5 Team experience Number of months team has been together 
6 Process change A change in the website development process occurred during the 
second quarter of 2002: 1 if quarter 2 or 3, 2002; 0 otherwise 
7 Year 2001 or 2002
8 Quarter 1, 2, 3, or 4 
1 2 3 4 5 6 7 8 
1 1 12 1 3 0 2001 1 
2 2 18 1 6 0 2001 2 
3 7 26 1 9 0 2001 3 
71 7 36 13 14 0 2002 1 
72 19 37 13 17 1 2002 2 
73 12 26 13 20 1 2002 3 
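The process change indicator (variable 6) is defined entirely by year and quarter, so it can be recomputed from variables 7 and 8 as a check on the coding. The snippet below is a small sketch of that definition; the function name `process_change` is hypothetical, and the six tuples are the sample lines shown above.

```python
def process_change(year, quarter):
    """Variable 6 as described: 1 if quarter 2 or 3 of 2002, 0 otherwise."""
    return 1 if year == 2002 and quarter in (2, 3) else 0


# (process change, year, quarter) taken from the six sample lines above.
sample = [(0, 2001, 1), (0, 2001, 2), (0, 2001, 3),
          (0, 2002, 1), (1, 2002, 2), (1, 2002, 3)]
mismatches = [row for row in sample if process_change(row[1], row[2]) != row[0]]
print(mismatches)
```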




Data Set C.7 Real Estate Sales 


The city tax assessor was interested in predicting residential home sales prices in a mid- 
western city as a function of various characteristics of the home and surrounding property. 
Data on 522 arms-length transactions were obtained for home sales during the year 2002. 
Each line of the data set has an identification number and provides information on 12 other 
variables. The 13 variables are: 


Variable
Number   Variable Name              Description
  1      Identification number      1-522
  2      Sales price                Sales price of residence (dollars)
  3      Finished square feet       Finished area of residence (square feet)
  4      Number of bedrooms         Total number of bedrooms in residence
  5      Number of bathrooms        Total number of bathrooms in residence
  6      Air conditioning           Presence or absence of air conditioning: 1 if yes; 0 otherwise
  7      Garage size                Number of cars that garage will hold
  8      Pool                       Presence or absence of swimming pool: 1 if yes; 0 otherwise
  9      Year built                 Year property was originally constructed
 10      Quality                    Index for quality of construction:
                                    1 indicates high quality;
                                    2 indicates medium quality;
                                    3 indicates low quality
 11      Style                      Qualitative indicator of architectural style
 12      Lot size                   Lot size (square feet)
 13      Adjacent to highway        Presence or absence of adjacency to highway: 1 if yes; 0 otherwise

2 3 4
360000 3032 4
340000 2058 4
250000 1780 4
133500 1922 3
124000 1480 3
95500 1184 2

6 7 8 9 10 11 12
1 2 0 1972 2 1 22221
1 2 0 1976 2 1 22912
1 2 0 1980 2 1 21345
0 2 0 1950 3 1 14805
1 0 1953 3 1 28351
0 1 0 1951 3 1 14786


Data Set C.8 Heating Equipment 


A manufacturer of heating equipment was interested in forecasting the volume of monthly 
orders as a function of various economic indicators, supply-chain factors, and weather in a 
particular sales region. Data by month over a four-year period (1999—2002) for this region 
were available for analysis. Each line of the data set has an identification number and 
provides information on 9 other variables. The 10 variables are: 




Variable 
Number Variable Name Description 
1 Identification number 1-43 
2 Number of orders Number of heating equipment orders during month 
3 Interest rate Prime rate in effect during month 
4 New homes Number of new homes completed and for sale in sales region 
during month 
5 Discount Percent discount (0—5) offered to distributors during month; value is 
usually 0, indicating no discount 
6 Inventories Distributor inventories in warehouses during month 
7 Sell through Number of units sold by distributor to contractors in previous month 
8 Temperature deviation Difference between average temperature for month and 30-year 
average for that month 
9 Year 1999, 2000, 2001, or 2002 
10 Month Coded 1-12 
1 2 3 4 5 6 7 8 9 10 
1 121 0.0750 64 0 3536 615 2.22 1999 1 
2 227 0.0750 64 0 3042 813 0.28 1999 2 
3 446 0.0750 65 0 2456 704 0.79 1999 3 
41 754 0.0475 64 0 1417 927 0.81 2002 6 
42 1098 0.0475 65 0 1244 877 0.28 2002 7 
43 1158 0.0475 65 0 1465 809 0.50 2002 8 


Data Set C.9 Ischemic Heart Disease 


A health insurance company collected information on 788 of its subscribers who had made 
claims resulting from ischemic (coronary) heart disease. Data were obtained on total costs 
of services provided for these 788 subscribers and the nature of the various services for 
the period of January 1, 1998 through December 31, 1999. Each line in the data set has an
identification number and provides information on 9 other variables for each subscriber. 
The 10 variables are: 


Variable
Number   Variable Name              Description
  1      Identification number      1-788
  2      Total cost                 Total cost of claims by subscriber (dollars)
  3      Age                        Age of subscriber (years)
  4      Gender                     Gender of subscriber: 1 if male; 0 otherwise
  5      Interventions              Total number of interventions or procedures carried out
  6      Drugs                      Number of tracked drugs prescribed
  7      Emergency room visits      Number of emergency room visits
  8      Complications              Number of other complications that arose during heart disease treatment
  9      Comorbidities              Number of other diseases that the subscriber had during period
 10      Duration                   Number of days of duration of treatment condition




  1        2    3   4    10
  1    179.1   63   0   300
  2    319.0   59   0   120
  3   9310.7   62   0   353
786   2677.7   68   0   303
787   1282.2   58   0   244
788    586.0   56   0   336


Data Set C.10 Disease Outbreak


This data set provides information from a study based on 196 persons selected in a probability
sample within two sectors in a city. Each line of the data set has an identification number
and provides information on 5 other variables for a single person. The 6 variables are:


Variable
Number   Variable Name              Description
  1      Identification number      1-196
  2      Age                        Age of person (in years)
  3      Socioeconomic status       1 = upper, 2 = middle, 3 = lower
  4      Sector                     Sector within city, where: 1 = sector 1, 2 = sector 2
  5      Disease status             1 = with disease, 0 = without disease
  6      Savings account status     1 = has savings account, 0 = does not have
                                    savings account


Adapted in part from H. G. Dantes, J. S. Koopman, C. L. Addy, et al., “Dengue Epidemics on the Pacific Coast of Mexico,” International
Journal of Epidemiology 17 (1988), pp. 178-86.


1 2 3 4 

1 33 1 1 

2 35 1 1 

3 6 1 1 
194 31 3 1 
195 85 3 1 
196 24 2 1 


Data Set C.11 IPO 




Private companies often go public by issuing shares of stock referred to as initial public
offerings (IPOs). A study of 482 IPOs was conducted to determine the characteristics
of companies that attract venture capital funding. The response of interest is whether
or not a company was financed with venture capital funds. Potential predictors include the
face value of the company, the number of shares offered, and whether or not the company
underwent a leveraged buyout. Each line of the data set has an identification number and
provides information on 4 other variables for a single IPO. The 5 variables are:


Variable 
Number Variable Name Description 
1 Identification number 1-482 
2 Venture capital funding Presence or absence of venture capital funding: 
1 if yes; 0 otherwise 
3 Face value of company Estimated face value of company from prospectus 
(in dollars) 
4 Number of shares offered Total number of shares offered 
5 Leveraged buyout Presence or absence of leveraged buyout:
1 if yes; 0 otherwise


1 2 3 4 5 
1 0 1,200,000 3,000,000 0 
2 0 1,454,000 1,454,000 1 
3 0 1,500,000 300,000 0 
480 0 159,500,000 7,250,000 0 
481 0 165,000,000 11,000,000 0 
482 0 234,600,000 9,200,000 0 


Data Set C.12 Drug Effect Experiment 


This data set provides results adapted from an experiment in which the effects of a drug on 
the behavior of rats were studied. The behavior under consideration was the rate at which 
a rat deprived of water presses a lever to obtain water. The experiment was carried out in 
two parts. Variable 2 identifies the two parts of the study (1, 2). 

In Part I of the study, 12 male albino rats of the same strain and approximately the same 
weight were utilized. Variable 3 identifies each rat (1, ..., 12). Prior to the experiment, 
each rat was trained to press a lever for water until a stable rate of pressing was reached. 
Two factors were studied in this experiment—initial lever press rate (factor A) and dosage 
of the drug (factor B). The 12 rats were classified into one of three groups according to their 
initial lever press rate. Variable 4 identifies the level of the initial lever press rate (1, 2, 3). 
Level 1 is a slow rate, level 2 a moderate rate, and level 3 a fast rate. The levels were defined 
such that one third of the rats were classified into each of the three levels. 

Four dosage levels of the drug were studied, including a zero level consisting of a saline 
solution. Variable 5 identifies the drug dosage (1, ..., 4). All dosage levels were specified 
in terms of milligrams of drug per kilogram of weight of the rat. 

One hour after a drug dosage injection was administered, an experimental session began 
during which the rat received water each time after the second lever press. This reinforce- 
ment schedule will be denoted by FR-2. Each rat received all four drug dosage levels in a 
random order. Each of the four drug dosages was administered twice, thus providing two 
observation units for each treatment. Variable 6 identifies the observation unit (1, 2).




The response variable was defined as the total number of lever presses divided by the 
elapsed time (in seconds) during a session for the given treatment. Variable 7 is the response 
variable. 

In Part II of the study, another 12 albino male rats of the same strain and approximately 
the same weight as the rats used in Part I were used. Variable 2 identifies this part of the 
study, and variable 3 identifies the 12 additional rats (13, . . . , 24). The experimental design 
for Part II of the study was exactly the same as for Part I, except that each rat received water
each time after the fifth lever press. This reinforcement schedule will be denoted by FR-5. 
Variable 2 identifies the reinforcement schedule since Part I of the study used schedule FR-2 
while Part II of the study used schedule FR-5. The reinforcement schedule thus is another 
factor (factor C) that was studied in the combined experiment. 

To summarize, the variables for this experimental design are: 


Variable
Number   Variable Name                          Description
  1      Identification number                  1-192
  2      Part of study                          1: Part I (FR-2)
         (factor C: reinforcement schedule)     2: Part II (FR-5)
  3      Rat identification                     1-24
  4      Initial lever press rate               1: Slow
         (factor A)                             2: Moderate
                                                3: Fast
  5      Dosage level (mg/kg)                   1: 0 (saline solution)
         (factor B)                             2: .5
                                                3: 1.0
                                                4: 1.8
  6      Observation unit                       1, 2
  7      Response variable: lever               Total number of lever
         press rate                             presses divided by elapsed
                                                time in seconds


Reference: T. G. Heffner; R. B. Drawbaugh; and M. J. Zigmond. "Amphetamine and Operant Behavior in Rats: Relationship
between Drug Effect and Control Response Rate," Journal of Comparative and Physiological Psychology 86 (1974), pp. 1031-43.


1 2 3 4 5 6 7 
1 1 1 1 1 1 .81 
2 1 1 1 2 1 .80 
3 1 1 1 3 1 .82
190 2 24 3 2 2 2.98 
191 2 24 3 3 2 2.47 
192 2 24 3 4 2 1.51 
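The count of 192 lines follows from the design: 2 parts × 12 rats × 4 dosage levels × 2 observation units. The sketch below enumerates variables 1 to 6 for every line. Two things in it are assumptions, not statements from the text: the line ordering (rats in sequence, then observation unit, then dosage level varying fastest) and the assignment of rats to initial lever press rates (four consecutive rats per level within each part). Both agree with the six sample lines shown above but cannot be confirmed from them alone. The function name `drug_experiment_layout` is hypothetical.

```python
def drug_experiment_layout():
    """Enumerate (id, part, rat, rate level, dosage level, observation unit)."""
    rows = []
    obs_id = 0
    for rat in range(1, 25):
        part = 1 if rat <= 12 else 2          # Part I: rats 1-12, Part II: rats 13-24
        rate = ((rat - 1) % 12) // 4 + 1      # ASSUMED: 4 consecutive rats per rate level
        for unit in (1, 2):                   # ASSUMED ordering: unit, then dosage
            for dose in (1, 2, 3, 4):
                obs_id += 1
                rows.append((obs_id, part, rat, rate, dose, unit))
    return rows


layout = drug_experiment_layout()
print(len(layout))
print(layout[0], layout[-1])
```

Comparing the generated factor columns with the actual file is a quick way to confirm the ordering before fitting a repeated measures model.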


Appendix D


Rules for Developing
ANOVA Models and Tables
for Balanced Designs


In this appendix, we present and illustrate rules for developing models for nested and/or
crossed factor designs, for finding the appropriate sums of squares and degrees of freedom
for the needed mean squares, and for finding the expected values of the mean squares. The
rules in Sections D.1-D.3 apply to all balanced designs with two or more replications and
with no interactions assumed to equal zero. The rule modifications in Section D.4 show
how these rules need to be modified to make them applicable to balanced designs with no
replications and/or with some interaction terms assumed to equal zero.

As noted earlier, a design is balanced in the nested case when (1) the number of factor
levels of a nested factor is the same for each level of the factor in which the nesting
takes place, and (2) the number of replications is constant for the different factor level
combinations. In the crossed case, a design is balanced whenever the number of replications
is constant for all factor level combinations. In a subsampling design, balance requires that
the subsample sizes at each stage of sampling be constant.


D.1 Rule for Model Development


Rule (D.1)


We begin by presenting a rule for the development of a nested and/or crossed factor design 
model. This rule is applicable when no interactions are assumed to equal zero. We shall 
utilize as an illustration the training school example of Table 26.1, where the effects of three 
schools (factor A) and two instructors within each school (factor B) were studied and two 
replications were made in each instance. 


Step 1. Include an overall constant and a main effect term for each factor, taking into 
account when one factor is nested within another. 


Example For the training school example, we include:
$\mu_{..}$    $\alpha_i$    $\beta_{j(i)}$
Note that factor B is nested within factor A.




Step 2. Include all interaction terms except those containing both a nested factor and the
factor within which it is nested. 


Example Since factor B is nested within factor A, the AB interaction term (the only 
possible interaction term here) is not included. 


Step 3. Interactions between a nested factor and another factor with which the nested 
factor is crossed are always themselves nested. 


Example For the training school example, this situation does not arise. 


Step 4. Include the error term, which is nested within all factors. 

Since the model formulation will be used for developing the needed ANOVA sums of 
squares, degrees of freedom, and expected mean squares, we now need to recognize that 
the error term $\varepsilon$ is nested within a factor level combination. That is, the kth experimental
unit when factor A is at level 1 and factor B is at level 1 is not the same unit as the kth
experimental unit for another factor level combination. 


Example For the training school example the error term is $\varepsilon_{k(ij)}$, and the appropriate
model therefore is:

$Y_{ijk} = \mu_{..} + \alpha_i + \beta_{j(i)} + \varepsilon_{k(ij)}$    (D.2)
$i = 1, 2, 3; \quad j = 1, 2; \quad k = 1, 2$


D.2 Rule for Finding Sums of Squares and Degrees of Freedom 


This rule is applicable to all balanced designs with two or more replications and with no 
interaction terms assumed to equal zero. We shall continue to consider the training school 
example where factor B is nested within factor A. It does not matter for this rule whether 
the factor effects are fixed or random. 


Rule (D.3) for Definitional Forms of Sums of Squares 
and Degrees of Freedom 
Step 1. Write the model equation. 
Example The model equation for the training school example was given earlier. We show 
this model now in its general form, where factor A has a levels, factor B has b levels, and 
there are n replications: 

$Y_{ijk} = \mu_{..} + \alpha_i + \beta_{j(i)} + \varepsilon_{k(ij)}$    (D.2a)
$i = 1, \ldots, a; \quad j = 1, \ldots, b; \quad k = 1, \ldots, n$


Step 2. For each model term other than the overall constant, write the associated SS 
notation. 


Example We do this for the training school example in columns 1 and 2 of Table D.1 for
$\alpha_i$, $\beta_{j(i)}$, and $\varepsilon_{k(ij)}$. The line for Total will not be completed until step 9.


Step 3. Each sum of squares will have as coefficient the product of the limits of the 
subscripts not appearing in the model term. The coefficient is taken to be 1 if all subscripts 
appear in the model term. 




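TABLE D.1 Derivation of Sums of Squares Formulas for Nested Two-Factor Experiment (B nested within A).


(1)                      (2)      (3)           (4)                       (5)                       (6)                                  (7)                                                                  (8)
Model                                                                     Symbolic                  Term to                              Sum of                                                               Degrees
Term                     SS       Coefficient   $\sum$                    Product                   Be Squared                           Squares                                                              of Freedom
$\alpha_i$               SSA      $bn$          $\sum_i$                  $i - 1$                   $\bar{Y}_{i..} - \bar{Y}_{...}$      $bn \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2$                        $a - 1$
$\beta_{j(i)}$           SSB(A)   $n$           $\sum_i \sum_j$           $i(j - 1) = ij - i$       $\bar{Y}_{ij.} - \bar{Y}_{i..}$      $n \sum_i \sum_j (\bar{Y}_{ij.} - \bar{Y}_{i..})^2$                  $a(b - 1)$
$\varepsilon_{k(ij)}$    SSE      1             $\sum_i \sum_j \sum_k$    $(k - 1)ij = ijk - ij$    $Y_{ijk} - \bar{Y}_{ij.}$            $\sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2$                   $ab(n - 1)$
Total                    SSTO                                                                       $Y_{ijk} - \bar{Y}_{...}$            $\sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2$                   $abn - 1$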


Example The coefficients for our example are shown in column 3 of Table D.1. For
instance, $\alpha_i$ does not contain j and k. These subscripts have limits of b and n, respectively.
The coefficient for the SSA term is therefore bn. Since the model term $\varepsilon_{k(ij)}$ contains all
subscripts, the coefficient is taken to be 1 here.


Step 4. Each sum of squares is summed over all of the subscripts of the model term, 
whether in parentheses or not. 


Example The summations for our example are shown in column 4. For instance, the sum
of squares term corresponding to $\alpha_i$ is summed over i, the only subscript in that model term.
Similarly, the sum of squares term corresponding to $\varepsilon_{k(ij)}$ is summed over i, j, and k since
all of these appear in the model term.


Step 5. Form a symbolic product from the subscripts of the model term, using the subscript
if it is in parentheses, and the subscript minus 1 if it is not in parentheses. Expand the product. 


Example The symbolic products for our example are shown in column 5. For instance,
for $\alpha_i$ the symbolic product is $i - 1$. For $\beta_{j(i)}$, the symbolic product is $i(j - 1) = ij - i$.
For $\varepsilon_{k(ij)}$, the symbolic product is $(k - 1)ij = ijk - ij$.

Step 6. The typical term to be squared consists of means of the observations with the 
subscripts consisting of the symbolic product term and dots elsewhere. The sign of each 
mean is that of the symbolic product. A 1 refers to the overall mean. 


Example The terms to be squared for our example are shown in column 6. Note that for
$\alpha_i$, the symbolic product is $i - 1$, and the typical term to be squared therefore is:

$\bar{Y}_{i..} - \bar{Y}_{...}$

For $\beta_{j(i)}$ the symbolic product is $ij - i$, and hence the typical term to be squared is:

$\bar{Y}_{ij.} - \bar{Y}_{i..}$

Similarly, for $\varepsilon_{k(ij)}$, the symbolic product is $ijk - ij$. Hence the typical term to be squared is:

$Y_{ijk} - \bar{Y}_{ij.}$

Note that we write the first term as $Y_{ijk}$, since it is not averaged over any subscript.


Step 7. Combining the steps of squaring, summing, and multiplying by the coefficient 
yields the appropriate sums of squares. 


Example The sums of squares for our example are shown in column 7. 


Step 8. The degrees of freedom are obtained by replacing in each symbolic product the 
subscript variable by its limit. 


Example For our example, the degrees of freedom are shown in column 8. For instance,
for $\alpha_i$ the symbolic product is $i - 1$; hence $df = a - 1$. Similarly for $\varepsilon_{k(ij)}$, the symbolic
product is $ijk - ij$; hence $df = abn - ab = ab(n - 1)$.


Step 9. The total sum of squares is always defined as the sum, over all observations, of the
squared deviations of the observations from the overall mean. The total degrees of freedom 
are always defined as one less than the total number of observations. 
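The sums of squares and degrees of freedom produced by Rule (D.3) for the nested two-factor case can be computed directly from the formulas in columns 7 and 8 of Table D.1. The following is a minimal sketch in plain Python, assuming the data are held as a nested list `y[i][j][k]`; the function name `nested_anova` and the numbers in `y` are illustrative choices for this sketch, with a = 3, b = 2, n = 2 as in the training school layout.

```python
from statistics import fmean


def nested_anova(y):
    """Sums of squares and df for B nested within A; y[i][j][k], balanced."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    mean_ij = [[fmean(cell) for cell in row] for row in y]   # Ybar_ij.
    mean_i = [fmean(row) for row in mean_ij]                 # Ybar_i.. (valid because balanced)
    grand = fmean(mean_i)                                    # Ybar_...

    idx = [(i, j, k) for i in range(a) for j in range(b) for k in range(n)]
    ss = {
        "SSA": b * n * sum((m - grand) ** 2 for m in mean_i),
        "SSB(A)": n * sum((mean_ij[i][j] - mean_i[i]) ** 2
                          for i in range(a) for j in range(b)),
        "SSE": sum((y[i][j][k] - mean_ij[i][j]) ** 2 for i, j, k in idx),
        "SSTO": sum((y[i][j][k] - grand) ** 2 for i, j, k in idx),
    }
    # Degrees of freedom: symbolic products with subscripts replaced by limits.
    df = {"SSA": a - 1, "SSB(A)": a * (b - 1),
          "SSE": a * b * (n - 1), "SSTO": a * b * n - 1}
    return ss, df


# Illustrative values only (3 levels of A, 2 levels of B within each, 2 replications).
y = [[[25, 29], [14, 11]],
     [[11, 6], [22, 18]],
     [[17, 20], [5, 2]]]
ss, df = nested_anova(y)
print(ss)
print(df)
```

In a balanced design the component sums of squares add to SSTO and the component degrees of freedom add to abn − 1, so both identities serve as a check on any implementation of the rule.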


D.3 Rule for Finding Expected Mean Squares 


Rule (D.4) 


The rule for finding expected mean squares that we shall now present enables us to avoid 
tedious derivations. The rule applies to both nested factors and crossed factors. The rule is 
applicable to all balanced designs with two or more replications and with no interaction 
terms assumed to equal zero. We continue to use the training school example of Table 26.1 
as our illustration. Here factor A (school) and factor B (instructor) are both fixed factors, 
factor B is nested within factor A, factor B has b levels within each level of factor A, 
factor A has a levels, and there are n replications. 


The rule for finding expected mean squares to be presented may appear to be a bit complex 
on first reading. However, with a little practice the desired expected mean squares can be 
obtained very quickly and easily. 


Step 1. List the model equation. 


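Example The model equation is that of (D.2a):
$Y_{ijk} = \mu_{..} + \alpha_i + \beta_{j(i)} + \varepsilon_{k(ij)}$


Step 2. For each term other than the overall constant, write the associated random effects
variance term.


Example
$\alpha_i$             $\beta_{j(i)}$        $\varepsilon_{k(ij)}$
$\sigma_\alpha^2$      $\sigma_\beta^2$      $\sigma^2$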




If factors have fixed effects, as in this example, we shall at the end replace these variance
terms by sums of squared effects divided by degrees of freedom. For instance, in the training
school example the term $\sigma_\alpha^2$ later will be replaced by $\sum \alpha_i^2/(a - 1)$, and likewise $\sigma_\beta^2$ will be
replaced by $\sum\sum \beta_{j(i)}^2/[a(b - 1)]$. In the meantime, however, it is easier to write the variance
term rather than a sum of squared effects divided by degrees of freedom.


Step 3. Set up a table, with the rows consisting of the model elements other than the 
overall constant. 


Example

$\alpha_i$
$\beta_{j(i)}$
$\varepsilon_{k(ij)}$


Step 4. The column headings for the table are the subscripts in the model. Under each 
heading, write F if the factor indexed by the subscript is fixed, and write R if it is random. 
Also write the number of levels for that factor. 


Example
                         i   j   k
                         F   F   R
                         a   b   n
$\alpha_i$
$\beta_{j(i)}$
$\varepsilon_{k(ij)}$

For instance, i refers to school, a fixed factor that occurs at a levels. Note that the
subscript k refers to replication, which is a random "factor" and occurs at n levels.


Step 5. In each row where one or more subscripts are in parentheses, enter a 1 in the
column(s) corresponding to the subscript(s) in parentheses.


Example
                         i   j   k
                         F   F   R
                         a   b   n
$\alpha_i$
$\beta_{j(i)}$           1
$\varepsilon_{k(ij)}$    1   1


Thus, in the $\beta_{j(i)}$ row, we enter a 1 in the i column, and so on.


Step 6. In each row where one or more subscripts are not in parentheses, enter in the 
column(s) corresponding to the subscript(s) not in parentheses a 1 if the subscript refers to 
a random factor, and a 0 if the factor is fixed. 




Example
                          i    j    k
                          F    F    R
                          a    b    n
$\alpha_i$                0
$\beta_{j(i)}$            1    0
$\varepsilon_{k(ij)}$     1    1    1


Thus, for the $\beta_{j(i)}$ row, the subscript not in parentheses is j, which refers to factor B, a
fixed factor. Hence, a 0 is entered in the j column.


Step 7. Fill in all remaining empty cells with the number of levels appearing in the column
heading.


Example
                          i    j    k
                          F    F    R
                          a    b    n
$\alpha_i$                0    b    n
$\beta_{j(i)}$            1    0    n
$\varepsilon_{k(ij)}$     1    1    1


Each E{MS} will consist of a linear combination of the variance terms enumerated in
step 2, with the coefficients obtained by taking additional steps in the table just completed.
Some of the coefficients may be zero, which means that the corresponding variance term is
not present in the E{MS}.
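The table construction in steps 4 through 7 is mechanical, so it can be sketched in a few lines of code. The following Python sketch is an illustration only and is not part of rule (D.4); the function name `build_table` and the encoding of each model term as a pair of strings (subscripts outside parentheses, subscripts inside parentheses) are assumptions made for this example. It reproduces the step 7 table for the nested model of the example.

```python
def build_table(terms, factors):
    """Build the table of steps 4-7 (hypothetical helper, for illustration).

    terms:   {term name: (subscripts not in parentheses, subscripts in parentheses)}
    factors: {subscript: (kind, levels)}, with kind "F" (fixed) or "R" (random)
    """
    table = {}
    for name, (live, nested) in terms.items():
        row = {}
        for sub, (kind, levels) in factors.items():
            if sub in nested:        # step 5: subscript in parentheses
                row[sub] = 1
            elif sub in live:        # step 6: 1 if random, 0 if fixed
                row[sub] = 1 if kind == "R" else 0
            else:                    # step 7: number of levels in the heading
                row[sub] = levels
        table[name] = row
    return table


# Nested two-factor example: Y_ijk = mu + alpha_i + beta_j(i) + eps_k(ij)
factors = {"i": ("F", "a"), "j": ("F", "b"), "k": ("R", "n")}
terms = {
    "alpha_i": ("i", ""),
    "beta_j(i)": ("j", "i"),
    "eps_k(ij)": ("k", "ij"),
}

table = build_table(terms, factors)
for name, row in table.items():
    print(f"{name:10s}", row["i"], row["j"], row["k"])
```

The levels are kept as the symbols a, b, and n rather than numbers, so the rows can be compared directly with the table above.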


Step 8. Adjoin on the right of the table just completed the variance term associated with 
the effect in that row. In addition, adjoin a column for each expected mean square to be 
found. Under each expected mean square, indicate all of the subscripts (including any 
parentheses) associated with the corresponding model term. 


Example
                          i    j    k
                          F    F    R                        E{MSA}    E{MSB(A)}    E{MSE}
                          a    b    n    Variance            i         (i)j         (ij)k
$\alpha_i$                0    b    n    $\sigma_\alpha^2$
$\beta_{j(i)}$            1    0    n    $\sigma_\beta^2$
$\varepsilon_{k(ij)}$     1    1    1    $\sigma^2$


Note that all of the subscripts of the associated model term, whether in parentheses or
not, are shown under the expected mean square. For example, E{MSB(A)} has associated
with it the model term $\beta_{j(i)}$, so that the subscripts shown are (i) and j. Similarly, E{MSE}
has associated with it the model term $\varepsilon_{k(ij)}$, so that (ij) and k are shown.




Step 9. For each expected mean square column, the coefficient of any variance term is 
zero if the subscript(s) of the model term in that row (whether in parentheses or not) do not 
include all of the subscript(s) in the heading of that E{MS} column (whether in parentheses 
or not). 


Example
                          i    j    k
                          F    F    R                        E{MSA}    E{MSB(A)}    E{MSE}
                          a    b    n    Variance            i         (i)j         (ij)k
$\alpha_i$                0    b    n    $\sigma_\alpha^2$             0            0
$\beta_{j(i)}$            1    0    n    $\sigma_\beta^2$                           0
$\varepsilon_{k(ij)}$     1    1    1    $\sigma^2$


For the E{MSA} column, it will be noted that the model terms in all rows contain the
subscript i. Hence, none of the variances receives a zero coefficient as a result of this step.

For the E{MSB(A)} column, note that the first row has a model term not containing both
i and j. Hence, $\sigma_\alpha^2$ receives a zero coefficient in the E{MSB(A)} column.

Finally, for the E{MSE} column, the first and second rows have model terms that do not
contain the three subscripts i, j, and k. Hence, both $\sigma_\alpha^2$ and $\sigma_\beta^2$ receive zero coefficients in
the E{MSE} column.


Step 10. The coefficients of the variance terms that have not been assigned a zero
coefficient as a result of step 9 are found as follows:


a. For each expected mean square column, delete (e.g., mask or cover) the column(s) on 
the left corresponding to the subscripts not in parentheses in the heading of the E{MS} 
column. 

b. Multiply the entries in the remaining columns for each row being considered. 


Step 11. The expected mean square equals the sum of the products of each coefficient 
times the associated variance term, with the variance terms for fixed effects replaced by 
sums of squared effects divided by degrees of freedom. 


Example
                          i    j    k
                          F    F    R                        E{MSA}    E{MSB(A)}     E{MSE}
                          a    b    n    Variance            i         (i)j          (ij)k
$\alpha_i$                0    b    n    $\sigma_\alpha^2$   bn        0 (step 9)    0 (step 9)
$\beta_{j(i)}$            1    0    n    $\sigma_\beta^2$    0         n             0 (step 9)
$\varepsilon_{k(ij)}$     1    1    1    $\sigma^2$          1         1             1


To find the coefficients for the E{MSA} column, for example, we noted earlier that no
zero coefficient is assigned as a result of step 9. Step 10a calls for column i on the left to
be deleted. Hence, we obtain by multiplying the terms in the j and k columns:


                          j    k
                          F    R                        E{MSA}
                          b    n    Variance            i
$\alpha_i$                b    n    $\sigma_\alpha^2$   bn
$\beta_{j(i)}$            0    n    $\sigma_\beta^2$    0
$\varepsilon_{k(ij)}$     1    1    $\sigma^2$          1


Thus:
$$E\{MSA\} = bn\sigma_\alpha^2 + (0)\sigma_\beta^2 + (1)\sigma^2 = bn\sigma_\alpha^2 + \sigma^2$$
Since factor A has fixed effects, we finally obtain:
$$E\{MSA\} = bn\frac{\sum \alpha_i^2}{a-1} + \sigma^2$$


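We find the remaining coefficients for E{MSB(A)} in similar fashion. We delete column j
on the left, the subscript not in parentheses, and obtain:


                          i    k
                          F    R                        E{MSB(A)}
                          a    n    Variance            (i)j
$\alpha_i$                0    n    $\sigma_\alpha^2$   0 (step 9)
$\beta_{j(i)}$            1    n    $\sigma_\beta^2$    n
$\varepsilon_{k(ij)}$     1    1    $\sigma^2$          1


Thus:
$$E\{MSB(A)\} = (0)\sigma_\alpha^2 + n\sigma_\beta^2 + (1)\sigma^2 = n\sigma_\beta^2 + \sigma^2$$
Since factor B has fixed effects, we finally obtain:
$$E\{MSB(A)\} = n\frac{\sum\sum \beta_{j(i)}^2}{a(b-1)} + \sigma^2$$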


To find the remaining coefficient in the E{MSE} column, we delete column k, and the
product on the $\sigma^2$ line is $1 \cdot 1 = 1$. Thus:

$$E\{MSE\} = (0)\sigma_\alpha^2 + (0)\sigma_\beta^2 + (1)\sigma^2 = \sigma^2$$

Assembling our results, we have:

$$E\{MSA\} = bn\frac{\sum \alpha_i^2}{a-1} + \sigma^2$$    (D.5a)
$$E\{MSB(A)\} = n\frac{\sum\sum \beta_{j(i)}^2}{a(b-1)} + \sigma^2$$    (D.5b)
$$E\{MSE\} = \sigma^2$$    (D.5c)




Comment 
Some computer packages provide the expected mean squares for any balanced ANOVA study. An 
example is shown in Figure 26.7.
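Steps 9 and 10 of rule (D.4) can also be sketched in code. The Python sketch below is an illustration under stated assumptions, not the routine any particular package uses: the function name `coefficient`, the dictionary `model`, and the choice to keep coefficients as symbolic strings are all hypothetical. It starts from the step 7 table of the nested example and recovers the coefficients bn, n, and 1 found above.

```python
def coefficient(row, row_subs, head_subs, head_live):
    """Coefficient of one variance term in one E{MS} column (steps 9-10)."""
    # Step 9: zero unless the row's subscripts include all heading subscripts.
    if not set(head_subs) <= set(row_subs):
        return "0"
    # Step 10a: mask the columns of the heading subscripts not in parentheses.
    kept = [value for sub, value in row.items() if sub not in head_live]
    # Step 10b: multiply the remaining entries (kept symbolic here).
    if 0 in kept:
        return "0"
    symbols = [str(value) for value in kept if value != 1]
    return "".join(symbols) or "1"


# Step 7 table for Y_ijk = mu + alpha_i + beta_j(i) + eps_k(ij), A and B fixed.
# Each entry: (all subscripts, subscripts not in parentheses, table row).
model = {
    "alpha":   ("i",   "i", {"i": 0, "j": "b", "k": "n"}),
    "beta(A)": ("ij",  "j", {"i": 1, "j": 0,   "k": "n"}),
    "error":   ("ijk", "k", {"i": 1, "j": 1,   "k": 1}),
}

# One E{MS} column per model term; one coefficient per variance term.
ems = {
    head: {
        term: coefficient(row, subs, head_subs, head_live)
        for term, (subs, _, row) in model.items()
    }
    for head, (head_subs, head_live, _) in model.items()
}

for head, coefs in ems.items():
    print(head, coefs)
```

The sketch stops at step 10. Step 11, replacing the variance terms of fixed effects by sums of squared effects divided by degrees of freedom, is left to the reader as in (D.5).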


D.4 No Replications and/or Some Interactions Equal Zero 


Modification of Rules 
When a balanced design includes no replications and/or some interactions are assumed 
to equal zero—as, for instance, in a randomized complete block design with fixed 
block effects—rules (D.1) and (D.3) need to be modified slightly. Rule (D.4) requires no 
modification. 
The modification of rule (D.1) is very slight. Step 2 now becomes: 


Rule (D.1) modification: Step 2. Include all interaction terms except those
assumed to equal zero and those containing both a nested factor and the
factor within which it is nested.    (D.6)


The modification of rule (D.3) is also a simple one: 


Rule (D.3) modification: Steps 2 through 8 do not apply to the model error
term $\varepsilon$. Instead, the sum of squares associated with the model error term $\varepsilon$ is
obtained as a remainder from the total sum of squares. Likewise, the degrees
of freedom associated with this remainder sum of squares are obtained as a
remainder from the total degrees of freedom.    (D.7)


The sum of squares associated with the model error term $\varepsilon$ in balanced designs where
there are no replications and/or where some interaction terms are assumed to equal zero
will be denoted by SSRem, which stands for the remainder sum of squares. Frequently,
the remainder sum of squares will turn out to be an interaction sum of squares for the
interaction terms in the model that are assumed to equal zero. The remainder mean square
will be denoted by MSRem.
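To illustrate modification (D.7), the following Python sketch obtains SSRem and its degrees of freedom as remainders for a randomized complete block design with one observation per cell, and compares SSRem with the block-treatment interaction sum of squares computed directly. The data values and the variable names are hypothetical and serve only this illustration.

```python
# Hypothetical responses: rows are blocks (i), columns are treatments (j).
y = [
    [12.0, 15.0, 11.0],
    [14.0, 18.0, 13.0],
    [10.0, 14.0, 12.0],
]
nb, r = len(y), len(y[0])

grand = sum(map(sum, y)) / (nb * r)
block_means = [sum(row) / r for row in y]
trt_means = [sum(y[i][j] for i in range(nb)) / nb for j in range(r)]

ssto = sum((y[i][j] - grand) ** 2 for i in range(nb) for j in range(r))
ssbl = r * sum((mean - grand) ** 2 for mean in block_means)
sstr = nb * sum((mean - grand) ** 2 for mean in trt_means)

# Modification (D.7): the error sum of squares and its degrees of freedom
# are obtained as remainders from the totals.
ssrem = ssto - ssbl - sstr
df_rem = (nb * r - 1) - (nb - 1) - (r - 1)

# Block-treatment interaction sum of squares computed directly, for comparison.
ssbltr = sum(
    (y[i][j] - block_means[i] - trt_means[j] + grand) ** 2
    for i in range(nb)
    for j in range(r)
)

print(ssrem, ssbltr, df_rem)
```

Up to rounding error the two sums of squares agree, and the remainder degrees of freedom equal $(n_b - 1)(r - 1)$.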


Additional Modification for Latin Square Designs 
For latin square design model (28.12), one of the subscripts in $Y_{ijk}$ is redundant since the
row and column indices define the treatment for a given latin square design. Hence, when
using the rules presented in the case of a latin square design, one of the subscripts must be
treated as redundant, i.e., it needs to be ignored.
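A small sketch may make the redundancy concrete. The Python code below builds one particular latin square (a cyclic one, chosen only for convenience; the size `r = 4` is hypothetical) and shows that the treatment index is a function of the row and column indices, so the observations occupy only r * r of the r ** 3 possible index triples.

```python
r = 4  # hypothetical number of rows, columns, and treatments

# One particular latin square (cyclic): treatment in row i, column j.
square = [[(i + j) % r for j in range(r)] for i in range(r)]

# Each treatment occurs once in every row and once in every column.
assert all(sorted(row) == list(range(r)) for row in square)
assert all(sorted(col) == list(range(r)) for col in zip(*square))

# Observations are indexed by (row, column) alone; the third subscript
# is determined by the other two.
cells = {(i, j, square[i][j]) for i in range(r) for j in range(r)}
print(len(cells), "of", r ** 3, "index triples occur")
```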


D.5 Additional Examples of Use of Rules 


Crossed Two-Factor Study—Mixed Factor Effects 


Consider a two-factor experiment in a completely randomized design, where factors A and B
are crossed, factor A has fixed effects and factor B has random effects, and n replications
are obtained for each factor combination. The model equation is that of (25.42):

$$Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{k(ij)}$$

where we now recognize the nesting of the error term $\varepsilon$.


Table D.2 contains the derivation of the sums of squares. Table D.3a contains the pre- 
liminary tabulations for finding the expected mean squares, while Table D.3b presents the 
results of steps 9 and 10 of rule (D.4). The random effects variance terms corresponding to 
the model terms are: 


    Model term:       $\alpha_i$           $\beta_j$           $(\alpha\beta)_{ij}$        $\varepsilon_{k(ij)}$
    Variance term:    $\sigma_\alpha^2$    $\sigma_\beta^2$    $\sigma_{\alpha\beta}^2$    $\sigma^2$

Here, only the $\alpha_i$ are fixed effects, so at the end $\sigma_\alpha^2$ will need to be replaced by a sum of
squared effects divided by degrees of freedom. Note in Table D.3b that for finding E{MSA},
$\sigma_\beta^2$ receives a zero coefficient as a result of step 9 since the subscript in the $\beta_j$ model term
does not contain the subscript i in the E{MSA} column. Column i is deleted for step 10 for
finding the coefficients in the E{MSA} column since it is the only subscript in the column
heading and is not in parentheses. The other expected mean squares coefficients are found
in similar fashion. Table D.3b indicates for each expected mean square whether the zero
coefficients are obtained from step 9, and also which columns are deleted. The final expected
mean squares, presented in Table D.3c, are identical to those shown in Table 25.5.
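As a worked instance of steps 9 through 11 for this model, the E{MSA} and E{MSB} columns can be written out term by term. The sketch assumes the table entries that steps 5 through 7 give for this model: 0, b, n for $\alpha_i$; a, 1, n for $\beta_j$; 0, 1, n for $(\alpha\beta)_{ij}$; and 1, 1, 1 for $\varepsilon_{k(ij)}$.

```latex
\begin{align*}
E\{MSA\} &= (b \cdot n)\,\sigma_\alpha^2 + (0)\,\sigma_\beta^2
            + (1 \cdot n)\,\sigma_{\alpha\beta}^2 + (1 \cdot 1)\,\sigma^2
          && \text{(column $i$ deleted)} \\
         &= bn\,\frac{\sum \alpha_i^2}{a-1} + n\sigma_{\alpha\beta}^2 + \sigma^2 \\[4pt]
E\{MSB\} &= (0)\,\sigma_\alpha^2 + (a \cdot n)\,\sigma_\beta^2
            + (0 \cdot n)\,\sigma_{\alpha\beta}^2 + (1 \cdot 1)\,\sigma^2
          && \text{(column $j$ deleted)} \\
         &= an\,\sigma_\beta^2 + \sigma^2
\end{align*}
```

In the E{MSB} column the interaction variance drops out through the product $0 \cdot n$ in step 10, not through step 9, because the $(\alpha\beta)_{ij}$ term does contain the subscript j.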


Subsampling in Randomized Block Design 
The model usually employed for a randomized block design when only a single observation 
is made on an experimental unit is ANOVA model (21.1) in the case of fixed treatment and 
block effects: 


$$Y_{ij} = \mu_{..} + \rho_i + \tau_j + \varepsilon_{ij}$$    (D.8)


We shall now consider a slightly more complex case, namely when subsampling is used in 
a randomized block design—that is, when more than one observation is made on each 
experimental unit. Consider, for instance, an experiment to study how three different 
motivational stimuli affect the length of time a person requires to perform a task. The 
persons in the experiment are blocked into groups of three, according to age, and each 
person is assigned at random one of the three motivational stimuli. Three observations are 
then made on the time required to complete the task; that is, the subject is asked to perform 
the same task three times. 

In this type of situation, we simply add a random observation error component to ANOVA 
model (D.8). Assuming that the treatment and block effects (motivational stimuli and age 
groups in our example) are fixed, an appropriate model is: 


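$$Y_{ijk} = \mu_{..} + \rho_i + \tau_j + \varepsilon_{(ij)} + \eta_{k(ij)}$$    (D.9)
where:
    $\sum \rho_i = 0$
    $\sum \tau_j = 0$
    $\varepsilon_{(ij)}$ and $\eta_{k(ij)}$ are independent normal random variables with expectations 0 and variances
    $\sigma^2$ and $\sigma_\eta^2$, respectively
    $i = 1, \ldots, n_b$; $j = 1, \ldots, r$; $k = 1, \ldots, m$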


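TABLE D.2 Derivation of Sums of Squares Formulas for Crossed Two-Factor Study in Completely Randomized Design.

Model Term               SS      Coefficient    Symbolic Product                 Term to Be Squared                                                      Sum of Squares                                                                           Degrees of Freedom
$\alpha_i$               SSA     $bn$           $i - 1$                          $\bar{Y}_{i..} - \bar{Y}_{...}$                                         $bn \sum (\bar{Y}_{i..} - \bar{Y}_{...})^2$                                              $a - 1$
$\beta_j$                SSB     $an$           $j - 1$                          $\bar{Y}_{.j.} - \bar{Y}_{...}$                                         $an \sum (\bar{Y}_{.j.} - \bar{Y}_{...})^2$                                              $b - 1$
$(\alpha\beta)_{ij}$     SSAB    $n$            $(i-1)(j-1) = ij - i - j + 1$    $\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}$         $n \sum\sum (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2$           $(a - 1)(b - 1)$
$\varepsilon_{k(ij)}$    SSE     $1$            $(k-1)ij = ijk - ij$             $Y_{ijk} - \bar{Y}_{ij.}$                                               $\sum\sum\sum (Y_{ijk} - \bar{Y}_{ij.})^2$                                               $ab(n - 1)$
Total                    SSTO                                                    $Y_{ijk} - \bar{Y}_{...}$                                               $\sum\sum\sum (Y_{ijk} - \bar{Y}_{...})^2$                                               $abn - 1$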


TABLE D.3 E{MS} Derivations for Crossed Two-Factor Experiment (A fixed, B random).

(a) Table
                          i    j    k
                          F    R    R
                          a    b    n
$\alpha_i$                0    b    n
$\beta_j$                 a    1    n
$(\alpha\beta)_{ij}$      0    1    n
$\varepsilon_{k(ij)}$     1    1    1

(b) Coefficients
                            E{MSA}              E{MSB}              E{MSAB}                 E{MSE}
Variance                    i                   j                   ij                      (ij)k
$\sigma_\alpha^2$           b · n               0 (step 9)          0 (step 9)              0 (step 9)
$\sigma_\beta^2$            0 (step 9)          a · n               0 (step 9)              0 (step 9)
$\sigma_{\alpha\beta}^2$    1 · n               0 · n               n                       0 (step 9)
$\sigma^2$                  1 · 1               1 · 1               1                       1 · 1
                            (i col. deleted)    (j col. deleted)    (i, j cols. deleted)    (k col. deleted)

(c) Expected Mean Squares

$$E\{MSA\} = bn\frac{\sum \alpha_i^2}{a-1} + n\sigma_{\alpha\beta}^2 + \sigma^2$$
$$E\{MSB\} = an\sigma_\beta^2 + \sigma^2$$
$$E\{MSAB\} = n\sigma_{\alpha\beta}^2 + \sigma^2$$
$$E\{MSE\} = \sigma^2$$


Here $\rho_i$ is the block effect, $\tau_j$ the treatment effect, $\varepsilon_{(ij)}$ the random effect associated with
the experimental unit, and $\eta_{k(ij)}$ the random effect associated with the kth observation on
the experimental unit. Note that the experimental error $\varepsilon$ is nested within the (ij) block-
treatment combination; there is no additional subscript since only one experimental unit
is assigned to a treatment within a block. Thus, there are no replications for experimental
units. Also note that the observation error $\eta$ is nested within the (ij) block-treatment
combination.

Since there are no replications present and the block-treatment interactions are assumed to 
equal zero, we need to use the modified rules, as explained in Section D.4. Table D.4 contains
the derivation of the sums of squares for ANOVA model (D.9), and Table D.5 contains the 
derivation of the expected mean squares. Note that the sum of squares for experimental units 
is obtained as a remainder in Table D.4 because there is only one experimental unit assigned 
to a treatment within a block. As expected, SSRem turns out to be the block-treatment 
interaction sum of squares, as for a randomized block design without subsampling. 




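TABLE D.4 Derivation of Sums of Squares Formulas for Randomized Block Design
with Subsampling: ANOVA Model (D.9).

Model Term              Symbolic Product        Sum of Squares                                                                                            Degrees of Freedom
$\rho_i$                $i - 1$                 $SSBL = rm \sum (\bar{Y}_{i..} - \bar{Y}_{...})^2$                                                        $n_b - 1$
$\tau_j$                $j - 1$                 $SSTR = n_b m \sum (\bar{Y}_{.j.} - \bar{Y}_{...})^2$                                                     $r - 1$
$\varepsilon_{(ij)}$    Remainder               $SSRem = SSBL.TR = m \sum\sum (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2$          Remainder $= (n_b - 1)(r - 1)$
$\eta_{k(ij)}$          $(k-1)ij = ijk - ij$    $SSOE = \sum\sum\sum (Y_{ijk} - \bar{Y}_{ij.})^2$                                                         $n_b r(m - 1)$
Total                                           $SSTO = \sum\sum\sum (Y_{ijk} - \bar{Y}_{...})^2$                                                         $n_b rm - 1$
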
TABLE D.5 Derivation of Expected Mean Squares for Randomized Block Design with Subsampling: ANOVA Model (D.9).

(a) Table
                                                              Expected Mean Square of
                         i      j    k                        BL        TR         Rem    OE
                         F      F    R
                         n_b    r    m    Variance            i         j          (ij)   (ij)k
$\rho_i$                 0      r    m    $\sigma_\rho^2$     rm        0          0      0
$\tau_j$                 n_b    0    m    $\sigma_\tau^2$     0         n_b m      0      0
$\varepsilon_{(ij)}$     1      1    m    $\sigma^2$          m         m          m      0
$\eta_{k(ij)}$           1      1    1    $\sigma_\eta^2$     1         1          1      1

(b) Expected Mean Squares

$$E\{MSBL\} = rm\frac{\sum \rho_i^2}{n_b - 1} + m\sigma^2 + \sigma_\eta^2$$
$$E\{MSTR\} = n_b m\frac{\sum \tau_j^2}{r - 1} + m\sigma^2 + \sigma_\eta^2$$
$$E\{MSRem\} = m\sigma^2 + \sigma_\eta^2$$
$$E\{MSOE\} = \sigma_\eta^2$$


Table D.5b indicates that for ANOVA model (D.9) with fixed treatment and block effects,
the test statistic for examining the presence of treatment effects is $F^* = MSTR/MSRem$,
as is also the case when no subsampling occurs in a randomized complete block design;
see (21.7b). Remember that MSRem denotes simply the interaction mean square MSBL.TR
here.
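The choice of denominator can be read off the expected mean squares mechanically: the appropriate denominator has the same expectation as the numerator once the term being tested is removed. The Python sketch below illustrates this for the expected mean squares of model (D.9). The dictionary layout, the term labels such as "fixed:tau", and the function name `denominator` are assumptions made for this illustration.

```python
# Expected mean squares for model (D.9), as {term: coefficient}.
# Coefficients are kept symbolic; "fixed:tau" stands for the sum of squared
# treatment effects divided by its degrees of freedom.
ems = {
    "MSBL":  {"fixed:rho": "r*m",  "sigma2": "m", "sigma2_eta": "1"},
    "MSTR":  {"fixed:tau": "nb*m", "sigma2": "m", "sigma2_eta": "1"},
    "MSRem": {"sigma2": "m", "sigma2_eta": "1"},
    "MSOE":  {"sigma2_eta": "1"},
}


def denominator(numerator, tested_term):
    """Return the mean squares whose expectation equals the numerator's
    expectation with the tested term removed."""
    target = {t: c for t, c in ems[numerator].items() if t != tested_term}
    return [ms for ms, terms in ems.items() if ms != numerator and terms == target]


print(denominator("MSTR", "fixed:tau"))   # test for treatment effects
print(denominator("MSRem", "sigma2"))     # test for experimental-unit variance
```

For treatment effects the sketch selects MSRem, in agreement with $F^* = MSTR/MSRem$.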


Problems

D.1. Refer to ANOVA model (25.39). Use rule (D.4) to obtain the expected mean squares in
Table 25.5 for this model.

D.2. Refer to ANOVA model (25.77).
a. Use rule (D.3) to obtain the sums of squares formulas in (24.22) and the associated degrees
of freedom.
b. Use rule (D.4) to obtain the expected mean squares in Table 25.9.

D.3. Refer to ANOVA model (25.79).
a. Use rule (D.3) to obtain the sums of squares formulas in (24.22) and the associated degrees
of freedom.
b. Use rule (D.4) to obtain the expected mean squares in Table 25.10.

D.4. Refer to nested design model (26.7), but assume that factor A is nested within factor B,
factor A effects are random, and factor B effects are fixed. (See also "Random Factor Effects"
on page 1093.)
a. Use rule (D.3) to obtain the sums of squares formulas and the associated degrees of freedom.
b. Use rule (D.4) to obtain the expected mean squares.
c. What is the appropriate mean square to be used in constructing a confidence interval for $\mu_{.j}$?

D.5. Refer to randomized complete block model (21.1).
a. Use rule (D.3) and modification (D.7) to obtain the sums of squares formulas in (21.6) and
the associated degrees of freedom.
b. Use rule (D.4) to obtain the expected mean squares in Table 21.2 for this model.

D.6. Refer to randomized complete block model (21.1), but assume that treatment effects are
random. (See also Comment 2 on page 897.)
a. Use rule (D.3) and modification (D.7) to obtain the sums of squares formulas in (21.6) and
the associated degrees of freedom.
b. Use rule (D.4) to obtain the expected mean squares in Table 21.2 for this model.

D.7. Refer to randomized complete block model (25.67).
a. Use rule (D.3) and modification (D.7) to obtain the sums of squares formulas in (21.6) and
the associated degrees of freedom.
b. Use rule (D.4) to obtain the expected mean squares in Table 25.8 for this model.

D.8. Refer to randomized complete block model (D.9), but assume that block effects are random.
a. Use rule (D.3) and modification (D.7) to obtain the sums of squares formulas and the
associated degrees of freedom.
b. Use rule (D.4) to obtain the expected mean squares.

D.9. In a balanced three-factor study, factors A and C are crossed and factor B is nested within
factor C. Factor A has fixed effects, and factors B and C have random effects. There are n
replications for each treatment.
a. Use rule (D.3) to obtain the sums of squares formulas and the associated degrees of freedom.
b. Use rule (D.4) to obtain the expected mean squares.
c. What is the appropriate denominator mean square for testing for factor A main effects?

D.10. Swimmer motivation. A large metropolitan swim club for youths studied the effects of three
motivational stimuli on performance. The three motivational stimuli were: (1) presentation of
merit award, (2) granting of team leadership privileges, and (3) publicity in the club newsletter.
Since age is known to be related to performance, the nine female swimmers included in the
study were grouped according to age into three blocks of three each. Within each age block, the
three swimmers were randomly assigned to one of the motivation treatments. After a suitable
amount of training, each swimmer was timed on three separate occasions while swimming a
fixed distance. The coded data on the time for each of the three trials follow.


                                  Motivation Treatment
                                  j=1            j=2           j=3
Block            Observation      Merit Award    Leadership    Publicity
i=1              k=1:             28             26            27
(7-8 years)      k=2:             32             24            29
                 k=3:             31             27            30
(9-10 years)     k=2:             26             19            21
                 k=3:             23             18            22
i=3              k=1:             18             13            17
(11-12 years)    k=2:             21             16            19
                 k=3:             20             15            19


Obtain the residuals for randomized block model (D.9) and plot them against the fitted values. 
Also prepare a normal probability plot of the residuals. What are your findings about the 
appropriateness of model (D.9)? 


D.11. Refer to Swimmer motivation Problem D.10. Assume that randomized block model (D.9)
with fixed block and treatment effects is appropriate.
a. Obtain the analysis of variance table.
b. Test whether or not the mean times are the same for the three motivational stimuli; use
$\alpha = .05$. State the alternatives, decision rule, and conclusion. What is the P-value of the
test?
c. Make all pairwise comparisons among the three treatment means; use the Tukey procedure
with a 90 percent family confidence coefficient. State your findings.
d. Obtain point estimates of $\sigma^2$ and $\sigma_\eta^2$. Does one variance appear to be much larger than the
other? Discuss.


D.12. Refer to repeated measures model (27.21). Consider a simpler model, in which interactions
SA and SB are not present. The parameters $\mu_{...}$, $\rho_i$, $\alpha_j$, $\beta_k$, $(\alpha\beta)_{jk}$, and $\varepsilon_{ijk}$ are defined in the
same way as (27.21).
a. Use rule (D.3) and modification (D.7) to obtain the sums of squares formulas similar to
those in Table 27.11b and the associated degrees of freedom similar to those in Table 27.11a.
b. Use rule (D.4) to obtain the expected mean squares similar to those in Table 27.11a.

D.13. Refer to repeated measures model (27.11).
a. Use rule (D.3) and modification (D.7) to obtain the sums of squares formulas and the
associated degrees of freedom in Table 27.5.
b. Use rule (D.4) to obtain the expected mean squares in Table 27.6.

D.14. Refer to the Drug effect experiment data set. Consider the combined study. Assume that
subjects (rats) and observation units have random effects, and that factor A (initial lever press
rate), factor B (dosage level), and factor C (reinforcement schedule) have fixed effects. Also
assume that there are no interactions between subjects and treatments.
a. Use rule (D.1) and modification (D.6) to develop the model for this experiment.
b. Use rule (D.3) and modification (D.7) to obtain the sums of squares formulas and the
associated degrees of freedom.
c. Use rule (D.4) to obtain the expected mean squares.

D.15. Derive the expected mean squares in Table 28.5 for latin square model (28.12) by using rule
(D.4). (See also "Additional Modification for Latin Square Designs" on page 1366.)

D.16. Derive the expected mean squares for latin square model (28.27) with n replications by using
rule (D.4). (See also "Additional Modification for Latin Square Designs" on page 1366.)

D.17. Derive the expected mean squares in Table 28.10 for latin square cross-over model (28.29)
with n subjects for each treatment order pattern by using rule (D.4). (See also "Additional
Modification for Latin Square Designs" on page 1366.)


Appendix E


Selected Bibliography 


The selected references are grouped into the following categories: 


1. General regression books
2. General linear models books
3. Diagnostics and model building
4. Statistical computing
5. Nonlinear regression
6. Miscellaneous regression topics
7. General experimental design and analysis of variance books
8. Miscellaneous experimental design and analysis of variance topics


1. General Regression Books 




Allison, P. D. Multiple Regression: A Primer. Thousand Oaks, Calif.: Sage Publications, 
1999. 

Bowerman, B. L., and R. T. O'Connell. Linear Statistical Models: An Applied Approach.
2nd ed. Boston: Duxbury Press, 1990. 

Chatterjee, S.; A. S. Hadi; and B. Price. Regression Analysis by Example. 3rd ed.
New York: John Wiley & Sons, 1999.

Cohen, J.; P. Cohen; S. G. West; and L. S. Aiken. Applied Multiple Regression/ 
Correlation Analysis for the Behavioral Sciences. 3rd ed. Hillsdale, N.J.: Lawrence 
Erlbaum Associates, 2003. 

Cook, R. D., and S. Weisberg. Applied Regression Including Computing and Graphics. 
New York: John Wiley & Sons, 1999. 

Daniel, C., and F. S. Wood. Fitting Equations to Data: Computer Analysis of Multifactor 
Data. 2nd ed. New York: John Wiley & Sons, 1999. 

Draper, N. R., and H. Smith. Applied Regression Analysis. 3rd ed. New York: John Wiley 
& Sons, 1998. 

Freund, R. J., and R. C. Littell. SAS System for Regression. 3rd ed. New York: John Wiley
& Sons, 2000. 




Graybill, F. A., and H. Iyer. Regression Analysis: Concepts and Applications. Belmont, 
Calif.: Duxbury Press, 1994. 

Hamilton, L. C. Regression with Graphics: A Second Course in Applied Statistics. Pacific 
Grove, Calif.: Brooks/Cole Publishing, 1992. 

Kleinbaum, D. G.; L. L. Kupper; K. E. Muller; and A. Nizam. Applied Regression
Analysis and Other Multivariate Methods. 3rd ed. Belmont, Calif.: Duxbury Press, 1998.

Mendenhall, W., and T. Sincich. A Second Course in Business Statistics: Regression
Analysis. 5th ed. Upper Saddle River, N.J.: Prentice Hall, 1996.

Muller, K. E., and B. A. Fetterman. Regression and ANOVA: An Integrated Approach 
Using SAS Software. New York: John Wiley & Sons, 2003. 

Myers, R. H. Classical and Modern Regression with Applications. 2nd ed. Boston: 
Duxbury Press, 1990. 

Pedhazur, E. J. Multiple Regression in Behavioral Research. 3rd ed. Belmont, Calif.: 
Duxbury Press, 1997. 

Rawlings, J. O.; S. G. Pantula; and D. A. Dickey. Applied Regression Analysis: A Research
Tool. New York: Springer-Verlag, 1998.

Ryan, T. P. Modern Regression Methods. New York: John Wiley & Sons, 1997. 

Seber, G. A. F., and A. S. Lee. Linear Regression Analysis. 2nd ed. New York: John Wiley
& Sons, 2003. 

Sen, A., and M. Srivastava. Regression Analysis: Theory, Methods, and Applications.
4th ed. New York: Springer-Verlag, 1997.


2. General Linear Models Books 


Graybill, F. A. Theory and Application of the Linear Model. Boston: Duxbury Press, 1976.

Hocking, R. R. Methods and Applications of Linear Models: Regression and the Analysis
of Variance. 2nd ed. New York: John Wiley & Sons, 2003.

Littell, R. C.; W. W. Stroup; and R. J. Freund. SAS System for Linear Models. 4th ed. New 
York: John Wiley & Sons, 2002. 

Searle, S. R. Linear Models. New York: John Wiley & Sons, 1997. 

Searle, S. R. Linear Models for Unbalanced Data. New York: John Wiley & Sons, 1987. 


3. Diagnostics and Model Building 


Allen, D. M. “Mean Square Error of Prediction as a Criterion for Selecting Variables." 
Technometrics 13 (1971), pp. 469—75. 

Anscombe, F. J., and J. W. Tukey. “The Examination and Analysis of Residuals.” 
Technometrics 5 (1963), pp. 141—60. 

Atkinson, A. C. “Two Graphical Displays for Outlying and Influential Observations in 
Regression." Biometrika 68 (1981), pp. 13—20. 




Atkinson, A. C. Plots, Transformations, and Regression. Oxford: Clarendon Press, 1987.

Barnett, V., and T. Lewis. Outliers in Statistical Data. 3rd ed. New York: John Wiley & 
Sons, 1994. 

Belsley, D. A. Conditioning Diagnostics: Collinearity and Weak Data in Regression. New 
York: John Wiley & Sons, 1991. 

Belsley, D. A.; E. Kuh; and R. E. Welsch. Regression Diagnostics: Identifying Influential 
Data and Sources of Collinearity. New York: John Wiley & Sons, 1980. 

Box, G. E. P., and D. R. Cox. “An Analysis of Transformations.” Journal of the Royal
Statistical Society B 26 (1964), pp. 211-43. 

Box, G. E. P., and N. R. Draper. Empirical Model-Building and Response Surfaces. New
York: John Wiley & Sons, 1987. 

Box, G. E. P., and P. W. Tidwell. “Transformations of the Independent Variables.”
Technometrics 4 (1962), pp. 531—50. 

Breiman, L., and P. Spector. “Submodel Selection and Evaluation in Regression: The 
X-Random Case.” International Statistical Review 60 (1992), pp. 291-319. 

Breusch, T. S., and A. R. Pagan. “A Simple Test for Heteroscedasticity and Random
Coefficient Variation.” Econometrica 47 (1979), pp. 1287-94.

Brown, M. B., and A. B. Forsythe. “Robust Tests for Equality of Variances,” Journal of 
the American Statistical Association 69 (1974), pp. 364-67. 

Carroll, R. J., and D. Ruppert. Transformation and Weighting in Regression. New York: 
Chapman & Hall, 1988. 

Chatterjee, S., and A. S. Hadi. Sensitivity Analysis in Linear Regression. New York: John 
Wiley & Sons, 1988. 

Conover, W. J.; M. E. Johnson; and M. M. Johnson. “A Comparative Study of Tests for
Homogeneity of Variances, with Applications to the Outer Continental Shelf Bidding 
Data." Technometrics 23 (1981), pp. 351—61. 

Cook, R. D. “Exploring Partial Residual Plots.” Technometrics 35 (1993), pp. 351-62.

Cook, R. D., and S. Weisberg. “Diagnostics for Heteroscedasticity in Regression.”
Biometrika 70 (1983), pp. 1-10.

Cox, D. R. Planning of Experiments. New York: John Wiley & Sons, 1958. 

Davidian, M., and R. J. Carroll. “Variance Function Estimation.” Journal of the American 
Statistical Association 82 (1987), pp. 1079-91. 

Durbin, J., and G. S. Watson. “Testing for Serial Correlation in Least Squares Regression.
II.” Biometrika 38 (1951), pp. 159-78.

Faraway, J. J. “On the Cost of Data Analysis.” Journal of Computational and Graphical 
Statistics 1 (1992), pp. 213-29. 

Flack, V. F., and P. C. Chang. “Frequency of Selecting Noise Variables in
Subset-Regression Analysis: A Simulation Study.” The American Statistician 41 (1987), 
pp. 84—86. 

Freedman, D. A. "A Note on Screening Regression Equations." The American Statistician 
37 (1983), pp. 152-55. 




Hoaglin, D. C.; F. Mosteller; and J. W. Tukey. Exploring Data Tables, Trends, and Shapes. 
New York: John Wiley & Sons, 1985. 


Hoaglin, D. C., and R. Welsch. “The Hat Matrix in Regression and ANOVA.” The
American Statistician 32 (1978), pp. 17-22.

Hocking, R. R. “The Analysis and Selection of Variables in Linear Regression.”
Biometrics 32 (1976), pp. 1—49. 


Hoerl, A. E., and R. W. Kennard. "Ridge Regression: Applications to Nonorthogonal 
Problems." Technometrics 12 (1970), pp. 69—82. 


Joglekar, G.; J. H. Schuenemeyer; and V. LaRiccia. “Lack-of-Fit Testing When Replicates
Are Not Available.” The American Statistician 43 (1989), pp. 135-43. 


Levene, H. “Robust Tests for Equality of Variances,” in Contributions to Probability and 
Statistics, ed. I. Olkin. Palo Alto, Calif.: Stanford University Press, 1960, pp. 278-92. 


Lindsay, R. M., and A. S. C. Ehrenberg. “The Design of Replicated Studies.” The
American Statistician 47 (1993), pp. 217—28. 


Looney, S. W., and T. R. Gulledge, Jr. “Use of the Correlation Coefficient with Normal 
Probability Plots." The American Statistician 39 (1985), pp. 75—79. 

Mallows, C. L. “Some Comments on $C_p$.” Technometrics 15 (1973), pp. 661-75.

Mansfield, E. R., and M. D. Conerly. “Diagnostic Value of Residual and Partial Residual
Plots.” The American Statistician 41 (1987), pp. 107-16.

Mantel, N. “Why Stepdown Procedures in Variable Selection.” Technometrics 12 (1970), 
pp. 621-25. 

Miller, A. J. Subset Selection in Regression. 2nd ed. London: Chapman & Hall, 2002. 


Pope, P. T., and J. T. Webster. “The Use of an F-Statistic in Stepwise Regression 
Procedures." Technometrics 14 (1972), pp. 327—40. 


Rousseeuw, P. J., and A. M. Leroy. Robust Regression and Outlier Detection. New York: 
John Wiley & Sons, 1987. 

Shapiro, S. S., and M. B. Wilk. “An Analysis of Variance Test for Normality (Complete
Samples).” Biometrika 52 (1965), pp. 591-611.


Snee, R. D. “Validation of Regression Models: Methods and Examples.” Technometrics
19 (1977), pp. 415-28. 

Stone, M. “Cross-Validatory Choice and Assessment of Statistical Prediction.” Journal of 
the Royal Statistical Society B 36 (1974), pp. 111-47.


Velleman, P. F., and D. C. Hoaglin. Applications, Basics, and Computing of Exploratory
Data Analysis. Boston: Duxbury Press, 1981. 


4. Statistical Computing 


BMDP New System 2.0. Statistical Solutions, Inc. 
JMP Version 5. SAS Institute Inc. 
Kennedy, W. J., and J. E. Gentle. Statistical Computing. New York: Marcel Dekker, 1980. 




LogXact 5. Cytel Software Corporation. Cambridge, Mass., 2003. 
MATLAB 6.5. The MathWorks, Inc.

MINITAB Release 13. Minitab Inc. 

S-Plus 6 for Windows. Insightful Corporation.

SAS/STAT Release 8.2. SAS Institute, Inc. 

SPSS 11.5 for Windows. SPSS Inc. 

SYSTAT 10.2. SYSTAT Software Inc. 


Tierney, L. LISP-STAT: An Object-Oriented Environment for Statistical Computing and 
Dynamic Graphics. New York: John Wiley & Sons, 1990. 


5. Nonlinear Regression


Allison, P. D. Logistic Regression Using the SAS System: Theory and Applications. New 
York: John Wiley & Sons, 1999. 

Bates, D. M., and D. G. Watts. Nonlinear Regression Analysis and Its Applications. New 
York: John Wiley & Sons, 1988. 

Begg, C. B., and R. Gray. “Calculation of Polytomous Logistic Regression Parameters 
Using Individualized Regressions.” Biometrika 71 (1984), pp. 11-18.

Box, M. J. “Bias in Nonlinear Estimation.” Journal of the Royal Statistical Society B 33
(1971), pp. 171—201. 

DeVeaux, R. D.; J. Schumi; J. Schweinsberg; and L. H. Ungar. “Prediction Intervals for
Neural Networks via Nonlinear Regression.” Technometrics 40 (1998), pp. 273-282.

DeVeaux, R. D., and L. H. Ungar. “A Brief Introduction to Neural Networks.”
http://www.williams.edu/Mathematics/rdeveaux/pubs.html (1996).

Gallant, A. R. “Nonlinear Regression.” The American Statistician 29 (1975), pp. 73-81.

Gallant, A. R. Nonlinear Statistical Models. New York: John Wiley & Sons, 1987.

Halperin, M.; W. C. Blackwelder; and J. I. Verter. “Estimation of the Multivariate Logistic
Risk Function: A Comparison of Discriminant Function and Maximum Likelihood
Approaches.” Journal of Chronic Diseases 24 (1971), pp. 125-58.

Hartley, H. O. “The Modified Gauss-Newton Method for the Fitting of Non-linear
Regression Functions by Least Squares.” Technometrics 3 (1961), pp. 269-80.

Hosmer, D. W., and S. Lemeshow. “Goodness of Fit Tests for the Multiple Logistic 
Regression Model.” Communications in Statistics A 9 (1980), pp. 1043-69.

Hosmer, D. W., and S. Lemeshow. Applied Logistic Regression. 2nd ed. New York: John 
Wiley & Sons, 2000. 

Hougaard, P. “The Appropriateness of the Asymptotic Distribution in a Nonlinear
Regression Model in Relation to Curvature.” Journal of the Royal Statistical Society B 47 
(1985), pp. 103-14. 

Kleinbaum, D. G.; L. L. Kupper; and L. E. Chambless. “Logistic Regression Analysis of 
Epidemiologic Data: Theory and Practice." Communications in Statistics A 11 (1982), 
pp. 485—547. 




Landwehr, J. M.; D. Pregibon; and A. C. Shoemaker. “Graphical Methods for Assessing 
Logistic Regression Models (with discussion).” Journal of the American Statistical
Association 79 (1984), pp. 61—83. 


Marquardt, D. W. "An Algorithm for Least Squares Estimation of Non-linear Parameters." 
Journal of the Society of Industrial and Applied Mathematics 11 (1963), pp. 431—41. 


Menard, S. Applied Logistic Regression Analysis. Thousand Oaks, Calif.: Sage 
Publications, 1995. 

Pregibon, D. “Logistic Regression Diagnostics.” Annals of Statistics 9 (1981), pp. 705-24. 
Prentice, R. L. “Use of the Logistic Model in Retrospective Studies.” Biometrics 32
(1976), pp. 599-606.

Ratkowsky, D. A. Nonlinear Regression Modeling. New York: Marcel Dekker, 1983. 


Truett, J.; J. Cornfield; and W. Kannel. “A Multivariate Analysis of the Risk of Coronary
Heart Disease in Framingham.” Journal of Chronic Diseases 20 (1967), pp. 511—24. 


6. Miscellaneous Regression Topics 


Agresti, A. Categorical Data Analysis. 2nd ed. New York: John Wiley & Sons, 2002. 
Altman, N. S. “An Introduction to Kernel and Nearest-Neighbor Nonparametric 
Regression.” The American Statistician 46 (1992), pp. 175-85. 

Berkson, J. “Are There Two Regressions?” Journal of the American Statistical 
Association 45 (1950), pp. 164-80. 

Bishop, Y. M. M.; S. E. Fienberg; and P. W. Holland. Discrete Multivariate Analysis: 
Theory and Practice. Cambridge, Mass.: MIT Press, 1975. 

Box, G. E. P. “Use and Abuse of Regression.” Technometrics 8 (1966), pp. 625-29. 
Box, G. E. P., and G. M. Jenkins. Time Series Analysis, Forecasting and Control. 3rd ed.
San Francisco: Holden-Day, 1994. 

Breiman, L.; J. H. Friedman; R. A. Olshen; and C. J. Stone. Classification and Regression 
Trees. New York: Chapman & Hall, 1993. 

Christensen, R. Log-Linear Models and Logistic Regression. 2nd ed. New York: 
Springer-Verlag, 1997. 

Cleveland, W. S. “Robust Locally Weighted Regression and Smoothing Scatterplots.” 
Journal of the American Statistical Association 74 (1979), pp. 829-36. 


Cleveland, W. S., and S. J. Devlin. “Locally Weighted Regression: An Approach to 
Regression Analysis by Local Fitting.” Journal of the American Statistical Association 83 
(1988), pp. 596-610. 


Collett, D. Modelling Binary Data. 2nd ed. London: Chapman & Hall, 2002. 


Cox, D. R. “Notes on Some Aspects of Regression Analysis.” Journal of the Royal 
Statistical Society A 131 (1968), pp. 265-79. 


Cox, D. R. The Analysis of Binary Data. 2nd ed. London: Chapman & Hall, 1989.


1380 Appendices 


Efron, B. The Jackknife, The Bootstrap, and Other Resampling Plans. Philadelphia: 
Society for Industrial and Applied Mathematics, 1982. 

Efron, B. “Better Bootstrap Confidence Intervals” (with discussion). Journal of the 
American Statistical Association 82 (1987), pp. 171—200. 

Efron, B., and R. Tibshirani. “Bootstrap Methods for Standard Errors, Confidence
Intervals, and Other Measures of Statistical Accuracy.” Statistical Science 1 (1986),
pp. 54-77.

Efron, B., and R. J. Tibshirani. An Introduction to the Bootstrap. New York: Chapman & 
Hall, 1993. 

Eubank, R. L. Nonparametric Regression and Spline Smoothing. 2nd ed. New York: 
Marcel Dekker, 1999. 

Finney, D. J. Probit Analysis. 3rd ed. Cambridge, England: Cambridge University Press, 
1971. 

Frank, I. E., and J. H. Friedman. “A Statistical View of Some Chemometrics Regression 
Tools.” Technometrics 35 (1993), pp. 109-35. 

Friedman, J. H., and W. Stuetzle. “Projection Pursuit Regression.” Journal of the 
American Statistical Association 76 (1981), pp. 817-23. 

Fuller, W. A. Measurement Error Models. New York: John Wiley & Sons, 1987. 
Gibbons, J. D. Nonparametric Methods for Quantitative Analysis. 2nd ed. Columbus, 
Ohio: American Sciences Press, 1985. 

Graybill, F. A. Matrices with Applications in Statistics. 2nd ed. Belmont, Calif.: Duxbury
Press, 2001. 

Greene, W. H. Econometric Analysis. 5th ed. Upper Saddle River, N.J.: Prentice
Hall, 2003.

Haerdle, W. Applied Nonparametric Regression. Cambridge, England: Cambridge 
University Press, 1990.

Harrell, F. E. Regression Modeling Strategies: With Application to Linear Models, 
Logistic Regression, and Survival Analysis. New York: Springer-Verlag, 2001. 

Hastie, T., and C. Loader. “Local Regression: Automatic Kernel Carpentry” (with 
discussion). Statistical Science 8 (1993), pp. 120-43. 

Hastie, T. J., and R. J. Tibshirani. Generalized Additive Models. New York: Chapman
& Hall, 1990.

Hastie, T. J.; R. J. Tibshirani; and J. Friedman. The Elements of Statistical Learning: Data 
Mining, Inference, and Prediction. New York: Springer-Verlag, 2001. 

Hochberg, Y., and A. C. Tamhane. Multiple Comparison Procedures. New York: John
Wiley and Sons, 1987. 

Hogg, R. V. “Statistical Robustness: One View of Its Use in Applications Today.” The 
American Statistician 33 (1979), pp. 108-15. 

Johnson, R. A., and D. W. Wichern. Applied Multivariate Statistical Analysis. 5th ed. 
Englewood Cliffs, N.J.: Prentice Hall, 2002. 

Kendall, M. G., and J. D. Gibbons. Rank Correlation Methods. 5th ed. London: Charles 
Griffin, 1990. 




Lachenbruch, P. A. Discriminant Analysis. New York: Hafner Press, 1975. 


McCullagh, P., and J. A. Nelder. Generalized Linear Models. 2nd ed. New York:
Chapman & Hall, 1989.


Miller, R. G., Jr. Simultaneous Statistical Inference. 2nd ed. New York: Springer-Verlag, 
1991. 


Nelder, J. A., and R. W. M. Wedderburn. “Generalized Linear Models.” Journal of the
Royal Statistical Society A 135 (1972), pp. 370-84.


Pindyck, R. S., and D. L. Rubinfeld. Econometric Models and Economic Forecasts. 
4th ed. New York: McGraw-Hill, 1997. 


Satterthwaite, F. E. "An Approximate Distribution of Estimates of Variance Components." 
Biometrics Bulletin 2 (1946), pp. 110-14. 


Searle, S. R. Matrix Algebra Useful for Statistics. New York: John Wiley & Sons, 1982. 


Snedecor, G. W., and W. G. Cochran. Statistical Methods. 8th ed. Ames, Iowa: Iowa State 
University Press, 1980. 


Theil, H., and A. L. Nagar. “Testing the Independence of Regression Disturbances.” 
Journal of the American Statistical Association 56 (1961), pp. 793—806. 


7. General Experimental Design and Analysis of Variance Books


Atkinson, A. C., and A. N. Donev. Optimum Experimental Designs. Oxford: Clarendon 
Press, 1992. 


Box, G. E. P., and N. R. Draper. Empirical Model-Building and Response Surfaces. New
York: John Wiley & Sons, 1987. 


Box, G. E. P.; W. G. Hunter; and J. S. Hunter. Statistics for Experimenters. New York: 
John Wiley & Sons, 1978. 


Cochran, W. G., and G. M. Cox. Experimental Designs. 2nd ed. New York: John Wiley & 
Sons, 1992. 


Cook, R. D., and C. J. Nachtsheim. "Computer-Aided Blocking of Factorial and Response 
Surface Designs." Technometrics 31 (1989), pp. 339—346. 


Cox, D. R. Planning of Experiments. New York: John Wiley & Sons, 1992. 


Dean, A., and D. Voss. Design and Analysis of Experiments. New York: Springer-Verlag, 
1999. 


Fisher, R. A. The Design of Experiments. 8th ed. New York: Hafner Publishing Co., 1966. 


Hicks, C. R., and K. V. Turner. Fundamental Concepts in the Design of Experiments. 
3rd ed. New York: Holt, Rinehart and Winston, 1999. 


Hinkelmann, K., and O. Kempthorne. Design and Analysis of Experiments: Introduction 
to Experimental Design. Vol. 1. New York: John Wiley & Sons, 1994. 


Hoaglin, D. C.; F. Mosteller; and J. W. Tukey. Fundamentals of Exploratory Analysis of
Variance. New York: John Wiley & Sons, 1991. 


Hsu, J. C. Multiple Comparisons: Theory and Methods. London: Chapman & Hall, 1996. 




Kempthorne, O. The Design and Analysis of Experiments. New York: John Wiley & Sons,
1952. 

Kirk, R. E. Experimental Design: Procedures for the Behavioral Sciences. 3rd ed.
Monterey, Calif.: Brooks/Cole Publishing Co., 1994.

Lorenzen, T. J., and V. L. Anderson. Design of Experiments: A No-Name Approach. New 
York: Marcel Dekker, 1993. 

Mason, R. L.; R. F. Gunst; and J. L. Hess. Statistical Design and Analysis of Experiments 
with Applications to Engineering and Science. 2nd ed. New York: John Wiley & Sons, 
2003. 

Montgomery, D. C. Design and Analysis of Experiments. 5th ed. New York: John Wiley & 
Sons, 2000. 

Oehlert, G. W. A First Course in Design and Analysis of Experiments. New York: W. H. 
Freeman, 2000. 

Scheffé, H. The Analysis of Variance. New York: John Wiley & Sons, 1959. 

Searle, S. R.; G. Casella; and C. E. McCulloch. Variance Components. New York: John
Wiley & Sons, 1992.

Snedecor, G. W., and W. G. Cochran. Statistical Methods. 8th ed. Ames, Iowa: Iowa State
University Press, 1989. 

Steel, R. G. D.; J. H. Torrie; and D. A. Dickey. Principles and Procedures of Statistics: A 
Biometrical Approach. New York: McGraw-Hill, 1996. 

Winer, B. J.; D. R. Brown; and K. M. Michels. Statistical Principles in Experimental 
Design. 3rd ed. New York: McGraw-Hill, 1991. 

Wu, C. F. J., and M. Hamada. Experiments: Planning, Analysis, and Parameter Design 
Optimization. New York: John Wiley & Sons, 2000. 


8. Miscellaneous Experimental Design and Analysis of Variance Topics


Beckman, R. J.; R. D. Cook; and C. J. Nachtsheim. “Diagnostics for Mixed Model 
Analysis of Variance.” Technometrics 29 (1987), pp. 413-26. 

Berger, V. W. “Pros and Cons of Permutation Tests in Clinical Trials.” Statistics in 
Medicine 19 (2000), pp. 1319-28. 

Burdick, R. K. “Using Confidence Intervals to Test Variance Components.” Journal of 
Quality Technology 26 (1994), pp. 30-38. 

Burdick, R. K., and F. A. Graybill. Confidence Intervals on Variance Components. New 
York: Marcel Dekker, Inc., 1992. 

Dunnett, C. W. “A Multiple Comparisons Procedure for Comparing Several
Treatments with a Control.” Journal of the American Statistical Association 50 (1955),
p. 1096.

Gaylor, D. W., and F. N. Hopper. “Estimating the Degrees of Freedom for Linear 
Combinations of Mean Squares by Satterthwaite’s Formula.” Technometrics 11 (1969), 
pp. 691-706. 




Hartley, H. O. “Testing the Homogeneity of a Set of Variances.” Biometrika, 31 (1940), 
pp. 249-55. 

Greenhouse, S. W., and S. Geisser. “On Methods in the Analysis of Profile Data.” 
Psychometrika 24 (1959), pp. 95-112. 

Hocking, R. R. “A Discussion of the Two-Way Mixed Model.” The American Statistician 
27 (1973), pp. 148-52.

Holland, B., and M. D. Copenhaver. “An Improved Sequentially Rejective Bonferroni
Test Procedure.” Biometrics 43 (1987), pp. 417-23. 

Holland, B., and M. D. Copenhaver. “Improved Bonferroni-Type Multiple Testing
Procedures.” Psychological Bulletin 104 (1988), pp. 145-49. 

Hsiao, C. Analysis of Panel Data. Cambridge, England: Cambridge University Press,
1986. 

Huynh, H., and L. Feldt. “Estimation of the Box Correction for Degrees of Freedom from 
Sample Data in the Randomized Block and Split-Plot Designs.” Journal of Educational 
Statistics 1 (1976), pp. 69-82. 

Johnson, D. E., and F. A. Graybill. “Estimation of σ² in a Two-Way Classification
Model with Interaction.” Journal of the American Statistical Association 67 (1972),
pp. 388-94.

Koch, G. G.; J. D. Elashoff; and I. A. Amara. “Repeated Measurements—Design and 
Analysis.” In Encyclopedia of Statistical Sciences, vol. 8, ed. S. Kotz and N. L. Johnson. 
New York: John Wiley & Sons, 1988, pp. 46-73. 

Kruskal, W. H., and W. A. Wallis. “Use of Ranks on One-Criterion Variance Analysis,” 
Journal of the American Statistical Association, 47 (1952), pp. 583—621 (corrections 
appear in Vol. 48, pp. 907-11). 

Meyer, R. K., and C. J. Nachtsheim. “The Coordinate Exchange Algorithm for 
Constructing Exact Optimal Experimental Designs.” Technometrics 37 (1995), pp. 60-69. 
Monlezun, C. J. “Two-Dimensional Plots for Interpreting Interactions in the Three-Factor 
Analysis of Variance Model.” The American Statistician 33 (1979), pp. 63-69.

Nelson, L. S. “Exact Critical Values for Use With the Analysis of Means.” Journal of 
Quality Technology 15 (1983), pp. 40-44. 

Nelson, P. R. “Additional Uses for the Analysis of Means and Extended Tables of Critical 
Values.” Technometrics 35 (1993), pp. 61—71. 

Ott, E. R. “Analysis of Means—A Graphical Procedure.” Industrial Quality Control 24 
(1967), pp. 101—109. 

Plackett, R. L., and J. P. Burman. “The Design of Optimum Multifactorial Experiments.”
Biometrika 33 (1946), pp. 305—25. 

Puri, M. L., and P. K. Sen. Nonparametric Methods in General Linear Models. New York: 
John Wiley & Sons, 1985. 

Satterthwaite, F. E. “An Approximate Distribution of Estimates of Variance Components.”
Biometrics Bulletin 2 (1946), pp. 110-14.

Schwarz, C. J. "The Mixed-Model ANOVA: The Truth, the Computer Packages, the 
Books. Part I: Balanced Data.” The American Statistician 47 (1993), pp. 48—59. 




Shaffer, J. P. “Modified Sequentially Rejective Multiple Test Procedures.” Journal of the
American Statistical Association 81 (1986), pp. 826-31. 

Shoemaker, A. C.; K. L. Tsui; and C. F. J. Wu. “Economical Experimentation Methods for
Robust Design.” Technometrics 33 (1991), pp. 415-27. 

Snee, R. D. “Computer-Aided Design of Experiments—Some Practical Experiences.”
Journal of Quality Technology 17 (1985), pp. 222-36. 

Taguchi, G. Introduction to Quality Engineering. Tokyo: Asian Productivity Organization, 
1986. 

Ting, N.; R. K. Burdick; F. A. Graybill; S. Jeyaratnam; and T. F. C. Lu. “Confidence
Intervals on Linear Combinations of Variance Components that Are Unrestricted in Sign.” 
Journal of Statistical Computation and Simulation 35 (1990), pp. 135-43. 

Welch, W. J.; T. K. Yu; S. M. Kang; and J. Sacks. “Computer Experiments for Quality 
Control by Parameter Designs.” Journal of Quality Technology 40 (1990), pp. 62-71. 


Index 


A 


ABT Electronics Corporation, 783 
Activation functions, 538 
Active explanatory factors, 1209 
Added-variable plots, 384—390 
Addition theorem, 1298 
Additive effects, 216 
Additive factor effects, 819-822 
Additive model for random block effects, 
1061—1064 
Adjusted coefficient of multiple 
determination, 226-227 
Adjusted estimated treatment mean, 932 
Adjusted variable plots (see 
Added-variable plots) 
Adjustment factors, 1250 
Akaike’s information criterion (AIC),
359-360 
Algorithms, 361-364 
Aliased, 1225 
Aligned dot plots, 777, 778 
Allocated codes, 321-322 
America’s Smallest School: The Family, 441 
Analysis of covariance, 658, 917-920 
alternative to blocking, 939 
correction for bias, 940 
estimation of effects, 930-932, 940 
F tests, 928-929 
models 
appropriateness, 925-926 
multifactor, 934-937 
single-factor, 920-925 
two-factor, 934 
parallel slopes test, 932-933 
randomized block design, 937-938 
regression approach 
multifactor, 934—935 
single-factor, 924—925 
uses of differences, 939-940 
Analysis of means, 758-759 
Analysis of means plot, 758-759 
Analysis of variance (ANOVA), 679-681, 
833-842 
coefficient of multiple correlation, 227 
coefficient of multiple determination, 
226—227 
concomitant variables, 919—920 
degrees of freedom, 66 
empty cells, 964—967 
estimation of effects 
latin square design, 1190 
nested design, 1100-1104 
quantitative factor, 762—766 
repeated measures design, 
1137-1138, 1157-1158 


with interaction, 1148-1152 
without interaction, 1145—1148 
single-factor, 762—766 
three-factor, 1013—1017, 
1069-1070 
two-factor, 848-861, 959-964, 
970—980, 1055-1060 
two-level factorial design, 
1212-1214 
estimation of factor level means, 
737-761
expected mean squares, 68-69 
F tests, 69-71, 226 
latin square design, 1190 
multiple pairwise testing 
procedure, 1138—1139 
nested design, 1097-1099 
power value charts, 1337—1341 
randomized block design, 
898—901, 1138-1139 
repeated measures design, 
1130-1134, 1138—1139, 
1142-1143, 1155-1157 
single-factor, 698—701, 716—718, 
744, 795—798 
single-factor ANOVA model, 704 
three-factor, 1009-1010, 
1067—1068 
two-factor, 843—847, 1053-1054 
two-level factorial design, 
1214-1215 
mean squares, 66—67, 225 
no replications and/or some 
interactions equal zero, 1366 
partitioning 
latin square design, 1188-1189 
nested design, 1093-1099 
randomized block design, 
898—900, 908—909 
repeated measures design, 
1130-1134, 1138-1139, 
1142-1143, 1155-1157 
single-factor ANOVA model, 
690-693 
three-factor, 1008—1009 
two-factor, 836-840 
partition of sum of squares, 63-66 
planning of sample size 
equal sample sizes, 759-761 
estimation approach, 759-761 
to find “best” treatment, 721-722 
latin square design, 1193—1194 
power approach, 716-723 
randomized block design, 
909—912, 939 
single-factor, 716-718 


tables, 1342-1344 
unequal sample sizes, 761 
regression approach 
randomized block design, 
967—969 
repeated measures design, 1161 
single-factor ANOVA model, 
704—712 
three-factor, 1019-1020 
two-factor, 953-959 
rule for finding expected mean 
squares, 1361-1365 
rule for finding sums of squares and 
degrees of freedom, 
1359-1361 
statistical computing packages, 
980-981 
sums of squares, 204—206, 225 
unequal sample sizes, 761, 
1019-1021, 1070-1077 
unequal treatment importance, 
970—980 
Analysis of variance (ANOVA) models, 
329, 642, 659, 679—681 
diagnosis of departures from, 778—781 
effects of departures from model, 
793-795
factor effects model 
with unweighted mean, 705—708 
with weighted mean, 709—710 
fitting of, 685-689, 1003-1011 
meaning of model elements, 817—829 
no-interaction model, 880—886 
randomized block design, 1061—1065 
rule for model development, 
1358-1359 
single-factor, 681—685, 692—704 
model II, 1030-1034, 1047
model I vs. model II, 685 
repeated measures design, 
1129—1139 
residual analysis for aptness, 
775—781 
subsampling, 1106—1113 
weighted least squares, 786—789 
tests for constancy of error variance, 
780—785 
three-factor 
ANOVA model, 992-998
fitting of model, 1003-1005
model II, 1066
model III, 1066-1067
partially nested design, 1114—1119 
residual analysis for 
appropriateness, 1006, 
1007 


Analysis of variance models (cont.) 
transformations of response variables, 
789—793 
treatment means comparisons, 
856—861 
two-factor 
crossed, 1366-1369 
estimation of effects, 848—861 
fitting of model, 834—836 
fixed-factor levels, 829-833 
model II, 1047—1049 
model III, 1049—1052 
pooling sums of squares, 861—862 
repeated measures design, 
1153-1161
residual analysis, 842—843 
strategy for analysis, 847—858 
Tukey test for additivity, 886-888 
two-level factorial design, 1210-1212 
two-level fractional factorial design, 
1223—1239 


unbalanced nested design, 1104-1106


unequal sample sizes, 951-964 
Analysis of variance table, 67-68, 120, 
124—127, 225, 261, 262, 604, 
840-842 
ANOVA (see Analysis of variance) 
ANOVA models (See Analysis of variance 
models) 
ANOVA table (see Analysis of variance 
table) 
Antagonistic interaction effect, 308 
A-optimality, 1282 
Apex Enterprises, 1031-1034 
Arc sine transformation, 790 
Asymptotic normality, 50 
Autocorrelation, 481—501 
Durbin-Watson test, 487-490 
parameters, 484 
remedial measures, 490—498 
Autocorrelation function, 485 
Autocovariance function, 485 
Automatic model selection, 582-585
Automatic search procedures, 361—369 
Autoregressive error model (see 
First-order autoregressive error 
model) 
Axial points, 1269 


B 


Back-propagation, 543 
Backward elimination selection 
procedure, 368 
Balanced incomplete block designs 
(BIBDs), 664—665, 1173—1183 
advantages and disadvantages of, 
1175-1176 
analysis of, 1177—1183 
table, 1345—1347 


Balanced nested design, 1088—1091 
Bar graphs, 736, 853 
Bar-interval graph, 738—739 
Baseline (referent) category, 611 
Bates, D. M., 529 
Bechhofer procedure, 721 
Berkson, J., 167—168, 172 
Berkson model, 167—168 
Bernoulli random variables, 563 
“Best” subsets algorithms, 361—364 
"Best" treatment, 721—722, 864 
Beta coefficients, 278 
Bias correction, 940 
Biased estimation, 369, 432-433 
BIBDs (See Balanced incomplete block 
designs) 
Bilinear interaction term, 306 
Binary variables, 315 
outcome variables, 556-557 
response variables, 555—563 (See also 
Indicator variables) 
Binomial distribution, 569 
Bisquare weight function, 439-440 
Bivariate normal distribution, 78-80 
Blind studies, 658 
Blocked experiments, 656 
Blocking, 656-658 
and cause-and-effect inferences, 895 
covariance analysis alternative, 939 
criteria for, 893-894 
Blocks, 656, 892 
BMDP, 107, 694, 695 
BMDP3V, 1075 
Bonferroni inequality, 155—156, 1215 
Bonferroni joint estimation procedure, 
287, 396-398 
analysis of covariance, 930, 931, 934 
confidence coefficient, 155-157 
latin square design, 1190 
logistic regression, 580 
mean responses, 159, 230 
multiple pairwise testing, 1038-1039 
nested design, 1101, 1102 
nonlinear regression, 532 
prediction of new observations, 
160—161, 231 
randomized block design, 904 
regression coefficients, 228
repeated measures design, 1157 
single-factor ANOVA model, 
756-758 
three-factor analysis of variance, 1015, 
1017, 1074 
two-factor analysis of variance, 851, 
852, 856-857 
Bootstrapping, 458-464, 529, 530, 
1075-1076 
Bounded influence regression 
methods, 449 
Box, G. E. P., 236 
Box, M. J., 529 


Box-Behnken designs, 1276 

Box-Cox transformation, 134-137, 142, 
236, 791-793 

Box plots, 102, 108, 110, 779, 781 

Breiman, L., 369 

Breusch-Pagan test, 115, 118-119, 
142-144, 234-235 

Brown-Forsythe test, 115, 116—118, 234 

Brushing, 233 

Bubble (proportional influence) plot, 600 


C


Calibration problem, 170 
Caliper (interval) matching, 669 
Candidate list, 1278 
Carryover effect, 1128, 1201 
Case, 4 
Case-control studies (See Retrospective 
studies) 
Castle Bakery Company, 833—838 
Causality, 8-9 
CDI data set, 1349-1350 
Cell means model, 681—682, 704, 
710—712, 830-831, 834, 996-997 
Centering, 272 
Center point, 1222, 1269 
Center point replications, 1222-1223 
Central composite designs, 1268—1276 
Central limit theorem, 1302 
Centroid, 398 
Changeover design, 1198 
Cheng, C., 464 
Cleveland, W. S., 138, 146, 450
Close-to-linear nonlinear regression 
estimates, 529 
Cochrane-Orcutt procedure, 492-495 
Cochran’s theorem, 69-70, 699, 843 
Coefficient(s): 
of correlation, 76, 80 
of determination, 74—76, 86-87 
of multiple correlation, 227 
of multiple determination, 226-227 
of partial correlation, 270-271 
of partial determination, 268-271 
of simple correlation, 227 (See also 
Multiple regression 
coefficients; Simple linear 
regression coefficients) 
Cohort studies, 667 
Column sum of squares, 1189 
Column vector, 178 
Comparative experimental
studies, 643, 644 
Comparative observational 
studies, 644—645 
Complementary events, 1298 
Complementary log-log transformation, 
562-563, 568 
Complete block design, 1183 


Completely randomized design, 13, 644, 
659—660 
Completely randomized factorial 
design, 660 
Complete replicates, 653 
Components of variance model, 1033 
Compound symmetry, 1062 
Concomitant variables, 919-920 
Concordance index, 607 
Conditional effects plot, 307, 1285 
Conditional probability distributions, 
80-83 
Confidence band, for regression 
line, 61-63 
Confidence coefficient, 46, 49-50, 54—55, 
744—745 
Bonferroni procedure, 155—157 
family, 154—155 
and risk of errors, 50 
Confidence intervals, bootstrap, 460 
Confirmatory experiments, 1209 
Confounding, 1224—1227 
worst-case degree of, 1232 
Confounding factors, 656 
Confounding scheme, 1226 
Conjugate gradient method, 543 
Consistent estimator, 1305 
Constancy of error variance, tests for, 
115-119, 234 
Constrained randomization, 655—658 
Contour diagram, 1284—1286 
Contrast, 741-742 
Control factors, 1251 
Control group, 643 
Controlled experiments, 343—344, 347 
Control treatment, 651-652 
Control variables, 344, 919 
Cook's distance, 402-405, 598—601 
Corner points, 1269 
Correlation coefficients, 78—89, 1301 
and bivariate normal distribution, 
78-80 
and conditional inferences, 80—83 
inferences on, 83-87 
and regression vs. correlation 
models, 78 
Spearman rank, 87-88 
table of critical values, 1329 
table of z’ transformation, 1332 
Correlation matrix of transformed 
variables, 274—275 
Correlation models: 
bivariate normal distribution, 78-80 
compared to regression models, 78 
conditional probability distributions, 
80-83 
multivariate normal distribution, 
196-197 
regression analysis, 82-83 (See also 
Regression models) 
Correlation operator, 1301 


Correlation test for normality, 
115, 234 
Correlation transformation, 272-273 
Covariance, 1300-1301 (See also 
Variance-covariance matrix) 
Covariance analysis (see Analysis of 
covariance) 
Covariance models, 329 (See also 
Analysis of covariance) 
Covariance operator, 1300 
Covariates, 919 
Cox, D. R., 171, 172 
Cp criterion, Mallows’, 357-359
Crossed factor, 648, 1088—1091 
Crossed-nested design, 662-663, 1114 
Crossed two-factor study, 1366—1369 
Crossover designs, 1198-1200 
Cross-sectional studies, 666—667 
Cross-validation, 372 
Cumulative logits, 616 
Curvilinear relationship, 4 
Cutoff point, 604 


D 


Data collection, 343—346, 370-371 
Data sets, 1348-1357 
Data snooping, 745 
Data splitting, 372 
Decision rule, 70 
Defining relation, 1227-1228 
Degree of linear association, 74—77 
Degrees of freedom, 66, 693, 839, 1009, 
1095, 1303-1304
rule for finding, 1359—1361 
Deleted residuals, 395—396 
Delta chi-square statistic, 598 
Delta deviance statistic, 598 
Density function, 1073 
of normal random variable, 1302 
Dependent variables, 2-3, 3-4 
Derived predictor values, 537 
Design criterion, 1278 
Design generators, 1229 
Design matrix, 1212 
Design of experiments, 647 
Design resolution, 1231-1232 
Determinant criterion, 1279—1280 
Determinant of matrix, 190 
Deviance, 589 
Deviance goodness of fit test, 588—590 
DFBETAS, 404—405 
DFFITS, 401-402 
Diagnostic plots, 901—903 
for predictor variables, 100—102 
for residuals, 103-114 
Diagnostics (see Logistic regression 
diagnostics) 
Diagonal matrix, 185-186 




Dichotomous responses, 556 (See also 
Binary variables) 

Differences of treatment means, 1002 

Direct numerical search, 518—524 

Discriminant analysis, 608 

Disease outbreak data set, 1355 

Disordinal interaction, 326 

Dispersion model, 1246-1247 

Disturbances, 482 

Dixon, W. J., 33 

D-optimal design, 1279 

Dorle Exterior Trim, 1288—1289 

Dose-response relationships, 510 

Dot plots, 100—101, 108, 110, 777, 
778, 781 

of residuals, 778 

Double-blind study, 658 

Double crossover design, 1201 

Double cross-validation procedure, 375 

Drug effect experiment data set, 
1356-1357 

Dummy variables, 12, 315 (See also 
Indicator variables) 

Durbin-Watson test, 114, 487-490, 492 

table of test bounds, 1330-1331 


E 


Educational Testing Service, 441 
Efron, B., 459 
Eigenvalues, 1287 
Empty cells, 964—967 
Error mean square, 66, 126 
Error sum of squares, 25, 64, 72, 126, 691, 
836-837 
Error terms, 9-12, 778—779, 794—795 
autocorrelated, 481—484 
constancy of variance tests, 116-119 
nonindependence of, 108—110, 128 
nonnormality, 110-112, 128-129 
Error term variance, 24—26, 527-528 
Error variability reduction, 917-918 
Error variance, 107, 128, 421-431, 778, 
793-794 
Estimates (see Tests) 
Estimation, 737—738 
Estimation approach to sample size 
planning, 759-761, 863—864, 
1182-1183 
Estimators, 1305-1306 
Exact F test, 1067—1068 
Exchange algorithms, 1282 
Expectation operator, 1299 
Expected mean squares, 68-69, 604—608, 
840, 1052-1053 
rule for finding, 1361—1365 
Expected value, of random variable, 1299 
Experimental data, 13 
factor level means, 684 




Experimental designs, 647 
blocking, 656—658 
completely randomized design, 13, 
659—660 
crossed-nested, 662-663
crossover, 1198-1200 
double crossover, 1201 
exploratory, 1209 
factorial experiment, 660—661 
fractional factorial design, 665-666 
latin square, 1183-1186 
nested, 662—663, 1088—1091 
one-factor-at-a-time approach, 
815, 816 
randomization tests, 712—715 
randomized block, 892-896 
randomized complete block, 661—662 
repeated measures, 663—664, 
1127-1129 
response surface design, 666
response surface methodology, 
1267—1268 
screening designs, 1239-1240 
sequential search for optimal 
conditions, 1290-1292
split-plot design, 664, 1162-1163 
two-level factorial design, 665-666, 
1210-1212 
two-level fractional factorial design, 
1223-1239 
Experimental error sums of squares, 
1108-1109 
Experimental factors, 644, 647 
Experimental group, 643 
Experimental studies, 643—644 
mixed observational and, 646-647 
observational vs., 677-679 
Experimental units, 13, 643, 652, 
893-894, 1112 
Experiments, 643 
Explanatory variables, 3, 347-349 
omission of, 780-781 
Exponential family of probability 
distributions, 623 
Exponential regression function, 128 
Exponential regression model, 511-512 
Ex post facto studies (See Retrospective 
studies) 
Externally studentized residuals, 396 
Extra sums of squares, 256-262, 285—286 
Extreme value (Gumbel density 
function), 562 


F 


Face-centered design, 1273 
Factor A main effects, 844 
Factor A sum of squares, 838 
Factor B main effects, 845 


Factor B sum of squares, 838 
Factor effects model, 701—704, 831—833, 
835—836, 997—998 
with unweighted mean, 705—708 
with weighted mean, 709—710 
Factorial experiments, 660—661 
Factor level, 647-648 
Factor level means, 684, 698—701, 
704, 818 
estimation, 848—853 
estimation and testing, 737—761 
plots of, 735-737 
line plot, 735-736 
main effects plot, 736-737 
weighted least squares estimation, 
786-789 
Factors, 344, 647 
and choice of treatment, 649-652 
crossed, 648 
nested, 649 
nuisance (confounding), 656 
(See also Control variables) 
Family: 
of conclusions, 1013 
of estimates, 154—155 
of tests, 745, 846, 1010 
Family confidence coefficient, 154—155 
Far-from-linear nonlinear regression 
estimates, 529 
F distribution, 1304 
Scheffé procedure, 160-161 
table of percentiles, 1320-1326 
First differences procedure, 496-498 
First-order autoregressive error model, 
484—487 
Cochrane-Orcutt procedure, 492—495 
Durbin-Watson test, 487-490 
first differences procedure, 496—498 
forecasting with, 499—501 
Hildreth-Lu procedure, 495-496 
First-order interactions, 995 
First-order regression model, 9, 215—217, 
318—319 (See also Regression 
models) 
Fish-bone diagrams, 649 
Fisher Company, 1272-1274 
Fisher z transformation, 85 
Fitted logit response function, 565 
Fitted values, 202—203, 688, 835, 1004 
influences on, 401-404 
and multicollinearity, 286-288 
and residuals, 224-225 
total mean square error, 357—359 
Fitting, 298-299
of ANOVA model, 685—689, 
1003-1011 
Fixed effects contrasts, 1058 
Fixed effects model, 685 
Fixed X sampling, 459 
Folding over, 1240 
Foldover design, 1240 


Forecasting with autoregressive error 
model, 499—501 
Forward selection procedure, 368 
Forward stepwise regression, 364—367 
Fraction, 1209 
Fractional factorial designs, 
665-666, 1209 
Frequencies, proportional, 980 
Friedman test, 900-901, 1138 
F test: 
for analysis of variance, 69—71 
equivalence of t test, 71 
for lack of fit, 119—127, 235 
nonparametric rank, 795—798 
for regression relation, 226 
Full model, 72, 121-123, 700, 711-712 
Functional relation, 2-3 


G 


Galton, Francis, 5 
Gauss-Markov theorem, 18, 43, 884 
Gauss-Newton method, 518—524 
Generalized interaction, 1230 
Generalized least squares, 430 
Generalized randomized block design, 
906-908 
Generalized randomized block model, 907 
General linear regression models, 
217-221, 510-511, 623-624 
General linear test, 72—73, 121—127 
approach, 972-974 
Goodness of fit tests, 586—590 
deviance, 588—590 
Hosmer-Lemeshow, 589—590 
Pearson chi-square, 586—588, 590 
G-optimality, 1282 
Gulledge, T. R., Jr., 115, 146, 1329 
Gumbel density function (extreme 
value), 562 


H 


Half-fraction design, 1229 
Half-normal probability plot,
595-598, 1222
Hartley test, 782—784, 1144 
Hat matrix, 202—203, 392—394, 398—400 
Heating equipment data set, 1353—1354 
Hessian matrix, 578, 1074 
Heteroscedasticity, 429 
Hidden nodes, 540 
Hidden replication, 816 
Hierarchical fitting, 298-299 
Hildreth-Lu procedure, 495-496 
Histograms, 110, 778, 781 
Holm simultaneous testing procedure, 850 
Homoscedasticity, 429 


Honestly significant difference tests, 752 
Hosmer-Lemeshow goodness of fit test,
589-590
Hougaard, P., 529 
H statistic, 782 

table of percentiles, 1336 

Huber weight function, 439—440 
Hyperplane, 217 


I 


IC Technologies, 1282-1283 
Idempotent matrix, 203 
Identity matrix, 186 
Important interactions, 824—825, 1016 
Incomplete block designs, 664—665, 1183 
two-level factorial, 1240-1244 
Independent random variables, 1302 
Independent samples, 1309-1311 
Independent variables, 2-3, 3-4 
Index of response, 939 
Indicator variables, 314—315 
allocated codes vs., 321—322 
alternative codings, 323—324 
in analysis of variance, 680-681 
for comparing regression functions, 
329-335 
interaction effects, 324—327 
quantitative variables vs., 322—323 
time series applications, 319—321 
Individual outcome, 56 
Individual test, 745 
Influential cases, 400—406 
Influential observations, detection of, 
598-601 
Instrumental variables, 167 
Interaction effect coefficient, 297 
Interaction effects, 220 
with indicator variables, 324—327 
interference/antagonistic, 308 
reinforcement type, 308 
Interaction model for random block 
effects, 1064—1065 
Interaction regression models, 306-313 
Interactions, 823 
in analysis of variance, 822-829 
generalized, 1230 
multiple two-factor, 999-1000 
single two-factor, 1000-1002
tests for, 844 
three-factor, 996, 998-999, 1016 
two-factor, 856—861 
two-level factorial design, 1218-1219 
Interaction sum of squares, 838 
Interaction sum of squares between blocks 
and treatments, 898 
Interaction sum of squares between 
treatments and subjects, 1130 
Intercorrelation (see Multicollinearity) 


Interference interaction effect, 308 

Internally studentized residuals, 394 

Interval (caliper) matching, 669 

Interval plot, 738 

Intraclass correlation coefficient, 1035 

Intrinsically linear response functions, 514 

Inverse of matrix, 189-193 

Inverse predictions, 168—170 

Inverse regression, 170 

Iowa Aluminum Corporation, 1233-1239 

IPO data set, 1355-1356 

IRLS robust regression, 439—441 

Irregular experimental regions, 1276-1277 

Ischemic heart disease data set, 
1354—1355 

Ishikawa diagrams, 649

Iteratively reweighted least squares, 426


J


JMP, 981

J-1 nominal response logits, 610-614 
Joint density function, 1073 

Joint estimation, 154—157 

Joint probability function, 1300 


K 


Kendall’s coefficient of concordance, 1139 
Kendall’s τ, 89

Kenton Food Company, 694—695, 712 
Kimball inequality, 846, 1010, 1011, 1215 
Kolmogorov-Smirnov test, 115 
Kruskal-Wallis rank test, 796-797 
Kurtosis, 793 


L 


Lack of fit mean square, 124 
Lack of fit test, 119—127, 235, 764—766, 
1222-1223 
LAD (least absolute deviations) 
regression, 438 
Large-sample theory, 528—530 
LAR (least absolute residuals) 
regression, 438 
Latent explanatory variables, 348 
Latin square changeover design, 1198 
Latin square design, 1183—1186 
ANOVA partitioning, 1188—1189 
crossover design, 1198—1200 
double crossover design, 1201 
efficiency, 1193-1194
estimation of effects, 1190 
factorial treatment, 1192 
fitting of model, 1188
F test, 1190 
model, 1187 
notation, 1188 
planning of sample sizes, 1193-1194 
random blocking effects, 1193 
randomization, 1185—1186 
repeated measures, 1198-1201 
replications, 1193 
replications within cells, 1195-1196 
residual analysis, 1191 
rule modification, 1366 
sums of squares, 1188—1189 
table, 1344 
Tukey test for additivity, 1191-1192 
use of independent squares, 
1200-1201 
use of several squares, 1196-1198 
Learning curve models, 533-537 
Least absolute deviations (LAD) 
regression, 438 
Least squares estimation, 161—162, 
1305—1306 
criterion, 15—19 
generalized, 430 
and maximum likelihood estimation, 
32-33 
multiple regression, 223-224 
penalized, 436 
randomized complete block 
model, 898 
simple linear regression, 199-201 
single-factor ANOVA model, 
687-689 
standardized regression coefficients, 
275-278 
three-factor analysis of variance, 
1003-1005 
two-factor analysis of variance, 
834—836, 975-976 
weighted, 421-431 
Levene test (see Modified Levene test) 
Leverage, 398 
Likelihood function, 29—33, 564, 1305 
Likelihood ratio test, 580-582 
Likelihood value, 28 
Lilliefors test, 115 
Linear-by-linear interaction term, 306 
Linear combination of factor level 
means, 744 
Linear dependence, 188 
Linear effect coefficient, 296 
Linear function of normal random 
variable, 1302 
Linear independence, 188 
Linearity, test for, 119-127 
Linearization method (see Gauss-Newton 
method) 
Linear model, 221 
ANOVA model, 683—684 
Linear predictor, 560, 623 
Linear regression functions, 7
Line plot of estimated factor level means, 
735-736 
Link function, 623-624 
LMS (least median of squares) 
regression, 439 
Locally weighted regression scatter plot 
smoothing, 138-139 
Location model, 1247, 1250 
Logistic mean response function, 560—562 
Logistic regression, polytomous, 608-618 
Logistic regression diagnostics, 591—601 
influential observations, detection of, 
598-601 
plots, residual, 594-598 
residuals, 591—594 
Logistic regression models, 512—513 (See 
also Regression models) 
Logit response function, 562 
Logit transformation, 562 
Looney, S. W., 115, 146, 1329 
Lowess method, 138—139, 449—450 


M 


Main effects, 818—819, 1012 
Main effects plot of estimated factor level 
means, 736—737 
Mallows' Cp criterion, 357-359
Marginal probability function, 1300 
Market share data set, 1350 
Marquardt algorithm, 525 
Matched-pairs design, 669 
Matched studies, 668-669 
Matching, 668-669 
Mathematics proficiency, 441—448 
Matrix(-ces): 
addition, 180—181 
with all elements unity, 187 
basic theorems, 193 
definition, 176—178 
determinant, 190 
diagonal, 185—186 
dimension, 176-177 
elements, 176-177 
equality of two, 179—180 
hat, 202—203, 392—394, 398-400 
Hessian, 578 
idempotent, 203 
identity, 186 
inverse, 189-193 
linear dependence, 188 
multiplication by matrix, 182-185 
multiplication by scalar, 182 
nonsingular, 190 
of quadratic form, 205—206 
random, 193—196 
rank, 188-189 
scalar, 187 
scatter plot, 232-233 


simple linear regression model, 
197-199 
singular, 190 
square, 178 
subtraction, 180—181 
symmetric, 185 
transpose, 178-179 
vector, 178 
zero vector, 187 
Maximum likelihood estimation, 27-33, 
612-614, 617, 1305 
logistic regression, 564—567 
mixed ANOVA models, 1072-1076 
Poisson regression model, 620 
single-factor ANOVA model, 
687-689 
Mean: 
of the distribution, 56 
prediction of, 60—61 
of residuals, 102 
Mean response: 
interval estimation, 52
logistic regression 
interval estimation, 602—603 
point estimation, 602-604 
multiple regression 
estimation, 229-232 
joint estimation, 230 
simple linear regression 
interval estimation, 52-55, 
157—159, 208-209 
joint estimation, 157—159 
point estimation, 21-22 
Mean squared error: 
of regression coefficient, 433 
total, of n fitted values, 357-359 
Mean square prediction error, 370-371 
Mean squares, 25, 66-67, 693-694, 
839-840, 1009 
analysis of variance, 225 
expected, 68-69 
Measurement errors in observation, 
165—168 
Median absolute deviation (MAD), 
440-441 
Method of steepest descent, 525 
Minimax, 1285 
Minimum L1-norm regression, 438
Minimum variance estimator, 1305 
MINITAB, 20-21, 46, 47, 49—50, 101, 
104, 671, 777, 840, 981, 1117, 
1249 
MINITAB Fractional Factorial procedure, 
1235-1238 
Minnesota Department of Transportation, 
464-471 
Mixed experimental and observational 
studies, 646-647 
Mixed factor effects model, 1049—1052 
MLS procedure, 1045—1047 
Model-building set, 372 


Modified large sample procedure (see 
MLS procedure) 

Modified Levene test, 115, 116—118, 
784—785, 1144 

Modified Levene test statistic, 234 

Moving average method, 137 

MSEp criterion, 355-356

Multicategory logistic regression models
(see Polytomous logistic
regression)

Multicollinearity, 278-289 
detection of, 406-410 
remedial measures, 431—437 
ridge regression, 431-437 

Multifactor covariance analysis, 934-937 

Multifactor studies, 648 
sample size planning, 1021-1022 
unequal sample sizes, 1019-1021 

Multiple comparison procedures,
746-759, 1059

Multiple logistic regression, 570-577 
geometric interpretation, 572-573 
model, 570-573 
polynomial logistic regression,
575-576
prediction of new observation, 
604—608 
Multiple pairwise comparisons, 797—798, 
850—851, 856-861 
Multiple pairwise testing procedure, 
1138-1139 
Multiple regression (see Mean response; 
Prediction of new observation; 
Regression coefficients; 
Regression function) 

Multiple regression coefficients, 216-217 
danger in simultaneous tests, 287—288 
interval estimation, 228, 229 
joint inferences, 228 
least squares estimation, 223-224 
tests concerning, 228, 263—268 
variance-covariance matrix of,
221-228

Multiple regression mean response: 
estimation, 229—232 
joint estimation, 230 

Multiple regression models, 214—221 
ANOVA table, 225 
diagnostics, 232-236 
extra sum of squares, 256-262 
general model, 217—221 
interaction effects, 220 
logistic regression, 570-577 
lowess method, 449-459 
in matrix terms, 222—223 
multicollinearity effects, 278-289 
remedial measures, 236 
standardized, 271—278 
two predictor variables, 236-248 

Multiplication theorem, 1298 

Multivariate normal distribution, 196-197 


N


Nested design, 662—663, 1088—1091 
balanced, 1088—1091 
residual analysis, 1100 
rule for model development, 
1091-1092 
subsampling, 1106-1114 
three-factor partially nested, 
1114-1119 
two-factor 
ANOVA partitioning, 1093-1099 
estimation of effects, 1100-1104 
fitting of model, 1093 
F test, 1097—1099 
model, 1091—1092 
residual analysis, 1100 
unbalanced, 1104—1106 
Nested factor, 649, 1088—1091 
Neural networks, 537—547 
conditional effects plots, 546 
example illustrating, 543—546 
as generalization of linear 
regression, 541 
network representation, 540—541 
and penalized least squares, 542—543 
single-hidden-layer, feedforward, 537 
training the network, 542 
No-interaction model, 880-886 
Noise factors, 1246, 1250-1252 
Noncentral F distribution, 70, 699 
Noncentrality measure, 51 
Noncentrality parameter, 717 
Nonconstant error variance, 557-558, 778 
Nonindependence of error terms, 
778-779, 794-795
Nonindependent residuals, 102—103 
Nonlinear regression models, 511—512 
transformations for, 129-132 (See also 
Regression models) 
Nonnormal error terms, 557 
Nonnormality, 793-794 
of error terms, 781 
transformations for, 132-134 
Nonparametric rank F test, 795—798, 
900-901, 1138-1139 
Nonparametric regression, 449-458 
Nonparametric regression curves, 137 
Nonsingular matrix, 190 
Nonstandard models, 1277 
Nonstandard sample sizes, 1278 
Nontransformable interactions, 826—827 
Normal equations, 17-18, 271-272, 
517-518 
Normal error regression model, 
26-33, 82 
confidence band for regression line, 
61-63 
inferences concerning β0, 48-51
sampling distribution of b1, 41-46
X and Y random, 78-89 


Normality: 
assessing, 112 
correlation test for, 115 
tests for, 115 
Normal population: 
one population mean, 1306—1308 
population variance, 1311—1312 
two population means, 1309—1311 
two population variances, 1312-1314 
Normal probability distribution, 
1302-1303 
Normal probability plot, 110-112, 
781, 1221 
of residuals, 778 
Nuisance factors, 656 
Numerator degrees of freedom, 1304 


O


Observational data, 12-13 
Observational factors, 645, 647 
Observational studies, 344—345, 347—349, 
368—369, 644—646 
cross-sectional studies, 666-667
design of, 666 
experimental vs., 677-679 
factor level means, 684 
mixed experimental and, 646—647 
prospective (cohort) studies, 667 
retrospective studies, 667-668 
Observation units, 1112 
Observed value, 21 
Odds ratio, 562, 567 
One-factor-at-a-time (OFAAT) approach
to experimentation, 815, 816 
One-sided test, 47-48 
Optimal response surface design, 
1276-1283 
Optimum conditions, 1290-1292 
Order effect, 1128 
Order position sum of squares, 1199 
Ordinal interaction, 326 
Orthogonal coding, 1214 
Orthogonal decomposition, 838 
Orthogonality, 1273, 1275 
Orthogonally blocked design, 1276 
Orthogonal polynomials, 305 
Ott, E. R., 758 
Outliers: 
detection of, 779-780 
observations, outlying, 108—109, 129 
tests for, 115, 396-398 
Outlying cases, 390-391, 437—438 
Overall F test, 264, 266 


P 


Paired-comparison design, 669 
Paired comparison plot, 748
Paired observations, 1311 
Pair-wise comparison, 739, 746, 797—798, 
850-851, 856-861, 962—964, 1182 
Pareto plot, 1219-1220 
Partial F test, 264, 267—268 
Partially hierarchical nested design, 1114 
Partially nested design, 1114 
Partial regression coefficients, 216 
Partial regression plots (see 
Added-variable plots) 
Partitioning: 
degrees of freedom, 66 
sum of squares total, 63-66 
Path of steepest ascent/descent, 1290 
Pattern sum of squares, 1199 
Pearson chi-square goodness of fit test, 
586—588, 590 
Pearson product-moment correlation 
coefficient, 84, 87 
Pearson semistudentized residuals, 591 
Pearson studentized residuals, 592 
Pecos Foods Corporation, 1216-1222 
Penalized least squares, 436, 542—543 
Penalty weight, 541 
Permutation test, 714
Plackett-Burman designs, 1240 
Plots of estimated factor level means, 
735-737
line plot, 735-736 
main effects plot, 736-737 
Plots of residuals against fitted values, 
778, 779 
Plutonium measurement, 141-144 
Point clouds, 233 
Point estimators, 17, 21-22, 24—26, 602, 
1056-1057 
Poisson regression model, 618-623 
Polynomial logistic regression, 575-576 
Polynomial regression model, 219-220, 
294—305 
Polytomous (multicategory) logistic 
regression: 
for nominal response, 608-614 
for ordinal response, 614-618 
Pooling sums of squares, 861—862 
Population of consumers, 970 
Power approach to sample planning, 
716—723, 862-863 
Power of tests: 
latin square design, 1193 
planning of sample size, 716-717 
randomized block design, 909—910 
regression coefficients, 50-51, 228 
Power transformations, 134, 135 
Prediction error rate, 607—608 
Prediction interval, 57-60 
Prediction of mean, 60-61 
Prediction of new observation: 
logistic regression, 604—608 
multiple regression, 231 
simple linear regression, 209
simple linear regression model,
55-61
Prediction set, 372
Predictor variables, 3-4, 6-7
added-variable plots, 384-390
in analysis of variance, 679-681
diagnostics, 100-102
measurement errors, 165-168
multicollinearity, 278-289
multiple regression, 214-217,
236-248
omission of important, 129
polynomial regression, 295-298
qualitative, 218-219, 313-318
residuals and omission of, 112-114
PRESSp criterion, 360-361
Primary variables, 344
Principal components regression, 432
Probability distribution, 50, 52-53
in single-factor ANOVA model, 681
Probability theorems, 1298
Probit mean response function, 559-560
Probit response function, 560, 568
Product operator, 1298
Projection property, 1232
Proportional frequencies, 980
Proportional influence (bubble) plot, 600
Proportionality constant, 424
Proportional odds model, 615, 616
Prospective studies, 667
Prostate cancer data set, 1351-1352
Pseudo F test statistic, 1068-1069
Pure error estimate, 1222
Pure error mean square, 124
Pure error sum of squares, 122
Puri, M. L., 798


Q


Quadratic effect coefficient, 296
Quadratic forms, 205-206
Quadratic linear regression model,
220-221
Quadratic regression function, 7, 128
Quadratic response function, 295, 305,
764-766
Qualitative factor, 647
Qualitative predictor variables, 218-219,
313-318
in analysis of variance, 679-681
more than one variable, 328
with more than two classes, 318-320
with two classes, 314-318
variables only, 329
Quantitative factors, 647, 762-766
Quantitative predictor variables, in
analysis of variance, 679, 681
Quantitative variables, 322-323
Quarter-fraction design, 1229-1231
Quasi F test statistic, 1068-1069


R


R²a criterion, 355-356
R²p criterion, 354-356
Radial basis function, 541
Random ANOVA model, 1031
Random cell means model, 1031-1034
Random effects model, 685
Random factor effects model, 1047
Randomization, 643, 652-658, 895,
1128-1129
constrained, 655-658
latin square design, 1185-1186
restricted, 656
Randomization distribution, 713-714
Randomization tests, 712-715
Randomized complete block design,
661-662, 892-896
analysis of covariance, 937-938
ANOVA partitioning, 898-900,
908-909
appropriateness of model, 901-903
diagnostic plots, 901-903
estimation of effects, 904-905
factorial treatments, 908-909
fitting of model, 898
F test, 898-900, 900-901, 1138-1139
generalized design, 906-908
missing observations, 967-969
models, 897-898, 1061-1065
multiple pairwise testing procedure,
1138-1139
nonparametric rank F test, 900-901,
1138-1139
planning of sample sizes,
909-912, 939
random block effects, 1060-1065
rank data, 900-901, 1138-1139
regression approach, 938, 967-969
residual analysis, 901-903
subsampling, 1367, 1369-1370
Tukey test for additivity, 903-904
use of more than one blocking
variable, 905-906
use of more than one replicate in each
block, 906-908
Random matrix, 193-196
Randomness tests, 114
Random variables, 1299-1302
Random vector, 193-196
Random X sampling, 459
Rank correlation procedure, 87
Rank of matrix, 188-189
Real estate sales data set, 1353
Receiver operating characteristic (ROC)
curve, 606
Reduced model, 700, 711—712 
Reduced model general linear test, 123 
Reduced model general test, 72 
Referent (baseline) category, 611 
Reflection method, 460 

Regression: 


and causality, 8-9 
as term, 5 


Regression analysis, 2 


analysis of variance approach, 63—71 

approach to balanced incomplete 
block designs, 1177-1179 

approach to single-factor ANOVA 
model, 704—712 

compared to analysis of variance, 
679—681 

completely randomized design, 13 

computer calculations, 9 

considerations in applying, 77—78 

experimental data, 13 

inferences concerning β1, 40-48

observational data, 12-13 

overview of steps, 13—15 

transformations of variables, 129—137 

uses, 8 


Regression approach: 


to analysis of covariance, 924—925, 
934—935 

to analysis of variance models 
three-factor, 1019—1020 

to randomized block design, 938, 
967-969 

to repeated measures design, 1161 

two-factor, 953—959 


Regression coefficients, 404—405 


bootstrapping, 458-464

effects of multicollinearity, 284—285 

interaction regression, 306-309 

interpretation for predictor variables, 
315—318, 324—327 

lack of comparability, 272 

multiple regression (see Multiple 
regression coefficients) 

partial, 216 

simple linear regression (see Simple 
linear regression coefficients) 

standardized, 275-278 


Regression curve, 6 
Regression function, 6 


comparison of two or more, 
329-330 
constraints, 558—559 
estimation, 15-24 
exploration of shape, 137—144 
exponential, 128 
hyperplane, 217 
interaction models, 309 
intrinsically linear, 514 
nonlinearity, 104—107, 128 
through origin, 161—165 
outlying cases, 390-391 


polynomial regression, 299—300 
quadratic, 128 
test for fit, 235 
test for lack of fit, 119-127 
test for regression relation, 226 
Regression line, 61—63 
Regression mean square, 66 
Regression models: 
autocorrelation problems, 481—484 
basic concepts, 5-7 
with binary response variable, 
555—559 
bootstrapping, 458-464 
building, 343—350, 368—369 
building process diagnostics, 
384-414 
choice of levels, 170-171 
coefficient of correlation, 76 
coefficient of determination, 74 
compared to correlation models, 78 
construction of, 7—8 
degree of linear association, 74—77 
effect of measurement errors, 
165—168 
estimation of error terms, 24—26 
first-order autoregressive, 484—487 
general linear, 121—127, 217—223, 
510—511, 623-624 
interaction, 306—313 
inverse predictions, 168-170 
logistic regression 
mean response, 602-604 
multiple, 570-577 
parameters, 577—582 
polytomous, 608-618 
simple, 564—570 
tests for, 586-601 
multiple regression (see Multiple 
regression models) 
nonlinear regression 
building, 526-527 
Gauss-Newton method, 518—524 
learning curve, 533—537 
least squares estimation, 515—525 
logistic, 563-618 
parameters, 527—533 
Poisson, 618-623 
normal error, X and Y random, 78-89 
normal error terms, 26—33 
origin, 5 
overview of remedial measures, 
127-129 
Poisson regression, 618—623 
polynomial, 219—220, 294—305 
with qualitative predictor variables, 
313-321 
residual analysis, 102—115 
scope of, 8 
selection and validation, 343—375 
automatic search procedures, 
361—369 


backward elimination, 368 
criteria for model selection, 
353-361 
forward selection, 368 
forward stepwise regression, 
364—367 
simple linear (see Simple linear 
regression models) 
smoothing methods, 137—141 
third order, 296 
transformed variables, 220 
validation of, 350, 369-375 
Regression relation, functional form, 7—8 
Regression sum of squares, 65, 260—262 
Regression surface, 216, 229-230 
Regression through origin, 161—165 
Regression trees, 453-457 
Reinforcement interaction effect, 308 
Remainder sum of squares, 1366 
Repeated measures design, 663—664, 894, 
1127-1129 
blocking of subjects in, 1153 
estimation of effects, 1137—1138, 
1145, 1157—1158 
F test, 1130-1134, 1138-1139, 
1142-1143, 1155-1157 
latin square crossover design, 
1198-1200 
multiple pairwise testing procedure, 
1138-1139 
ranked data, 1138—1139 
regression approach, 1161 
repeated measures on both factors, 
1153-1161 
repeated measures on one factor, 
1140-1153 
residual analysis, 1134—1135, 
1144, 1157 
single-factor, 1129-1139 
split-plot design, 1162-1163 
Replicates, 120 
Replication, 120, 426, 653 
hidden, 816 
Reproducibility, 653 
Residuals, 203—204, 224-225 
deleted, 395—396 
departures from simple linear 
regression, 103 
logistic regression diagnostics, 
591—598 
and omitted predictor variables, 
112-114 
outliers, 108—109 
overview of tests, 114-115
properties, 102-103 
in regression models, 22-24 
scaled, 440—441 
semistudentized, 103, 392, 591 
studentized, 394 
studentized deleted, 396-398 
studentized Pearson, 592
variance-covariance matrix of, 
203—204 
Residual analysis, 102 
analysis of variance, 775—781, 
842—843, 1006, 1007 
latin square design, 1191 
nested design, 1100 
randomized block design, 901—903 
repeated measures design, 1134—1135, 
1144, 1157 
Residual dot plots, 779 
Residual mean square, 25 
Residual plots, 104—114, 233—234, 
384—390 
Residual plots against fitted values, 
776—778, 780—781 
Residuals, 689, 835, 1004 
analysis of variance, 775—776 
Residual sequence plot, 778—779 
Residual sum of squares, 25 
Response, 21 
Response function, 764 (See also 
Regression function) 
Response modeling approach, 1255 
Response surface, 216, 309, 310 
Response surface design, 666, 
1267—1268 
blocking central composite design, 
1275-1276 
central composite design, 1268—1276 
design criteria, 1279—1281 
model interpretation and visualization, 
1284-1286 
optimal, 1276-1283 
optimal conditions, 1286-1287 
rotatable central composite design, 
1271-1272 
Response variables, 3, 165—168, 555—563 
transformations, 789—793 
Restricted mixed factor effects 
model, 1049 
Restricted randomization, 656 
Retrospective studies, 667—668 
Ridge regression, 431-437 
Ridge trace, 434, 435, 437 
Robust product design, 1244—1255 
Robust regression, 437-449 
Robust test, 794
ROC (receiver operating characteristic) 
curve, 606 
Rotatable central composite design, 
1271-1273 
Rotatable inscribed central composite 
design, 1273 
Roundoff errors, in normal equations 
calculations, 271—272 
Row sum of squares, 1189 
Row vector, 178 
Running medians method, 137—138 
Rutgers Experimental Station, 
1277-1281 


S


Saddle point, 1285 
Sample size, 652-653 
Sample size planning for analysis of 
variance: 
estimation approach, 759—761, 
863—864, 1182-1183 
to find "best" treatment, 
721—722, 864 
F test, 1021 
latin square design, 1193—1194 
multifactor studies, 1021—1022 
power approach, 716—723, 862-863 
random block design, 909-912, 939 
tables, 1342-1344 
Sampling distribution, 44—46, 48—50, 
52-54, 69—70 
SAS PROC GLM, 981 
SAS PROC MIXED, 1075 
SAS PROC OPTEX, 1283 
Satterthwaite approximate F test, 
1068—1069 
Satterthwaite procedure, 1043—1045 
Saturated model, 588 
SBCp (Schwarz' Bayesian criterion),
359—360 
Scalar, 182 
Scalar matrix, 186, 187 
Scaled residuals, 440—441 
Scaling, 272 
Scatter diagram/plot, 4, 19—21, 
104—105 
Scatter plot matrix, 232-233 
Scheffé, Henry, 793 
Scheffé joint estimation procedure, 
930-931, 934 
latin square design, 1190 
nested design, 1101, 1102 
prediction of new observation, 
160-161, 231 
randomized block design, 904 
repeated measures design, 1157 
single-factor analysis of variance, 761 
three-factor analysis of variance, 
1015, 1017 
two-factor analysis of variance, 
852, 857 
Scheffé multiple comparison procedure,
single-factor analysis of variance,
753-755

Schwarz’ Bayesian criterion (SBCp),
359-360 

Scientific studies, statistical design 
of, 642 

Scope of model, 8 

Screening designs, 1239-1240 

Second-order interaction, 996 

Second-order regression model, 295—296, 
297, 884 


Selection and validation of models, 
343-375 
automatic model selection, 582-585
automatic search procedures, 361—369 
backward elimination, 368 
criteria for model selection, 353—361 
forward selection, 368 
forward stepwise regression, 364—367 
Semistudentized residuals, 103, 392, 
591, 776 
Sen, P. K., 798 
SENIC data set, 1348-1349 
Sensitivity, 606 
Sequence plot, 101, 108—109 
Sequential experimental runs, 1290-1292 
Serial correlation, 481 
Servo-Data, Inc., 790-791 
Shapiro-Wilk test, 116 
Sheffield Foods Company, 1070-1076 
Sigmoidal response functions, 538, 
559—563 
Signal-noise ratio, 1255 
Simple linear regression coefficients, 
11-12 
interval estimation, 45-47, 49—50, 
52—55, 54—55 
least squares estimation, 15-19, 
199-201 
point estimation, 21—22, 155—157 
tests for, 47-48, 50-51, 69—71 
variance-covariance matrix of, 
207—208 
Simple linear regression mean response: 
interval estimation, 52-55, 157—159, 
208-209 
joint estimation, 157-159 
point estimation, 21-22 
Simple linear regression models, 9-12 
ANOVA table, 67-68 
diagnostics for predictor variables, 
100-102 
error term distribution unspecified, 
9-12 
general test approach, 72-73 
interval estimation, 52—55 
joint estimation procedures, 
154-161 
in matrix terms, 197—199 
normal error terms, 26—33 
through origin, 161—165 
prediction of new observation, 55-61 
regression coefficients, 11-12 
residual analysis, 102-115 
tests for coefficients, 47-48 (See also 
Regression models) 
Simulated envelope, 596—598 
Simultaneous estimation, 747 
Simultaneous testing, 747—748 
Single-blind study, 658 
Single comparison procedure, 904 
Single degree of freedom test, 744, 964 


Single-factor ANOVA models, 681-685
estimation of effects, 762—766 
factor effects model, 701—704 

with unweighted mean, 705-708 
with weighted mean, 709—710 
fitting of model, 685-689 
F tests, 698—701, 704 
least squares estimation, 687-689 
maximum likelihood estimation, 
687-689 
model I vs. model II, 685
partitioning of SSTO, 690-693 
residual analysis, 775—781 
Single-factor study, 648 
analysis of covariance, 920-933 
estimation of effects, 737—761, 
930-932 

expected mean squares, 694-698
experimental vs. observational studies,
677-679

F tests, 716-718, 744, 795-798, 
928-929 

model II, 1030-1034, 1047 

planning of sample sizes, 716-718, 
718-720 

regression approach, 704—712 

repeated measures design, 1129-1139 

subsampling, 1106—1113 

Single-hidden-layer, feedforward neural 
networks, 537 

Single-layer perceptrons, 537 

Singular matrix, 190 

Smoothing methods, 137—141 

Sparsity of effects principle, 1224 

Spearman rank correlation coefficient,
87-89

Specificity, 606, 607 

Spector, P., 360 

Split-plot designs, 664, 1162—1163 

SPSS ANOVA, 981 

SPSSX, 763

Square matrix, 178 

SSE (see Error sum of squares) 

SSTO (see Total sum of squares) 

SSTR (see Treatment sum of squares) 

Standard deviation, studentized 

statistic, 44 
Standardized multiple regression 
model, 273 
Standardized random variable, 1301 
Standardized regression coefficients, 
275-278 
Standard latin squares, 1186 
Standard normal distribution, table of 
cumulative probabilities, 1316 
Standard normal random variable, 
1302-1303 

Standard order, 1211 

Star points, 1269 

Statement confidence coefficient, 154 

Statistical computing packages, 980-981 


Statistical design of scientific studies, 642 
Statistical estimation, 1305—1306 
Statistical relation, 2, 3—5 
Steichen Bakeries, 1241—1244 
Stem-and-leaf models, 101—102, 108, 110 
Stem-and-leaf plots, 779 
Stepwise Regression Methods, 364—368 
Stepwise regression selection procedures, 
364-368, 583-584 
Structural empty cell, 967 
Studentized deleted residuals, 396-398, 
776—777 
Studentized Pearson residuals, 592 
Studentized range, 746—747 
Studentized range distribution, tables of 
percentiles, 1333—1335 
Studentized residuals, 394, 776 
Studentized statistic, 44, 58 
Subjects, 1127 
blocking, 1153 
Subsampling: 
randomized block design, 1367, 
1369—1370 
single-factor study, 1106-1113 
in three stages, 1113—1114 
Sufficient estimator, 1305 
Summation operator, 1297 
Sums of squares, 25, 225 
for blocks, 898 
in matrix notation, 204—205 
nested design, 1094—1095 
partitioning, 63-66 
pooling, in two-factor analysis of 
variance, 861—862 
quadratic forms, 205-206 
rule for finding, 1359-1361 
for subjects, 1130 
Supplemental variables, 344, 347, 919 
Suppressor variable, 286 
SYGRAPH, 101, 102, 104 
Symmetric matrix, 185 
Symmetry (of probit response 
function), 560 
Synergistic interaction type, 308 
SYSTAT, 19-20, 981 


T


Taguchi, G., 1244
Taylor series expansion, 518 
t distribution, 1304 
Bonferroni procedure, 159, 160 
table of percentiles, 1317-1318 
Testing, 738 
Tests: 
for constancy of error variance,
116-119, 780-785 
for constancy of variance, 115 
factor level means, 704 


family of, 154—155 
goodness of fit, 586-590 
lack of fit test, 119-127 
for normality, 115 
for outliers, 115 
for randomness, 114 (See also F test; 
t test) 
Third-order regression model, 296 
Three-dimensional plots, 1284—1286 
Three-dimensional scatter plots, 233 
Three-factor interactions, 996 
interpretation of, 998-999
test for, 1011-1012 
Three-factor study: 
ANOVA model, 992-998 
ANOVA partitioning, 1008-1009 
estimation of effects, 1013-1017,
1069-1070 
evaluation of appropriateness, 
1005-1007 
expected mean squares, 1009 
fitting of model, 1003—1005 
F tests, 1009-1010, 1067—1068 
model II, 1066 
model III, 1066-1067 
nested design, 1114—1119 
regression approach, 1019-1020 
residual analysis, 1006, 1007 
unequal sample sizes, 1019-1021, 
1070-1077 
Tidwell, P. W., 236
Time series data, 319—321, 481 (See also 
Autocorrelation) 
Total deviation, 65 
Total mean squared error, 357—359 
Total sum of squares, 63—66, 690—693 
partitioning, 836-838, 1008—1009 
Total uncorrected sum of squares, 67 
Total variance, 1033 
Training sample, 372 
Training the network, 542 
Transformable interactions, 826—827 
Transformations of variables, 85—87, 
129—137, 220, 236, 490—492, 562, 
789—793 
Transpose of matrix, 178—179 
Treatment, 13, 649-652 
Treatment combination, 649 
Treatment effects (analysis of covariance), 
922-923, 928-929, 940, 
1180-1182 
Treatment means, 764, 817, 853, 
1018-1019 
differences of, 1002 
of equal importance, 1091 
estimation, 884—886 
multiple comparisons, 856-861 
of unequal importance, 970—980 
Treatment means plot, 820 
Treatment mean square, 694 
Treatment pattern sum of squares, 1199
Treatment sum of squares, 691, 837-838 
Trial, 4 
t test, 287—288 
equivalence of F test, 71 
power of, 50-51 
power value charts, 1327—1328 
Tukey joint estimation procedure: 
balanced incomplete block 
design, 1182 
latin square design, 1190, 1191 
nested design, 1101, 1102 
randomized block design, 904 
repeated measures design, 
1148, 1157 
three-factor analysis of variance, 
1015, 1017 
two-factor analysis of variance, 
850-851, 856 
Tukey-Kramer procedure, 751 
Tukey multiple comparison procedure, 
single-factor analysis of variance, 
746-753 
Tukey one degree of freedom test, 887 
Tukey test for additivity: 
latin square design, 1191—1192 
randomized block design, 903-904 
two-factor analysis of variance, 
886-888 
Tuning constants, 440 
Two-factor interactions, 823, 995, 1012 
interpretation of multiple, 
999-1000, 1016 
interpretation of single, 1016-1017 
Two-factor studies: 
analysis of covariance, 934—935 
ANOVA model for, 829-833 
ANOVA partitioning, 836-840 
crossed, 1366-1369 
empty cells, 964-967 
estimation of effects, 848—861, 
959-964, 970-980, 
1055—1060 
example, 812-813 
expected mean squares, 840, 
1052-1053 
fitting of model, 834-836 
F tests, 843-847, 1053—1054 
general linear test approach, 953, 
972-974 
mean squares, 839—840 
model II, 1047—1049 
model III, 1049—1052 
nested design, 1091—1092 
no-interaction model, 880-886 
partitioning, 836-840 
planning sample sizes for, 862-864 
estimation approach, 863-864 
finding the “best” treatment, 864 
power approach, 862—863 
pooling sums of squares, 861—862 
regression approach, 953-959 
repeated measures design, 1153-1161 
residual analysis, 842-843 
strategy for analysis, 847-858 
Tukey test for additivity, 886-888 
unequal sample sizes, 951-964 
Two-level factorial design, 665-666, 
1210-1212 
center point replications, 1222-1223, 
1243 
estimation of effects, 1212-1214 
F test, 1214—1215 
incomplete block designs, 
1240—1244 
normal probability plot, 1222 
Pareto plot, 1219—1220 
pooling of interactions, 1218-1219 
unreplicated, 1216-1223 
Two-level fractional factorial design, 
1223-1224 
confounding, 1224—1227 
defining relation, 1227-1228 
half-fraction, 1229 
projection property, 1232 
quarter-fraction, 1229-1231 
resolution, 1231-1232 
setting a fraction of highest resolution, 
1232-1239 
smaller-fraction design, 1229-1231 
Two-sided test, 47, 51 
Two-variable conditioning plots, 
451-452 


U 


Unbalanced nested design, 1104-1106 
Unbiased condition, 43-44 
Unbiased estimator, 1305 
Uncontrolled variables, 919 (See also 
Supplemental variables) 
Unequal error variances, transformations, 
132-134 
Unequal sample sizes in analysis of 
variance, 951-964 
estimation of effects, 959-964, 
970-980, 1020-1021 
testing of effects, 953-959, 
1019-1020 
three-factor studies, 1070-1077 
Unequal treatment importance, 
970-980 
Uniform precision central composite 
design, 1273, 1275 
Unimportant interactions, 824-826 
University admissions data set, 1351 
Unmeasurable mean, 1226 
Unrestricted mixed factor effects model, 
1049, 1050 
Unweighted mean, 702 
factor effects model with, 705-708 


V 


Validation of regression model, 350, 
369-375 
Validation set, 372 
Variables: 
relations between, 2-5 
transformations, 129-137 
Variable metric method, 543 
Variance, 52-54 
of error terms, 9, 24-26, 27-28, 
43-44 
of prediction error, 57-59 
of random variable, 1299-1300 
of residuals, 102 
tests for constancy of error 
variance, 115-119, 234, 
780-785 
Variance analysis (see Analysis of 
variance) 
Variance components, 1055-1056 
Variance-covariance matrix: 
of random vector, 194-196 
of regression coefficients, 207-208, 
227-228 
of residuals, 203-204 
Variance function, 1271 
Variance inflation factor, 406-410, 
434-435 
Variance operator, 1299 
V criterion, 1280-1281 
Vector, 178 
with all elements 0, 187 
with all elements unity, 187 
random, 193-196 


W 


Wald test, 578 
Watts, D. G., 529 
Website developer data set, 1352 
Weighted least squares method, 128, 
421-431 
ANOVA models, 786-789 
Weighted mean, 703 
factor effects model with, 
709-710 
Weight function, 439-441 
Whole plots, 1162, 1163 
Within-class matching, 669 
Within-subjects sum of squares, 1131 
Working-Hotelling joint estimation of 
mean responses: 
confidence band, 61-62 
multiple regression, 230 
simple linear regression, 158-159 


X 


χ² distribution, 1303 
table of percentiles, 1319 
X levels, 170-171 
X values, random, 78-89 


Y 


Y values, random, 78-89 


Z 


z' transformation, 85 
Zero vector, 187 


